open2c / bioframe

Genomic interval operations on Pandas DataFrames
MIT License

make chromarms - broken, or changed? #175

Closed Phlya closed 11 months ago

Phlya commented 11 months ago

This code from our open2c_examples notebooks doesn't work:

import bioframe

# Use bioframe to fetch the genomic features from UCSC.
hg38_chromsizes = bioframe.fetch_chromsizes('hg38')
hg38_cens = bioframe.fetch_centromeres('hg38')
# Create a view with chromosome arms using chromosome sizes and the definition of centromeres.
hg38_arms = bioframe.make_chromarms(hg38_chromsizes, hg38_cens)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tungstenfs/scratch/ggiorget/ilya/Projects/open2c_examples/contacts_vs_distance.ipynb Cell 8 line 5
      hg38_cens = bioframe.fetch_centromeres('hg38')
      # create a view with chromosome arms using chromosome sizes and definition of centromeres
----> hg38_arms = bioframe.make_chromarms(hg38_chromsizes,  hg38_cens)
      # select only those chromosomes available in cooler
      hg38_arms = hg38_arms[hg38_arms.chrom.isin(clr.chromnames)].reset_index(drop=True)

File /tungstenfs/scratch/ggiorget/ilya/condaenvs/open2c/lib/python3.9/site-packages/bioframe/extras.py:72, in make_chromarms(chromsizes, midpoints, cols_chroms, cols_mids, suffixes)
     69     raise ValueError("unknown input type for chromsizes")
     71 if len(cols_chroms) == 2:
---> 72     _verify_columns(df_chroms, [ck1, sk1])
     73     columns_to_drop += [sk1]
     74     df_chroms["end"] = df_chroms[sk1].values

File /tungstenfs/scratch/ggiorget/ilya/condaenvs/open2c/lib/python3.9/site-packages/bioframe/core/specs.py:89, in _verify_columns(df, colnames, unique_cols, return_as_bool)
     87     if return_as_bool:
     88         return False
---> 89     raise ValueError(
     90         ", ".join(set(colnames).difference(set(df.columns)))
     91         + " not in keys of df.columns"
     92     )
     93 if return_as_bool:
     94     return True

ValueError: chrom not in keys of df.columns

Phlya commented 11 months ago

@nvictus the issue is that the 'local' provider returns a different Series than 'ucsc' (screenshot: 2023-11-07 at 16 44 12). For some reason the "name" index name highlighted in the screenshot breaks make_chromarms... At least using the 'ucsc' provider fixes the problem.
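
A minimal pandas-only sketch of why a named index could lead to the "chrom not in keys of df.columns" error. It assumes, without confirmation from the bioframe source, that the conversion step calls `reset_index()` and then renames the default "index" column to "chrom". The toy sizes and all variable names are made up for illustration.

```python
import pandas as pd

# Toy chromosome sizes; the values are illustrative, not real hg38 lengths.
unnamed = pd.Series({"chr1": 1000, "chr2": 800}, name="length")
named = unnamed.rename_axis("name")  # same data, but the index is called "name"

# reset_index() names the new column after the index, or "index" if it has no name.
cols_unnamed = list(unnamed.reset_index().columns)
cols_named = list(named.reset_index().columns)

# Assumed conversion step: rename the default "index" column to "chrom".
renamed = named.reset_index().rename(columns={"index": "chrom"})
has_chrom = "chrom" in renamed.columns  # no "index" column existed, so nothing was renamed

# Possible workaround: drop the index name before passing the Series on.
stripped = named.rename_axis(None)
cols_stripped = list(stripped.reset_index().columns)

print(cols_unnamed, cols_named, has_chrom, cols_stripped)
```

If that assumption holds, calling `rename_axis(None)` on the Series returned by the 'local' provider would be an alternative to switching providers.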

nvictus commented 11 months ago

The input behavior of this function is kind of inconsistent. A Series should be treated like a dictionary (index names should be ignored), and dicts should be accepted as input for both chromsizes and midpoints.
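
A rough sketch of the behavior described above, not the code from the actual commit: `chromsizes_to_frame` is a hypothetical helper that reads a dict or a Series as plain key/value pairs, so the index name never reaches the column names.

```python
import pandas as pd


def chromsizes_to_frame(chromsizes):
    """Hypothetical helper: build a (chrom, length) table from a dict or Series."""
    # .items() yields (label, value) pairs for both dicts and Series,
    # so any index name on a Series is simply ignored.
    mapping = dict(chromsizes.items())
    return pd.DataFrame(
        {"chrom": list(mapping.keys()), "length": list(mapping.values())}
    )


# Toy sizes for illustration only.
sizes = {"chr1": 1000, "chr2": 800}
from_dict = chromsizes_to_frame(sizes)
from_named = chromsizes_to_frame(pd.Series(sizes).rename_axis("name"))

print(from_dict)
print(from_named)
```

Both calls give the same two-column table, whether the input is a dict or a Series with a named index.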

Meant to create a PR in the vscode web UI, but it ended up pushing to main by accident: https://github.com/open2c/bioframe/commit/3d2f347edee3bb8a517a01b6fac65b05d86eb07a

Please review and make sure it works for you.

Phlya commented 11 months ago

Thank you, this fixed the problem!