hoffmangroup / genomedata

The Genomedata format for storing large-scale functional genomics data.
https://genomedata.hoffmanlab.org/
GNU General Public License v2.0

Recent genomedata versions break previous (version 1.3.6) masking #51

Open EricR86 opened 5 years ago

EricR86 commented 5 years ago

Original report (archived issue) by Coby Viner (Bitbucket: cviner2, GitHub: cviner).

The original report had attachments: mm9_chrY-only_MASK-cov.bedGraph.gz


The following error occurs with the latest version (1.4.4), whereas the same invocation works fine with version 1.3.6. This seems to be due to the new masking functionality added in 1.4.0.

#!text

>> <cytomod.py> 2019-01-02T17:18:55.693228 Genomedata archive
                                           successfully loaded.
>> <cytomod.py> 2019-01-02T17:18:55.697772 Masking is enabled. All
                                           loci implicated by the mask
                                           will be masked
                                           irrespective of any mods at
                                           those loci.
>> <cytomod.py> 2019-01-02T17:18:55.697999 The order of preference for
                                           base
                                           modifications (from highest
                                           to lowest) is:
                                           f,h,m,c,w,x,y,z.
>> <cytomod.py> 2019-01-02T17:18:55.700827 Outputting the modified
                                           genome for: chrY
>> <cytomod.py> 2019-01-02T17:18:55.706329 Now outputting chrY for
                                           region: (0, 2000000)
Traceback (most recent call last):
  File "../../src/cytomod.py", line 859, in <module>
    args.maskAllUnsetRegions)
  File "../../src/cytomod.py", line 313, in generateFASTAFile
    maskAllUnsetRegions) + "\n")
  File "../../src/cytomod.py", line 147, in getModifiedGenome
    maskTrack = chromosome[s:e, maskRegionTName]
  File "/home/cviner2/.local/lib/python2.7/site-packages/genomedata/__init__.py", line 755, in __getitem__
    track_key)
TypeError: Unrecognized track indexing method: mm9_chrY-only_MASK-cov.bedGraph.gz

It would be nice if backward compatibility could be restored and preserved.

Please find the offending BEDGraph file (mm9_chrY-only_MASK-cov.bedGraph.gz) enclosed.

EricR86 commented 5 years ago

Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).


Could you clarify your indexing method here? I see that you specify a range from s to e and then index the track using the name of a bedgraph.gz file. Or is it just that the track name happens to match this particular file's name?

EricR86 commented 5 years ago

Original comment by Coby Viner (Bitbucket: cviner2, GitHub: cviner).


Thanks for looking into these! I'm not sure I follow, probably since I've not looked at this in some time and no longer recall some genomedata syntax and terms.

Perhaps more context on my invocation of genomedata here will be useful:

This is occurring when the enclosed track is provided as a "mask track", as that term was used in version 1.3.6 and prior. This is eventually done from Cytomod via its getModifiedGenome function, which in this case detects the use of a mask track and tries to use it via:

#!python

maskTrack = chromosome[s:e, maskRegionTName]
maskIndex = genome.tracknames_continuous.index(maskRegionTName)

Accordingly, I think it is indeed the latter: the track name simply matches the file name. This file is being used as a mask track, and that mask is itself derived from this BEDGraph alone.

I should also note that the indexing is indeed as you've described it, via the specified range. The mask track is used to determine the positions that are masked, as described in Cytomod's genomedata archive creation parameter and in its masking parameter, both of which were used in this case.
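For reference, here is a minimal sketch of one way to narrow this down: resolve the mask track's name to an integer column index first, then slice with that index instead of the name. The helper `resolve_track_index` and the track name "dnase" are hypothetical and only for illustration. It assumes that genomedata accepts an integer track index in `chromosome[s:e, index]`, and that the type of the track key (for example, bytes rather than text) may be relevant; neither assumption has been verified against 1.4.4.

```python
def resolve_track_index(tracknames, track_key):
    """Return the integer column index for track_key.

    tracknames: sequence of track names, e.g. genome.tracknames_continuous.
    track_key: a track name (text or bytes) or an integer index.
    """
    if isinstance(track_key, int):
        if not 0 <= track_key < len(tracknames):
            raise IndexError("track index out of range: %r" % (track_key,))
        return track_key

    if isinstance(track_key, bytes):
        # Assumption: track names are plain UTF-8 text, so decode bytes
        # before looking them up.
        track_key = track_key.decode("utf-8")

    try:
        return list(tracknames).index(track_key)
    except ValueError:
        raise KeyError("no such track: %r (type %s)"
                       % (track_key, type(track_key).__name__))


# Hypothetical usage against an open archive (not run here):
#   maskIndex = resolve_track_index(genome.tracknames_continuous,
#                                   maskRegionTName)
#   maskTrack = chromosome[s:e, maskIndex]
```

If the name resolves to an index here but `chromosome[s:e, maskRegionTName]` still raises the `TypeError`, that would point at how the name is passed rather than at the contents of the archive.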

Sorry if that doesn't help. Happy to chat about this tomorrow, if that's easier.

EricR86 commented 5 years ago

Original comment by Coby Viner (Bitbucket: cviner2, GitHub: cviner).