jmschrei / tfmodisco-lite

A lite implementation of tfmodisco, a motif discovery algorithm for genomics experiments.
MIT License

`example_idx` doesn't trace back to original contribution scores #39

Open mlweilert opened 1 year ago

mlweilert commented 1 year ago

When using modisco.h5 to track down original seqlet coordinates, the `example_idx` value is arbitrary: it has no 1:1 correspondence with the contribution score windows (rows) of the original input .npy. This makes it essentially impossible to trace seqlets back to their genomic coordinates using only the metadata saved in "modisco.h5". You can reimplement parts of the "extract_seqlets.py" code to find which of your pos/neg_regions made the cut, but would it be possible to make the `example_idx` values in modisco.h5 match the indices of the original contribution scores?

Specifically, I think this function is where we lose the information: https://github.com/jmschrei/tfmodisco-lite/blob/main/modiscolite/extract_seqlets.py#L59
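To make the request concrete, here is a minimal sketch of the kind of bookkeeping that would let `example_idx` be traced back. It is not the tfmodisco-lite code: the filter, the `scores` and `regions` arrays, and the `to_genomic` helper are all hypothetical, and it assumes seqlet start/end are offsets within a window whose rows line up with the original regions.

```python
import numpy as np

# Hypothetical input: one summary score per contribution-score window,
# in the same row order as the original .npy file.
scores = np.array([0.2, 3.1, 0.0, 2.4, 0.1, 5.0])

# Hypothetical genomic regions, one per row of the original input.
regions = [("chr1", 100, 500), ("chr1", 900, 1300), ("chr2", 0, 400),
           ("chr2", 700, 1100), ("chr3", 50, 450), ("chr3", 600, 1000)]

# Stand-in for any step that drops windows (the real selection may differ).
keep = scores > 1.0
filtered = scores[keep]

# Record which original rows survived, so filtered indices stay traceable.
original_idx = np.flatnonzero(keep)


def to_genomic(example_idx, start, end):
    """Map a seqlet on the filtered array back to genomic coordinates.

    Assumes start/end are offsets within the window (an assumption here).
    """
    row = int(original_idx[example_idx])
    chrom, region_start, _ = regions[row]
    return row, chrom, region_start + start, region_start + end


print(to_genomic(2, 10, 30))
```

Saving something like `original_idx` alongside the seqlets would be enough to recover coordinates without re-running the filtering step.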

jmschrei commented 1 year ago

Hm, that does seem like a problem. I've asked @ivyraine to look into it.