When using `modisco.h5` to track down the original seqlet coordinates, the `example_idx` values appear arbitrary: they don't have a 1:1 match with the rows of the original input .npy (i.e. the individual contribution score windows). This makes it essentially impossible to trace seqlets back to their genomic coordinates using only the metadata saved in `modisco.h5`. You can reimplement parts of the `extract_seqlets.py` code to find which of your pos/neg_regions made the cut, but would it be possible to make the indices stored in `modisco.h5` match those of the input contribution scores?
Specifically, I think this function is where we lose the information: https://github.com/jmschrei/tfmodisco-lite/blob/main/modiscolite/extract_seqlets.py#L59
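To illustrate the kind of bookkeeping I have in mind, here is a minimal sketch. It assumes the mismatch arises because some windows are dropped before seqlets are recorded, which I have not confirmed. All names (`regions`, `keep_mask`, `kept_to_original`, `seqlet_to_genomic`) are hypothetical and not part of tfmodisco-lite's API; the data is made up.

```python
# Hypothetical genomic coordinates, one per row of the original input .npy.
regions = [
    ("chr1", 100, 500),
    ("chr1", 900, 1300),
    ("chr2", 50, 450),
    ("chr2", 700, 1100),
    ("chr3", 10, 410),
]

# Hypothetical filter result: True where a window survived filtering.
keep_mask = [True, False, True, True, False]

# Position i in this list is the post-filter index; the value is the
# row index in the original input array.
kept_to_original = [i for i, keep in enumerate(keep_mask) if keep]


def seqlet_to_genomic(example_idx, start, end):
    """Map a seqlet (post-filter index, window-relative start/end)
    back to genomic coordinates using the saved index map."""
    chrom, region_start, _ = regions[kept_to_original[example_idx]]
    return chrom, region_start + start, region_start + end


print(seqlet_to_genomic(1, 20, 35))
```

If something like `kept_to_original` were saved alongside the seqlets, or if `example_idx` already referred to the original rows, this lookup would be all that is needed.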