jmschrei / tfmodisco-lite

A lite implementation of tfmodisco, a motif discovery algorithm for genomics experiments.
MIT License

Reset seqlet coordinates with respect to "--window" #20

Open jvierstra opened 1 year ago

jvierstra commented 1 year ago

The seqlet coordinates (per example) are returned relative to the "trimmed" contributions. For example, if the user specifies a window of 100 with the "--window" argument on a full input sequence width of 1000, then the returned seqlet coordinates are relative to the 100 bp window and should be shifted by 450 bp when saving to the MODISCO output file.

Alternatively, this could be just documented. I was trying to use the H5 file (after backwards conversion) to call seqlet hits on the examples and this messed me up for a couple of hours.
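To make the arithmetic concrete, here is a minimal sketch of the shift, assuming the window is centered on the input sequence and assuming a 1000 bp input (the width that gives a 450 bp offset for a window of 100). The function names `window_offset` and `shift_seqlet` are hypothetical and are not part of tfmodisco-lite.

```python
def window_offset(seq_len, window):
    """Start of a centered window of width `window` within a sequence
    of length `seq_len` (assumes the window is centered on the input)."""
    if window > seq_len:
        raise ValueError("window cannot be wider than the sequence")
    return seq_len // 2 - window // 2


def shift_seqlet(start, end, seq_len, window):
    """Convert window-relative seqlet coordinates to full-sequence coordinates."""
    offset = window_offset(seq_len, window)
    return start + offset, end + offset


# A seqlet reported at 10-40 inside a 100 bp window of a 1000 bp input
print(window_offset(1000, 100))          # 450
print(shift_seqlet(10, 40, 1000, 100))   # (460, 490)
```

If the window were not centered, the offset would have to come from wherever the trimming was actually done.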

snaqvi1990 commented 1 year ago

Could you clarify how you get a 450 bp shift when setting --window to 100? Doesn't that mean there is a 100 bp window centered on the peak center, and all seqlet coordinates per example are relative to that 100 bp window? I am also trying to call seqlet hits and having issues. Thanks.