jzhoulab / puffin

deep learning-inspired explainable sequence model for transcription initiation
https://puffin.zhoulab.io

The Reason of Offset Handling for Negative Strand Coordinates #14

Closed yangzhao1230 closed 1 month ago

yangzhao1230 commented 3 months ago

There is an offset added when handling negative strand coordinates:

if strand == "-":
    offset = 1
    strand_ = "minus"
else:
    offset = 0
    strand_ = "plus"

From my understanding, the coordinates on the positive and negative strands should be equivalent, with the only difference being the orientation. In other projects, like BEND, they simply reverse complement the sequence without adding an offset for the negative strand.

My background in biology is limited, so I would greatly appreciate any insights or explanations on this matter. Understanding the rationale behind this implementation would help in ensuring the accurate use of genomic data in my analysis.

junhong-huang commented 1 month ago

Hello, I'm a student at SYSU. I see you have some questions about the Puffin software. I have also been studying it recently, so feel free to discuss the details with me by email: 1051945022@qq.com

jzthree commented 1 month ago

This is due to how transcription start sites are annotated in our files. The offset is needed to ensure that the transcription start site lands at the same position in the extracted sequence regardless of whether it is on the "+" or "-" strand.
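
As a rough illustration of why such an offset can be needed (this is not taken from the Puffin code), consider an even-length window cut around a TSS. The sketch below assumes 0-based, half-open coordinates and a toy string genome; `extract_window`, `reverse_complement`, and `use_offset` are hypothetical names, and the sign and size of the shift in a real pipeline depend on how the annotation file defines TSS positions.

```python
# Toy sketch, assuming 0-based half-open coordinates (not Puffin's actual code).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_window(genome, tss, half_width, strand, use_offset=True):
    """Cut a window of length 2 * half_width around `tss` (hypothetical helper).

    On the "+" strand the TSS base lands at index `half_width`.
    On the "-" strand the window is shifted by one base before reverse
    complementing, so that the TSS base lands at the same index.
    """
    offset = 1 if (strand == "-" and use_offset) else 0
    start = tss - half_width + offset
    end = tss + half_width + offset
    window = genome[start:end]
    return reverse_complement(window) if strand == "-" else window


# The single G marks the TSS base; on the "-" strand it reads as C.
genome = "AAAAAGAAAAAA"
tss, half_width = 5, 3

plus = extract_window(genome, tss, half_width, "+")
minus_shifted = extract_window(genome, tss, half_width, "-")
minus_unshifted = extract_window(genome, tss, half_width, "-", use_offset=False)

print(plus, plus.index("G"))                        # TSS index on "+"
print(minus_shifted, minus_shifted.index("C"))      # TSS index on "-" with offset
print(minus_unshifted, minus_unshifted.index("C"))  # TSS index on "-" without offset
```

In this toy setup, the even-length window is not symmetric about the TSS base, so reverse complementing the unshifted window moves the TSS one position to the left; the one-base shift compensates for that.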

junhong-huang commented 1 month ago

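Hello, thank you for your email. I will look at it as soon as possible. Wishing you good health and a happy life. Huang Junhong, School of Life Sciences, South China Normal University (SCNU)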