mdshw5 / pyfaidx

Efficient pythonic random access to fasta subsequences
https://pypi.python.org/pypi/pyfaidx

Fasta to Bed #133

Closed: atfields closed this issue 6 years ago

atfields commented 6 years ago

Hi Matt, I recently saw a post of yours (https://www.biostars.org/p/191052/) and thought I would try it out. It worked like a charm, but the BED file output started at 1 instead of 0. I am new to BED files, so I am unsure whether a sequence of 1000 bases should be listed as 0 1000 or 0 999, but from what I have seen, they all start at 0. Thanks for your help. Regards, Andrew

mdshw5 commented 6 years ago

Thanks so much for reporting this! It was quite a silly mistake, since BED coordinates are definitely 0-based and half-open, [0, 1), and not 1-based, [1, 1], as I had implemented this feature.
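For whole sequences, the half-open rule means every BED record starts at 0 and ends at the sequence length. A minimal sketch of that rule, assuming a samtools-style `.fai` index as input (sequence name and length in the first two tab-separated columns). `fai_to_bed` is a hypothetical helper written for illustration; it is not pyfaidx's own implementation, and the index contents below are made up.

```python
import io


def fai_to_bed(fai_handle):
    """Yield one BED line per sequence in a samtools-style .fai index.

    Hypothetical helper: assumes the first two tab-separated columns
    are the sequence name and its length in bases.
    """
    for line in fai_handle:
        if not line.strip():
            continue  # skip blank lines
        name, length = line.rstrip("\n").split("\t")[:2]
        # BED is 0-based and half-open, so a whole sequence spans [0, length)
        yield f"{name}\t0\t{int(length)}"


# Made-up two-record index; only the name and length columns are used here.
example_fai = io.StringIO("chr1\t1000\t6\t60\t61\nchr2\t250\t1029\t60\t61\n")
for bed_line in fai_to_bed(example_fai):
    print(bed_line)  # chr1 0 1000, then chr2 0 250 (tab-separated)
```

Note that the end column is the length itself, not length minus one: with a 0-based start, a length of 1000 gives the interval [0, 1000).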

Your example is almost correct: a sequence of 1000 bases starting at 1 would be [1, 1000] in 1-based, fully closed coordinates and [0, 1000) in BED's 0-based, half-open coordinates. That is, the end coordinate is excluded in BED format, so the sequence is written as 0 1000. See http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/
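Converting between the two systems only shifts the start; the end value stays the same. A small sketch of that conversion, where the function names are hypothetical and chosen for illustration:

```python
def one_based_closed_to_bed(start, end):
    """Convert a 1-based, fully closed interval [start, end] to BED's
    0-based, half-open [start, end). Hypothetical helper."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start moves; the closed end equals the half-open end.
    return start - 1, end


def bed_to_one_based_closed(start, end):
    """Inverse conversion: BED [start, end) back to 1-based [start, end].
    Empty intervals (start == end) are rejected in this sketch."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


bed_start, bed_end = one_based_closed_to_bed(1, 1000)
print(bed_start, bed_end)   # 0 1000
print(bed_end - bed_start)  # 1000, since length is simply end - start
```

One practical benefit of the half-open convention is that the length is `end - start` with no `+ 1` correction, and a single base at 1-based position 1 becomes [0, 1).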

atfields commented 6 years ago

Hi Matt,

Thanks so much for the fix and the insight!

Regards, Andrew

> On Wed, Jan 24, 2018 at 8:12 PM, Matt Shirley wrote:
> Thanks so much for reporting this!