Open 0xaf1f opened 2 years ago
The format is similar to Sibelia output; here is a more detailed description: https://github.com/bioinf/Sibelia/blob/master/SIBELIA.md#blocks-coordinates. Both coordinates should be 1-based.
Hmm. I think there's something wrong then. If the coordinates are both 1-based, then the length should be abs(Start - End) + 1. In my example above, the length is the exact difference without having to add 1 (844024 - 835744 = 8280). In the Sibelia example output, it's the way I'd expect given their description of the format:
1 - 595992 590919 5074
595992 - 590919 + 1 = 5074
What actually looks to be the case currently in maf2synteny's blocks_coords is that the smaller of the two numbers (Start for + strand blocks, End for - strand blocks) is 0-based and the other is 1-based.
But consider this a feature request for having this file created in BED format :pray:
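For anyone wanting to run the same length check on their own rows, here is a minimal sketch. The function name `infer_convention` and its return labels are made up for illustration and are not part of maf2synteny or Sibelia; it only compares the reported length with the coordinate difference.

```python
def infer_convention(start: int, end: int, length: int) -> str:
    """Classify one block row by how its length relates to its coordinates.

    Hypothetical helper: the labels below are illustrative only.
    """
    span = abs(start - end)
    if length == span + 1:
        # Both ends are counted, e.g. both coordinates 1-based inclusive.
        return "closed"
    if length == span:
        # One end is not counted, e.g. smaller end 0-based, larger end 1-based.
        return "half-open"
    return "unknown"


# Row from the Sibelia example output quoted above.
print(infer_convention(595992, 590919, 5074))  # closed
# Figures from the maf2synteny example quoted above.
print(infer_convention(835744, 844024, 8280))  # half-open
```

If every row in a file comes back as `half-open`, that would be consistent with the mixed 0-based/1-based reading described above, though it does not prove which end is the 0-based one.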
My first issue is that `blocks_coords.txt` is not machine readable, so I've written a script to convert it to BED format for some downstream analyses (it would be great if maf2synteny just output the blocks coordinates in this format directly, by the way). I'm trying to make sure I understand this correctly so that I get the coordinates right.

Am I correct in that `Start` is always a zero-based number and `End` is one-based, even for negative strand entries? BED is that way, but for negative strand entries, the `Start` position is always the smaller number. So for converting this to BED, I would take the coordinates for the positive strand records as is, but for the minus strand ones, I'd need to add 1 to `Start`, subtract 1 from `End` and then switch them? Is that right?
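To make the question concrete, here is a rough sketch of the conversion under two possible readings of the file. Everything here is an assumption for illustration: `block_to_bed` is a hypothetical helper, the `one_based_inclusive` flag selects between the documented Sibelia-style convention (both coordinates 1-based) and a half-open reading (smaller coordinate 0-based, larger 1-based), and the sequence name `chr1` is a placeholder. Which reading matches maf2synteny's actual output is exactly what needs confirming.

```python
def block_to_bed(chrom: str, strand: str, start: int, end: int,
                 one_based_inclusive: bool) -> str:
    """Convert one block row to a 6-column BED line (hypothetical helper).

    BED intervals are 0-based, half-open, and chromStart is always the
    smaller coordinate regardless of strand.
    """
    lo, hi = min(start, end), max(start, end)
    if one_based_inclusive:
        # 1-based inclusive -> 0-based half-open: shift only the lower bound.
        lo -= 1
    # Placeholder name "." and score 0 fill the columns before strand.
    return "\t".join([chrom, str(lo), str(hi), ".", "0", strand])


# Documented Sibelia-style row, assuming both coordinates are 1-based.
print(block_to_bed("chr1", "-", 595992, 590919, one_based_inclusive=True))
# Half-open reading: only the order of the coordinates is normalised.
print(block_to_bed("chr1", "+", 835744, 844024, one_based_inclusive=False))
```

Under either reading the BED length (`chromEnd - chromStart`) should equal the length column of the original row, which gives a cheap sanity check on the converted file.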