mikolmogorov / maf2synteny

A tool for recovering synteny blocks from multiple alignment

clarification of blocks_coords.txt #4

Open 0xaf1f opened 2 years ago

0xaf1f commented 2 years ago

My first issue is that blocks_coords.txt is not machine-readable, so I've written a script to convert it to BED format for some downstream analyses (by the way, it would be great if maf2synteny just output the block coordinates in this format directly). I want to make sure I understand the format correctly so that I get the coordinates right.

Block 489
Seq_id  Strand  Start   End     Length
41      -       853742  845462  8280
42      +       3564788 3573068 8280
57      -       844024  835744  8280
71      +       3562990 3571270 8280
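A minimal sketch of parsing the block layout shown above into records, assuming only what this example shows: a `Block <id>` line, a `Seq_id Strand Start End Length` header, and whitespace-separated rows. The names `BlockEntry` and `parse_blocks` are hypothetical, and any other lines (for example a file header or separator lines) are simply skipped rather than interpreted.

```python
from dataclasses import dataclass


@dataclass
class BlockEntry:
    block_id: int
    seq_id: str
    strand: str
    start: int
    end: int
    length: int


def parse_blocks(text: str) -> list[BlockEntry]:
    """Parse 'Block N' sections laid out as in the example above.

    Rows are kept only when they have five fields, a '+'/'-' strand and
    numeric Start/End/Length; header and unrecognised lines are skipped.
    """
    entries: list[BlockEntry] = []
    block_id = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Block" and len(fields) == 2 and fields[1].isdigit():
            # Start of a new block section
            block_id = int(fields[1])
        elif (
            block_id is not None
            and len(fields) == 5
            and fields[1] in ("+", "-")
            and all(f.isdigit() for f in fields[2:])
        ):
            seq_id, strand, start, end, length = fields
            entries.append(
                BlockEntry(block_id, seq_id, strand, int(start), int(end), int(length))
            )
    return entries
```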

Am I correct that Start is always 0-based and End is always 1-based, even for negative-strand entries? That is how BED works, except that in BED the Start position is always the smaller number, even for negative-strand entries. So to convert this to BED, I would take the coordinates of positive-strand records as is, but for minus-strand records I'd need to add 1 to Start, subtract 1 from End, and then swap them. Is that right?

mikolmogorov commented 2 years ago

The format is similar to Sibelia output, here is more detailed description: https://github.com/bioinf/Sibelia/blob/master/SIBELIA.md#blocks-coordinates. Both coordinates should be 1-based.
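If both columns are 1-based and inclusive, as the Sibelia description states, converting to BED (0-based, half-open) only requires ordering the pair and shifting the lower bound down by one. A minimal sketch under that assumption; the function name `one_based_to_bed` is hypothetical:

```python
def one_based_to_bed(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-based, inclusive interval to BED (0-based, half-open).

    On the '-' strand the Start column holds the larger coordinate,
    so the pair is ordered with min/max before shifting.
    """
    lo, hi = min(start, end), max(start, end)
    # Only the lower bound moves: 1-based inclusive end == 0-based exclusive end
    return lo - 1, hi
```

With the Sibelia example row `1 - 595992 590919 5074`, this gives a BED interval of 590918 to 595992, whose width is 5074.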

0xaf1f commented 2 years ago

Hmm. Then I think something is wrong. If both coordinates are 1-based, the length should be abs(Start - End) + 1. In my example above, the length is the exact difference, with no +1 (844024 - 835744 = 8280). In the Sibelia example output, it works the way I'd expect from their description of the format:

1   -   595992  590919  5074

595992 - 590919 + 1 = 5074
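The length check above can be written as a small helper that classifies a row by comparing its Length column with the coordinate difference. This is a sketch; the function name and the returned labels are hypothetical:

```python
def infer_convention(start: int, end: int, length: int) -> str:
    """Guess how a (Start, End, Length) row was written.

    'one_based_inclusive': Length == |Start - End| + 1 (Sibelia's description)
    'half_open':           Length == |Start - End| (one end is 0-based)
    'unknown':             neither relation holds
    """
    span = abs(start - end)
    if length == span + 1:
        return "one_based_inclusive"
    if length == span:
        return "half_open"
    return "unknown"
```

Every row of Block 489 above is classified as `half_open`, while the Sibelia example row is classified as `one_based_inclusive`.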

0xaf1f commented 2 years ago

What actually appears to be the case currently in maf2synteny's blocks_coords.txt is that the smaller of the two numbers (Start for + strand blocks, End for - strand blocks) is 0-based and the other is 1-based.

But please consider this a feature request for having this file created in BED format :pray:
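If the observed convention holds (smaller coordinate 0-based, larger coordinate 1-based), each row is already a BED half-open interval once the two numbers are put in ascending order, so no +1/-1 adjustment is needed on either strand. A minimal sketch of emitting a 6-column BED line under that assumption; the function name and the `block_<id>` name field are hypothetical, the numeric Seq_id is used directly as the chrom column (mapping it to a real sequence name is not handled), and the score column is fixed at 0:

```python
def to_bed_line(seq_id: str, strand: str, start: int, end: int, block_id: int) -> str:
    """Format one blocks_coords.txt row as a 6-column BED line.

    Assumes the smaller coordinate is 0-based and the larger is 1-based,
    which matches BED's half-open interval, so only the order is normalised.
    """
    bed_start, bed_end = min(start, end), max(start, end)
    name = f"block_{block_id}"  # hypothetical naming scheme
    return "\t".join([seq_id, str(bed_start), str(bed_end), name, "0", strand])
```

For the row `57 - 844024 835744 8280` in Block 489, this yields `57`, `835744`, `844024`, `block_489`, `0`, `-` (tab-separated), and the BED width equals the reported Length of 8280.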