wyp1125 / MCScanX

MCScanX: Multiple Collinearity Scan toolkit X version. The most popular synteny analysis tool in the world!
http://chibba.pgml.uga.edu/mcscan2/

Collinear block coordinates #19

Open cascoamarillo opened 5 years ago

cascoamarillo commented 5 years ago

Hi, I am not sure if I am missing something, but would it be possible to extract the coordinates of each entire block? That is, instead of the genes in a block, we would get a start (start position of the first gene) and a stop (end position of the last gene) coordinate on the corresponding contig/scaffold.

Ideally, if we have the collinearity file:

`##########################################

Alignment 0: score=8871.0 e_value=0 N=188 at1&at1 plus

0- 0: AT1G17240 AT1G72300 0
0- 1: AT1G17290 AT1G72330 0
0- 2: AT1G17310 AT1G72350 5e-41
0- 3: AT1G17350 AT1G72420 2e-113
0- 4: AT1G17380 AT1G72450 7e-63
0- 5: AT1G17400 AT1G72490 2e-82
0- 6: AT1G17420 AT1G72520 0
......
0-183: AT1G22270 AT1G78190 6e-45
0-184: AT1G22280 AT1G78200 2e-107
0-185: AT1G22300 AT1G78220 1e-72
0-186: AT1G22330 AT1G78260 1e-63
0-187: AT1G22340 AT1G78270 3e-174

Alignment 1: score=2935.0 e_value=2.7e-245 N=64 at1&at1 plus

1- 0: AT1G10640 AT1G60590 0 ...`

Could we convert it to an output file such as .gff or .bed that holds the block start and stop positions for both columns? Something like:

at1-block0(1st column) start (1st gene coordinates AT1G17240) stop(last gene AT1G22340 final pos.) at1-block0(2nd column) start (1st gene coordinates AT1G72300) stop(last gene AT1G78270)

Extracting the first coordinate (from the first gene) for each block and column seems easy, but I find it hard to do the same for the last gene.

Thank you
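
For reference, the conversion described above could be sketched roughly as follows. This is not part of MCScanX, only a minimal illustration under several assumptions: the gene coordinates come from a four-column, whitespace-separated file (chromosome, gene ID, start, end); block header lines contain `Alignment N:` followed later by `chrA&chrB` and `plus` or `minus`, with an optional leading `##`; and the function names `read_gene_positions` and `block_spans` are hypothetical. Instead of looking up only the first and last gene, it tracks the smallest start and largest end seen in each column, which also covers blocks where the second column runs in reverse order.

```python
import re

# Assumed header shape: "[##] Alignment 0: score=... N=188 at1&at1 plus"
HEADER = re.compile(r"^#*\s*Alignment\s+(\d+):.*?\s(\S+)&(\S+)\s+(plus|minus)\s*$")
# Assumed pair shape: "0-  0: AT1G17240 AT1G72300 0"
PAIR = re.compile(r"^\s*\d+-\s*\d+:\s+(\S+)\s+(\S+)")


def read_gene_positions(gff_lines):
    """Map gene ID -> (start, end), assuming columns: chrom, gene, start, end."""
    positions = {}
    for line in gff_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _chrom, gene, start, end = fields[:4]
        positions[gene] = (int(start), int(end))
    return positions


def block_spans(collinearity_lines, positions):
    """Return (block, chrom1, start1, end1, chrom2, start2, end2) per block."""
    spans = {}  # block id -> [chrom1, start1, end1, chrom2, start2, end2]
    current = None
    for line in collinearity_lines:
        header = HEADER.match(line)
        if header:
            current = int(header.group(1))
            spans[current] = [header.group(2), None, None,
                              header.group(3), None, None]
            continue
        pair = PAIR.match(line)
        if pair is None or current is None:
            continue  # separator, parameter, or elided ("......") line
        row = spans[current]
        # Offsets 1 and 4 hold the start of column 1 and column 2.
        for offset, gene in ((1, pair.group(1)), (4, pair.group(2))):
            start, end = positions[gene]  # KeyError if the gene is unknown
            lo, hi = min(start, end), max(start, end)
            row[offset] = lo if row[offset] is None else min(row[offset], lo)
            row[offset + 1] = hi if row[offset + 1] is None else max(row[offset + 1], hi)
    return [(block, *row) for block, row in sorted(spans.items())]


def to_table(rows):
    """Format the spans as tab-separated lines, one block per line."""
    return "\n".join("\t".join(str(value) for value in row) for row in rows)
```

With hypothetical file names, usage would look like `print(to_table(block_spans(open("at.collinearity"), read_gene_positions(open("at.gff")))))`. The coordinates are passed through unchanged, so a strict BED file would additionally need the start shifted to 0-based.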