wyp1125 / MCScanX

MCScanX: Multiple Collinearity Scan toolkit X version. The most popular synteny analysis tool in the world!
http://chibba.pgml.uga.edu/mcscan2/

Cluster enabled version of MCScanX #32

Closed by sanyalab 2 years ago

sanyalab commented 3 years ago

Hello Dr. Yupeng Wang,

I am using MCScanX to compute synteny among 48 genomes. I have a total of 1.36 million protein sequences and a 4.3G blast file. Since this run takes a long time, I was wondering whether there is a cluster-enabled or MPI-enabled version of the tool to speed up the process.

Thanks, Abhijit

sanyalab commented 2 years ago

Here is a solution for anyone who needs to run MCScanX on a large number of genomes.

Do the following:

  1. Break up the blast file on a per-genome basis: each per-genome file keeps every hit in which a gene ID from that genome appears in col 1, col 2, or both cols of the blast result file.
  2. Keep the concatenated gff file untouched (it contains the positions of all genes from all genomes).
  3. Run MCScanX once per genome, with that genome's blast file and the full gff file.
  4. Clean up the folder genome.html/ so that it retains only the html files related to that particular genome.
  5. Concatenate all the collinearity files.
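
As a rough illustration of step 1, here is a minimal Python sketch that streams a tab-separated blast file once and writes one file per genome. It is not part of MCScanX. The function names `genome_of` and `write_per_genome_blast` are hypothetical, and the rule that the first two characters of a gene ID identify its genome is purely an assumption; replace `genome_of` with whatever mapping fits your gene IDs.

```python
import os


def genome_of(gene_id, prefix_len=2):
    """Hypothetical rule: the first `prefix_len` characters of a gene ID
    name its genome. Swap in your own lookup if your IDs differ."""
    return gene_id[:prefix_len]


def write_per_genome_blast(blast_path, out_dir, genome_of=genome_of):
    """Stream a tabular blast file and write <genome>.blast files.

    A hit is written to a genome's file if the gene in column 1 OR
    column 2 belongs to that genome, so a hit between two different
    genomes ends up in both files.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}
    try:
        with open(blast_path) as blast:
            for line in blast:
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 2:
                    continue  # skip blank or malformed lines
                # A set avoids writing a within-genome hit twice.
                for genome in {genome_of(fields[0]), genome_of(fields[1])}:
                    if genome not in handles:
                        out_path = os.path.join(out_dir, genome + ".blast")
                        handles[genome] = open(out_path, "w")
                    handles[genome].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Calling `write_per_genome_blast("all.blast", "per_genome")` (both paths are placeholders) would leave one blast file per genome in `per_genome/`. As far as I understand MCScanX's usage, it looks for a `.blast` and a `.gff` file sharing one prefix, so the untouched gff from step 2 would need to be copied or linked next to each per-genome blast file under the same prefix before step 3.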

NOTE: The summary at the beginning of each collinearity file covers only that genome's run, not the whole dataset.
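
For the concatenation in step 5, a minimal Python sketch could look like the following. The function name `concatenate_collinearity` and the file pattern are hypothetical. Each input file is copied as-is, so the merged file contains one summary block per genome, as noted above. Since a hit between two genomes is present in both of their blast files, the same block may also appear more than once in the merged output; this sketch does not deduplicate.

```python
import glob
import os


def concatenate_collinearity(pattern, merged_path):
    """Append every file matching `pattern` to `merged_path`, in sorted order.

    The merged file itself is excluded in case it matches the pattern.
    Returns the list of input paths that were merged.
    """
    merged_abs = os.path.abspath(merged_path)
    paths = sorted(
        p for p in glob.glob(pattern) if os.path.abspath(p) != merged_abs
    )
    with open(merged_path, "w") as merged:
        for path in paths:
            with open(path) as part:
                for line in part:
                    # Guard against a missing final newline so that the
                    # last line of one file never fuses with the next.
                    merged.write(line if line.endswith("\n") else line + "\n")
    return paths
```

For example, `concatenate_collinearity("per_genome/*.collinearity", "all.collinearity")` would merge every per-genome result into one file; both paths are placeholders.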

Thanks

P.S. Is this project no longer active? The author does not seem to be responding. Just curious.