wyp1125 / MCScanX

MCScanX: Multiple Collinearity Scan toolkit X version. The most popular synteny analysis tool in the world!
http://chibba.pgml.uga.edu/mcscan2/

No result after run commands #53

Open pcampiteli opened 2 years ago

pcampiteli commented 2 years ago

Greetings, I'm trying to use MCScanX_h. I've prepared the necessary files following the manual, yet neither my data nor the example data works.

My gff, which covers 5 species, was edited to follow the format CH# gene start end:

Ta1 TA20_000001 40390 41754 ...

My .homology file was obtained by running OrthoFinder and extracting the pairwise data for each species, without the optional third column:

TH179_000002 TH3844_011373 ...

Reading other issues on GitHub, the suggested solution of using tab-delimited files and moving them to the program folder didn't resolve it. The example data also produces no output.

"using example data"

/home/h.paulocampiteli/MCScanX-master/MCScanX /home/h.paulocampiteli/MCScanX-master/data/
Reading BLAST file and pre-processing
Generating BLAST list
0 matches imported (0 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to /home/h.paulocampiteli/MCScanX-master/data/.collinearity [0.001 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Done! [0.000 seconds elapsed]

"using my own on another folder"

/home/h.paulocampiteli/MCScanX-master/MCScanX_h /storage4/h.paulocampiteli/synteny/mscscan_analysis/
Reading homologs and pre-processing
Generating homolog list
0 homologous pairs imported (0 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to /storage4/h.paulocampiteli/synteny/mscscan_analysis/.collinearity [0.001 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Print statistics:
Species # of collinear homolog pairs # of homolog pairs Percentage

"using my data on the MCscan folder"

/home/h.paulocampiteli/MCScanX-master/MCScanX_h /home/h.paulocampiteli/MCScanX-master/MCScanX
Reading homologs and pre-processing
Generating homolog list
0 homologous pairs imported (0 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to /home/h.paulocampiteli/MCScanX-master/MCScanX.collinearity [0.001 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Print statistics:
Species # of collinear homolog pairs # of homolog pairs Percentage
Done! [0.001 seconds elapsed]

I could not find any other response regarding this problem. Does anyone know what sorcery I must perform to get the program to work?

Thanks in advance

Botantisty commented 2 years ago

Hey, did you ever come up with a solution to this issue? I am encountering the same problem with both the supplied test data and my own data. ~Best

thesnakeguy commented 1 year ago

Same for me, it's not working. I am using the right .gff input data (according to other users since there is conflicting info in the man pages here) -> chr gene start stop. Anyone got this software running?

pcampiteli commented 1 year ago

Sorry guys, I could not fix the problem and gave up on MCScanX.

But I'm now using Synima (Synteny Imager), which runs the synteny analysis with one of three software options (OrthoFinder, OrthoMCL, RBH) and plots good-quality synteny graphs.

😊

thesnakeguy commented 1 year ago

Many thanks for your reply! I just got it running: the chromosome names of both species needed correct formatting in the gff, and there were still some spaces instead of tabs. Now I have tried to visualize things with SynVisio, but I get nothing, although my collinearity file is definitely not empty. Thanks for your suggestion and best wishes!
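For anyone hitting the same spaces-versus-tabs problem, here is a minimal sketch of how it could be checked and fixed. The function names and file names are made up for illustration, and it assumes the 4-column chr gene start stop layout discussed in this thread.

```shell
#!/bin/bash

# Hypothetical helper: count lines that do NOT have exactly 4 tab-separated
# fields (for example, lines that still use spaces as separators).
count_bad_gff_lines() {
    awk -F'\t' 'NF != 4 { bad++ } END { print bad + 0 }' "$1"
}

# Hypothetical helper: rewrite every line with single tabs between fields.
# Assumption: no field legitimately contains a space.
spaces_to_tabs() {
    awk 'BEGIN { OFS = "\t" } { $1 = $1; print }' "$1"
}

# Usage (placeholder file names):
#   count_bad_gff_lines aa_bb.gff
#   spaces_to_tabs aa_bb.gff > aa_bb.fixed.gff
```

The `$1 = $1` assignment makes awk rebuild each record with the tab output separator, so mixed runs of spaces and tabs collapse to single tabs.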

AnezkaKar commented 10 months ago

Hi, thanks for the tip. I have the same problem: this software works neither with the example data provided in the "data" folder nor with my own data. I'll try out the other tool then.

kimlu1998 commented 4 weeks ago

For Synima, can I use it to detect tandem duplicates? My main goal was to use this to identify tandem duplicates and possibly classify them by gene family. Thank you

cdizzel commented 1 week ago

Hi all, I was able to get this to work by doing the following:

Homology File

Protein fasta headers were cleaned so that only the gene name remained, with special characters removed. For example,

">Species1||Cb2||7383158||7410177||CQ013704-RA||-1||CDS||3396420774||25012||frame0"

became

">CQ013704"

That was accomplished using the following code and some manual tidying up. This will differ depending on your file structure.

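#!/bin/bash

input_file="Species1-prot.fasta"
output_file="Species1-prot-reformat.fasta"

# Header lines: split on the literal "||" delimiter and keep only field 5
# (the gene/transcript name). Sequence lines are printed unchanged.
awk -F'[|][|]' '
    /^>/ { print ">" $5; next }
    { print }
' "$input_file" > "$output_file"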
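The script above keeps field 5 as it is, so the example header would come out as ">CQ013704-RA". The "-RA" part was presumably removed in the manual tidy-up. A rough sketch of how that step could be scripted, assuming the suffix always looks like "-R" followed by capital letters (an assumption, since only one header is shown in this thread); the function names are hypothetical:

```shell
#!/bin/bash

# Hypothetical helper: drop a trailing transcript suffix such as "-RA" from
# FASTA header lines only; sequence lines pass through untouched.
strip_transcript_suffix() {
    sed -E '/^>/ s/-R[A-Z]+$//' "$1"
}

# Hypothetical helper: list header IDs that occur more than once, which can
# happen if a gene had several transcripts (-RA, -RB, ...) before stripping.
duplicate_ids() {
    grep '^>' "$1" | sort | uniq -d
}

# Usage (placeholder file names):
#   strip_transcript_suffix Species1-prot-reformat.fasta > Species1-prot-clean.fasta
#   duplicate_ids Species1-prot-clean.fasta
```

If `duplicate_ids` prints anything, the IDs are no longer unique and one transcript per gene would need to be chosen before running the homology search.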

A homology search was performed using blastp:

blastp -query species1.protein.fa -subject species2.protein.fa -outfmt 6 -evalue 1e-10 -max_hsps 5 -max_target_seqs 5 -out aa_bb.blast

Convert the blast output to a .homology file (query ID, subject ID, bit score) using:

awk '{print $1, $2, $12}' aa_bb.blast > aa_bb.homology

Find and replace spaces with tabs within aa_bb.homology, which results in a file that looks like:

C0000001	CQ025429	689
C0000001	CQ055736	602
C0000002	CQ025428	71.2
C0000003	CQ025424	613
C0000003	CQ055734	575
C0000003	CQ052761	192
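The conversion and the space-to-tab replacement can probably be done in one pass by setting awk's output separator. A minimal sketch, assuming standard 12-column `-outfmt 6` output where column 12 is the bit score; the function name is made up:

```shell
#!/bin/bash

# Hypothetical helper: write query ID, subject ID and bit score from a
# BLAST tabular (-outfmt 6) file as tab-separated columns.
blast_to_homology() {
    awk 'BEGIN { OFS = "\t" } { print $1, $2, $12 }' "$1"
}

# Usage (file names as in the steps above):
#   blast_to_homology aa_bb.blast > aa_bb.homology
```

With `OFS` set to a tab, the commas in `print` emit tabs directly, so no separate find-and-replace step is needed.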

BED file

I converted my input gff files with agat_convert_sp_gff2bed.pl.

agat_convert_sp_gff2bed.pl --gff species1.gff3 -o aa.gff
agat_convert_sp_gff2bed.pl --gff species2.gff3 -o bb.gff
cat aa.gff bb.gff > aa_bb.gff

The columns were then rearranged into the correct order, as others have suggested. The BED (labeled .gff) file needs to be formatted as: chr gene start stop

cf1	C0000001	9845	13412
cf1	C0000002	25196	35998
cf1	C0000003	61576	65469
cf1	C0000004	97774	99106
...
qa1	QA053298	6234676	6237979
qa1	QA053299	6297794	6299368
qa1	QA053300	6346001	6350418
qa1	QA053301	6350608	6357388
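How the columns were rearranged is not stated above. One way it could be done, sketched under the assumption that the converted files use the standard BED column order (chrom, start, end, name, ...); the function name is hypothetical:

```shell
#!/bin/bash

# Hypothetical helper: reorder tab-separated BED columns
# (chrom, start, end, name, ...) into the chr, gene, start, stop layout.
# Assumption: column 4 holds the gene ID used in the .homology file.
bed_to_mcscanx_gff() {
    awk 'BEGIN { FS = OFS = "\t" } NF >= 4 { print $1, $4, $2, $3 }' "$1"
}

# Usage (placeholder output name; the input is the concatenated BED output):
#   bed_to_mcscanx_gff aa_bb.gff > aa_bb.reordered.gff
```

Coordinates are copied unchanged here. BED start positions are 0-based, so add 1 to the start if exact 1-based coordinates matter for your downstream use.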

I made sure the chr names were two letters plus a number, all lowercase, although I'm not sure that changed anything. Both the .homology and .gff files were placed into their own directory, in this case "homology", with nothing else in it. I was able to run MCScanX_h outside of its data directory.

cd homology

From within the homology directory, MCScanX_h was called using the following:

/home/bioinformatics/tools/MCScanX/MCScanX_h aa_bb
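Note that the argument here is a file prefix (aa_bb), not a directory. The earlier runs in this thread that passed a path ending in "/" wrote their output to ".collinearity" with nothing in front of the dot. Below is a rough pre-flight check that could be run before calling MCScanX_h. It is only a sketch: the function name is made up, and it assumes that "0 homologous pairs imported" usually means the files were not found under that prefix or the gene IDs in the two files do not match.

```shell
#!/bin/bash

# Hypothetical pre-flight check. "prefix" is the same argument that would be
# passed to MCScanX_h (e.g. aa_bb), so it looks for prefix.gff and
# prefix.homology.
check_mcscanx_h_inputs() {
    local prefix="$1"
    local f
    for f in "${prefix}.gff" "${prefix}.homology"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            return 1
        fi
    done
    # Count homology pairs whose two gene IDs both occur in column 2 of the gff.
    awk -F'\t' '
        NR == FNR { genes[$2] = 1; next }
        { total++; if (($1 in genes) && ($2 in genes)) hit++ }
        END { printf "%d of %d pairs have both genes in the gff\n", hit, total }
    ' "${prefix}.gff" "${prefix}.homology"
}

# Usage, from within the homology directory:
#   check_mcscanx_h_inputs aa_bb
```

If the reported count is 0, the IDs in the .homology file probably do not match the gene column of the .gff, or one of the files is not tab-delimited.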

Hope this helps someone.