rhysf / Synima

Synteny Imager
MIT License

Error on running the OrthoMCL script #43

Open dkisakye opened 9 months ago

dkisakye commented 9 months ago

Hello @rhysf, I am trying to run Synima on two genomes and other input files downloaded from NCBI (H99-GCA_000149245.3 and JEC21-GCA_000091045.1). However, I run into this error at the step that runs the OrthoMCL script:

retrieve_blast_pairs: Printing to OMCL_outdir/all_blast_pairs.m8
Saving ID->Genome from ./Repo_spec.txt.all.GFF3...
Assigning genome codes...
H99 => G001
JEC21 => G002
Writing Gcoded version of blast.m8...
Error, no genome for AFR93007.2 saved from all_annotations.gff3 (check settings and rerun): Inappropriate ioctl for device

Can you advise on how to handle it? Many thanks in advance!

rhysf commented 9 months ago

Hi. I think the issue probably relates to Synima not correctly understanding the format of your GFF3 files. This remains an issue I'm not certain how best to address. However, I would re-run the first step and pay extra attention to the warnings. If that doesn't solve the issue, please paste the commands and their outputs, along with any follow-up steps, and I'll try to figure out what further issues might be present.

dkisakye commented 9 months ago

Hi @rhysf. This is the output from the first step. I downloaded the genomes and all input files from FungiDB today. I just realised I was previously using files from NCBI, so I decided to get them from another repository to see whether the error recurs.

perl ../util/Create_full_repo_sequence_databases.pl -r ./Repo_spec.txt
combine_all_gff3_files_in_repo: printing to ./Repo_spec.txt.all.GFF3
combine_all_gff3_files_in_repo: opening ./H99/FungiDB-66_CneoformansH99.annotation.gff3
combine_all_gff3_files_in_repo: removing ID= from IDs for H99...
combine_all_gff3_files_in_repo: saving IDs (e.g. CNAG_00001-t26_1) for H99...
combine_all_gff3_files_in_repo: found 7826 mRNA features in ./H99/FungiDB-66_CneoformansH99.annotation.gff3
combine_all_gff3_files_in_repo: opening ./JEC21/FungiDB-66_CneoformansJEC21.annotation.gff3
combine_all_gff3_files_in_repo: removing ID= from IDs for JEC21...
combine_all_gff3_files_in_repo: saving IDs (e.g. CNA00010-t26_1) for JEC21...
combine_all_gff3_files_in_repo: found 6862 mRNA features in ./JEC21/FungiDB-66_CneoformansJEC21.annotation.gff3
Indexing H99...
Indexing JEC21...
Creating repository sequence databases...
Copying ./H99/FungiDB-66_CneoformansH99.annotation.pep to ./Repo_spec.txt.all.PEP for H99...
Copying ./H99/FungiDB-66_CneoformansH99.annotation.cds to ./Repo_spec.txt.all.CDS for H99...
Copying ./JEC21/FungiDB-66_CneoformansJEC21.annotation.pep to ./Repo_spec.txt.all.PEP for JEC21...
Copying ./JEC21/FungiDB-66_CneoformansJEC21.annotation.cds to ./Repo_spec.txt.all.CDS for JEC21...
Checking FASTA and GFF repository sequence databases match...
fasta_to_struct: saving from ./Repo_spec.txt.all.PEP...
gff_to_contig_parent_to_cds_hash: saving all from ./Repo_spec.txt.all.GFF3...

0 / 14688 found (GFF ./Repo_spec.txt.all.GFF3 entries in FASTA ./Repo_spec.txt.all.PEP)
0 / 14688 found (FASTA ./Repo_spec.txt.all.PEP entries in GFF ./Repo_spec.txt.all.GFF3)
WARNING: ./Repo_spec.txt.all.PEP and ./Repo_spec.txt.all.GFF3 repository sequence databases are not correctly formatted. Change settings and re-run, or rename ID's in FASTA or GFF to match. Use -v for further info.
fasta_to_struct: saving from ./Repo_spec.txt.all.CDS...
gff_to_contig_parent_to_cds_hash: saving all from ./Repo_spec.txt.all.GFF3...

14688 / 14688 found (GFF ./Repo_spec.txt.all.GFF3 entries in FASTA ./Repo_spec.txt.all.CDS)
14688 / 14688 found (FASTA ./Repo_spec.txt.all.CDS entries in GFF ./Repo_spec.txt.all.GFF3)
./Repo_spec.txt.all.CDS and ./Repo_spec.txt.all.GFF3 repository sequence databases are correctly formatted.
../util/Create_full_repo_sequence_databases.pl: finished check.

dkisakye commented 9 months ago

@rhysf Second step:

perl ../util/Blast_grid_all_vs_all.pl -r ./Repo_spec.txt
fasta_to_struct: saving from ./Repo_spec.txt.all.PEP...
split_fasta_seq_dictionary_by_species: split ./Repo_spec.txt.all.PEP
split_fasta_seq_dictionary_by_species: splitting JEC21 -> ./JEC21/FungiDB-66_CneoformansJEC21.annotation.pep.synima-parsed.PEP...
split_fasta_seq_dictionary_by_species: 6862 printed for JEC21
split_fasta_seq_dictionary_by_species: splitting H99 -> ./H99/FungiDB-66_CneoformansH99.annotation.pep.synima-parsed.PEP...
split_fasta_seq_dictionary_by_species: 7826 printed for H99
Using legacy BLAST...
CMD: formatdb -i ./JEC21/FungiDB-66_CneoformansJEC21.annotation.pep.synima-parsed.PEP -p T
CMD: formatdb -i ./H99/FungiDB-66_CneoformansH99.annotation.pep.synima-parsed.PEP -p T
blastall -p blastp -i ./JEC21/FungiDB-66_CneoformansJEC21.annotation.pep.synima-parsed.PEP -d ./JEC21/FungiDB-66_CneoformansJEC21.annotation.pep.synima-parsed.PEP -m 8 -v 1000 -b 1000 -e 1e-20 > ./JEC21/RBH_blast_PEP/vs_JEC21.blast.m8
blastall -p blastp -i ./JEC21/FungiDB-66_CneoformansJEC21.annotation.pep.synima-parsed.PEP -d ./H99/FungiDB-66_CneoformansH99.annotation.pep.synima-parsed.PEP -m 8 -v 5 -b 5 -e 1e-20 > ./JEC21/RBH_blast_PEP/vs_H99.blast.m8
blastall -p blastp -i ./H99/FungiDB-66_CneoformansH99.annotation.pep.synima-parsed.PEP -d ./JEC21/FungiDB-66_CneoformansJEC21.annotation.pep.synima-parsed.PEP -m 8 -v 5 -b 5 -e 1e-20 > ./H99/RBH_blast_PEP/vs_JEC21.blast.m8
blastall -p blastp -i ./H99/FungiDB-66_CneoformansH99.annotation.pep.synima-parsed.PEP -d ./H99/FungiDB-66_CneoformansH99.annotation.pep.synima-parsed.PEP -m 8 -v 1000 -b 1000 -e 1e-20 > ./H99/RBH_blast_PEP/vs_H99.blast.m8

Third step:

perl ../util/Blast_all_vs_all_repo_to_OrthoMCL.pl -r ./Repo_spec.txt
retrieve_blast_pairs: Printing to OMCL_outdir/all_blast_pairs.m8
Saving ID->Genome from ./Repo_spec.txt.all.GFF3...
Assigning genome codes...
H99 => G001
JEC21 => G002
Writing Gcoded version of blast.m8...
Error, no genome for CNAG_03614-t26_1-p1 saved from all_annotations.gff3 (check settings and rerun): Inappropriate ioctl for device

rhysf commented 9 months ago

Yes, so here is the important message from step 1:

0 / 14688 found (GFF ./Repo_spec.txt.all.GFF3 entries in FASTA ./Repo_spec.txt.all.PEP)
0 / 14688 found (FASTA ./Repo_spec.txt.all.PEP entries in GFF ./Repo_spec.txt.all.GFF3)
WARNING: ./Repo_spec.txt.all.PEP and ./Repo_spec.txt.all.GFF3 repository sequence databases are not correctly formatted. Change settings and re-run, or rename ID's in FASTA or GFF to match. Use -v for further info.

You will need to follow those instructions, or rename the IDs in those files so that they match. The IDs it is expecting in both files are:

combine_all_gff3_files_in_repo: saving IDs (e.g. CNAG_00001-t26_1) for H99...
combine_all_gff3_files_in_repo: saving IDs (e.g. CNA00010-t26_1) for JEC21...
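Before renaming anything, it can help to see exactly how the two ID sets differ. The following is a minimal Python sketch of that comparison, not Synima's own check: it assumes the GFF IDs are the `ID=` values on mRNA features (as the step 1 log suggests) and that the FASTA ID is the first token of each header line. The function names and the sample records are hypothetical and only illustrate the idea.

```python
import io


def fasta_ids(handle):
    """Return the set of IDs (first token after '>') in a FASTA stream."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff3_mrna_ids(handle):
    """Return the set of ID= values found on mRNA features in a GFF3 stream."""
    ids = set()
    for line in handle:
        if line.startswith("#"):
            continue  # skip directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        for attr in cols[8].split(";"):
            if attr.startswith("ID="):
                ids.add(attr[3:])
    return ids


def report_overlap(gff_ids, fa_ids, n_examples=3):
    """Count shared IDs and list a few IDs unique to each side."""
    return {
        "shared": len(gff_ids & fa_ids),
        "gff_only": sorted(gff_ids - fa_ids)[:n_examples],
        "fasta_only": sorted(fa_ids - gff_ids)[:n_examples],
    }


# Made-up records for illustration only; real input would be the .gff3 and .pep files.
SAMPLE_GFF = (
    "##gff-version 3\n"
    "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=CNAG_00001-t26_1;Parent=CNAG_00001\n"
)
SAMPLE_PEP = ">CNAG_00001-t26_1-p1 example description\nMSTK\n"

result = report_overlap(
    gff3_mrna_ids(io.StringIO(SAMPLE_GFF)),
    fasta_ids(io.StringIO(SAMPLE_PEP)),
)
print(result)
```

With real files, the `io.StringIO` objects would be replaced by `open(path)` handles, and the few example IDs listed for each side usually make the mismatching prefix or suffix obvious.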

It looks like you could try removing the -t26_1 suffix, for example, if it is present in only one of the GFF or FASTA files.
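As a rough illustration of that kind of renaming, here is a small Python sketch that drops a trailing suffix from FASTA record IDs. Which suffix to strip is an assumption to verify against your own files: the step 3 error names CNAG_03614-t26_1-p1 while the GFF example is CNAG_00001-t26_1, and the CDS check passed while the PEP check did not, which hints that the protein FASTA may carry an extra "-p1". The function name and sample records are hypothetical.

```python
import io


def strip_id_suffix(in_handle, out_handle, suffix):
    """Copy a FASTA stream, removing `suffix` from the end of each record ID.

    Returns the number of headers that were changed.
    """
    if not suffix:
        raise ValueError("suffix must be a non-empty string")
    changed = 0
    for line in in_handle:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            parts = header.split(None, 1)  # record ID, then optional description
            if parts and parts[0].endswith(suffix):
                parts[0] = parts[0][: -len(suffix)]
                changed += 1
            line = ">" + " ".join(parts) + "\n"
        out_handle.write(line)
    return changed


# Made-up records for illustration; "-p1" is an assumed suffix, not a confirmed one.
SAMPLE_PEP = (
    ">CNAG_00001-t26_1-p1 hypothetical protein\n"
    "MSTK\n"
    ">CNAG_00002-t26_1-p1\n"
    "MAAL\n"
)

out = io.StringIO()
n_changed = strip_id_suffix(io.StringIO(SAMPLE_PEP), out, "-p1")
print(n_changed)
print(out.getvalue())
```

For real data, the handles would come from `open(input_path)` and `open(output_path, "w")`, writing to a new file rather than overwriting the download. After renaming, re-running the first step and checking that the "found" counts are no longer 0 is the quickest way to confirm the IDs now match.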