nselem / corason

Bioinformatic Tools for study Evolution of metabolic diversity
GNU Affero General Public License v3.0
CORASON: empty SVG #6

Closed: cmandreani closed this issue 2 years ago

cmandreani commented 2 years ago

Ubuntu 20.04 (WSL)

Hi, I've installed BiG-SCAPE and CORASON successfully.

Running BiG-SCAPE seems to work without problems: I can open the index.html file with no trouble.

However, when running:

~/bin/run_corason query.fasta gbks_dir/ gbks_dir/ref_BGC.gbk -g

the output folder is generated with several files, such as 72 "JobID.input", 8 "ClusterN", "Concatenados.faa", "TempConcatendados.faa", "Frecuency", "query.fasta.BLAST", "query.fasta.parser", "query.fasta_PrincipalHits", "query.fasta_Report" and others, plus 3 folders (CORASON, GBK, MINI). However, "query.fasta_tree.svg" is blank (it cannot be opened in Firefox and weighs 0 KB).
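A quick way to see whether the SVG is the only empty output is to list every zero-byte file in the output folder. The following is a minimal sketch; `find_empty_files` is a hypothetical helper, not part of CORASON, and the folder path is whatever your run produced.

```python
from pathlib import Path


def find_empty_files(output_dir):
    """Return the sorted paths of all zero-byte regular files under output_dir."""
    root = Path(output_dir)
    return sorted(
        path
        for path in root.rglob("*")          # walk the folder recursively
        if path.is_file() and path.stat().st_size == 0
    )
```

For example, `find_empty_files("/home/output")` returns a list of `Path` objects; an empty list means no zero-byte files were found.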

I noticed that the generated "Corason_Rast.IDS" file didn't collapse BGCs into their genomes correctly (there are 1911 BGCs from 192 genomes, but their GenBank files are named, e.g., genomeY_ctgZ_region001). I therefore replaced the second column with the "genomeY" annotations, keeping the same structure as the original (JobID/GenomeID/OrganismName), and ran:

~/bin/run_corason query.fasta gbks_dir/ gbks_dir/ref_BGC.gbk -g --rast_ids CorasonRast_indexMod.IDS
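The column replacement described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the IDs file is taken to be tab-separated, the BGC-level name is taken to sit in the second column, and the genome label is taken to be everything before `_ctg..._regionNNN`. The helper names (`genome_label`, `collapse_ids`) are hypothetical and not part of CORASON.

```python
import re

# Assumed naming scheme: <genome>_ctg<something>_region<digits>
BGC_NAME = re.compile(r"^(?P<genome>.+?)_ctg[^_]*_region\d+$")


def genome_label(bgc_name):
    """Return the genome part of a BGC name, or the name unchanged if it does not match."""
    match = BGC_NAME.match(bgc_name)
    return match.group("genome") if match else bgc_name


def collapse_ids(src_path, dst_path, column=1):
    """Copy a tab-separated IDs file, replacing one column (0-based) with the genome label."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > column:
                fields[column] = genome_label(fields[column])
            dst.write("\t".join(fields) + "\n")
```

Writing to a new file (for example `CorasonRast_indexMod.IDS`) keeps the original untouched, so the two can be compared before rerunning.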

The result was unchanged despite the correction of the Corason_Rast.IDS file:

##########################################################
Welcome to CORASON

CORASON-BGC

CORe Analysis of Syntenic Orthologs Natural Product-Biosynthetic Gene Cluster

##########################################################
Your current directory is /home/output, local path output

You will use antiSMASH file none
I must check query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta is a fasta file
Your cluster is located on organism number 100559
Default e-value=1E-15
Your bitscore is set to 0, you can use a positive bitscore to reduce your hunchs
the radio of your cluster is 10
Minimal e-value to be consider an homologous of a cluster member is: 1E-3
Minimal e-value for ortho groups in core 1E-3
You are rescaling gene size by a factor: 85000
Your rast ids file is Corason_Rast.IDs
All genomes would be procesed
You will explore 1912 genomes
/opt/corason/CORASON/CoreCluster.pl -q query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta -s 100559 -e_value 1E-15 -b 0 -cluster_radio 10 -e_core 1E-3 -e_cluster 1E-3 -rescale 85000 -l 100001:101912 -num 1912 -rast_ids Corason_Rast.IDs -antismash none
Searching sequences from query (/opt/corason/CORASON/1_Context_text.pl)
/opt/corason/CORASON/1_Context_text.pl -q query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta -s 100559 -e_value 1E-15 -b 0 -cluster_radio 10 -e_cluster 1E-3 -r 85000 -l 100001:101912 -n 1912 -rast_ids Corason_Rast.IDs -type prots -makedb -antismash none -dir_scripts /opt/corason/CORASON

dir_scripts /opt/corason/CORASON
mkdir: cannot create directory '/home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output': File exists
I will search homologous genes in organisms
I will create a Database with selected genomes prots type

Aminoacid data will be analized
/opt/corason/CORASON/header.pl GENOMES Corason_Rast.IDs /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output
pause before makeDB
makeblastdb -in /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/TempConcatenados.faa -dbtype prot -out /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/temDatabase.db
Protein db was created
Looking for hits
homologous gene search finished
Searching for homologous gene in clusters
8 gen were found surrounding query, cluster radio can not exceed this radio.
I have colored genes according to homology
Now I will produce the *.input file
Use of uninitialized value $hit in string eq at /opt/corason/CORASON/1_Context_text.pl line 272. ## x8

Sequences search finished

Analizing cluster with hits according to the query sequence

Can't exec "1": No such file or directory at /opt/corason/CORASON/ReadingInputs.pl line 17.

--------------------- WARNING ---- MSG: Can not set Bio::Location::Simple::end() equal to start; start not set ## x7

There are 1 similar clusters Creating query hits tree, without considering the core-clusters

Aligning Sequences

Shaving alignments with Gblocks

  • File not opened *

File not in NBRF/PIR format or too few sequences in the alignment: 0

Execution terminated
Couldn't open /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/PrincipalHits.muscle-gb file No such file or directory at /opt/corason/CORASON/RenamePrincipalHits.pl line 11.
FastTree Version 2.1.11 SSE3
Alignment: /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/RightNamesPrincipalHits.txt
Amino acid distances: BLOSUM45 Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jones-Taylor-Thorton, CAT approximation with 20 rate categories
Cannot read /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/RightNamesPrincipalHits.txt
Searching genetic core on selected clusters
/opt/corason/CORASON/2_OrthoGroups.pl -e_core 1E-3 -list 100559_6 -num 1 -rast_ids Corason_Rast.IDs -outname /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output -dir_scripts /opt/corason/CORASON
I will run allvsall with blast /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/Coreoutput.blast
You want ortho groups of the following genomes 100559_6
Now finding Best Bidirectional Hits List
Selecting List that contains orthologs from all desired genomes
Starting Star groups num 1 list 100559_6
Starting stars
/opt/corason/CORASON/SearchAminoacidsFromCore.pl 100559_6 /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output
Done!
/opt/corason/CORASON/ReadReaction 100559_6 1 /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output
rm: cannot remove '/home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/CORASON/FUNCTION': No such file or directory
Core finished!

There is a core with at least two genes on this cluster
Best cluster /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/100559_6
Best cluster /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output
cut -f1,2 /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/CORASON/FUNCTION/100559_6.core.function

Aligning...

/home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/CORASON

&align 1,1,/home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/CORASON,100559_6
Couldnt open /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/CORASON/ALIGNMENTS_GB/1.muscle.pir No such file or directory
Sequences were aligned

Creating aminoacid core cluster matrix..
directory /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/CORASON/ALIGNMENTS_GB will be open
Se abrio el directorio /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/CORASON/ALIGNMENTS_GB con los archivos numericos

Can't open /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/CORASON/CONCATENADOS/: No such file or directory.
couldn't open /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/CORASON/CONCATENADOS/1 No such file or directory at /opt/corason/CORASON/Concatenador.pl line 86.
/opt/corason/CORASON/Rename_Ids_Star_Tree.pl Corason_Rast.IDs /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output
rm: cannot remove '/home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/RightNames.txt': No such file or directory
Couldn't open /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/SalidaConcatenada.txt file No such file or directory at /opt/corason/CORASON/Rename_Ids_Star_Tree.pl line 11.
Can't open /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/RightNames.txt: No such file or directory.
Formating matrix..
FastTree Version 2.1.11 SSE3
Alignment: /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/RightNames.txt
Amino acid distances: BLOSUM45 Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jones-Taylor-Thorton, CAT approximation with 20 rate categories
Cannot read /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/RightNames.txt
nw_topology -b -IL /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta_BGC.tre | nw_display -b 'opacity:0' -v 40 -s - >/home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta_tree.svg
I will draw SVG clusters with concatenated tree order
/home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output outname

Sequences didnt align, I will draw SVG clusters with the single hits blast order
grep: /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output//home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output.BLAST: No such file or directory
Use of uninitialized value $INPUTS in scalar chop at /opt/corason/CORASON/CoreCluster.pl line 281.
Use of uninitialized value $INPUTS in concatenation (.) or string at /opt/corason/CORASON/CoreCluster.pl line 230.
Use of uninitialized value $INPUTS in concatenation (.) or string at /opt/corason/CORASON/CoreCluster.pl line 231.
Use of uninitialized value $INPUTS in concatenation (.) or string at /opt/corason/CORASON/CoreCluster.pl line 232.

Draw Now SVG file will be generated with inputs:

/opt/corason/CORASON/3_Draw.pl 85000 /home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta
couldn open frequency file query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta/Frequency Not a directory at /opt/corason/CORASON/3_Draw.pl line 113.
SVG file generated

mv: cannot stat '/home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/Contextos.svg': No such file or directory
Cleaning temporary files
rm: cannot remove '/home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/PrincipalHits.muscle-gb.htm': No such file or directory
rm: cannot remove '/home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/*.txt': No such file or directory
rm: cannot remove '/home/output/query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta-output/CORASON_GENOMES': No such file or directory
mv query_Transport_PF07689.16_MajorFacilitatorSF_36.20.20.fasta /home/output/Done Have a nice day
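Since the pipeline keeps running after each failure, the earliest error in a log like the one above is usually the most informative. A minimal sketch for pulling out the error-looking lines in order, assuming the log was saved to a text file; the pattern list is hand-picked from the messages in this log and is not exhaustive, and `find_error_lines` is a hypothetical helper, not part of CORASON.

```python
import re

# Phrases taken from the error messages seen in this particular log.
ERROR_PATTERNS = re.compile(
    r"No such file or directory|Can't exec|Cannot read|Couldn'?t open|Can't open"
    r"|File not opened|too few sequences|uninitialized value",
    re.IGNORECASE,
)


def find_error_lines(log_text):
    """Return (line_number, line) pairs for lines that look like errors, in log order."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERNS.search(line):
            hits.append((lineno, line.strip()))
    return hits
```

The first pair returned is the earliest matching message. In this run the first "No such file or directory" comes from the `Can't exec "1"` line in ReadingInputs.pl, before the alignment steps.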

I've tried several BGCs from the dataset, with or without matches in MIBiG, and varied the query genes from core to adjacent ones, taken from the reference BGC or from any other, but with no success.

I also tried lowering the cutoff to 1E-5 and extending the cluster radio up to 70, with no results.

Any idea where I might be going wrong?

Thanks.

nselem commented 2 years ago

Could you please send me a sample with ~10 genomes so I can try to reproduce the error? Can you show me the header of one of the input files? Also, could you tell me whether the sample data worked OK with CORASON?

cmandreani commented 2 years ago

Hi, thanks for answering.

I am not using genomes, since it is explained here that:

CORASON finds variation in the genomic vicinity of a reference cluster. To this end, CORASON can explore either BGCs predicted by antiSMASH or complete genomes. Results of this approaches will be slightly different.

And in the Streptomyces example, running is shown as:

~/bin/run_corason TauD.fasta gbks gbks/JMGX01000001.1.cluster003.gbk -g

~/bin/run_corason TauD.fasta genomes genomes/JOBW01.gbk -g

I'll send you the input files (queryGene.faa and refBgc.gbk) and the 10 genomes via email.

Cheers.

nselem commented 2 years ago

Just for the sake of other users: this issue is similar to this one: https://github.com/nselem/evomining/issues/3. The problem is the use of muscle in Ubuntu for Windows. Also, the CORASON standalone tool is better suited for exploring genomes while searching for BGCs, or BGC variants that may not have been annotated by antiSMASH. If you already have the GenBank files of your desired BGCs, there is no need to use the CORASON standalone tool, because BiG-SCAPE already has a CORASON version integrated that allows visualizing the BGC phylogenetic trees, and it is not needed for exploring because bigscape already

cmandreani commented 2 years ago

I've noticed that CORASON's third issue is also solved with this answer.

Cheers and thanks again.
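For anyone hitting the same symptom: since the diagnosis above points at muscle under Ubuntu for Windows, a first check is whether the aligner binary is on PATH and can be executed at all. The following is a minimal sketch; `probe_executable` is a hypothetical helper, not part of CORASON, and it passes no muscle-specific flags because the accepted options differ between muscle versions.

```python
import shutil
import subprocess


def probe_executable(name, args=(), timeout=30):
    """Check that a program is on PATH and can be started; return a small report dict."""
    report = {"found": False, "path": None, "ran": False, "returncode": None, "output": ""}
    path = shutil.which(name)
    if path is None:
        return report                      # not on PATH (or not executable)
    report.update(found=True, path=path)
    try:
        proc = subprocess.run(
            [path, *args],
            stdin=subprocess.DEVNULL,      # never wait for interactive input
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,      # merge stderr into the captured output
            text=True,
            errors="replace",
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        report["output"] = str(exc)        # e.g. wrong binary format for this system
        return report
    report.update(ran=True, returncode=proc.returncode, output=proc.stdout)
    return report
```

Calling `probe_executable("muscle")` and looking at `found`, `ran` and `output` shows whether the binary starts. A nonzero return code is not necessarily a failure here, since running a tool with no arguments often just prints its usage text.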