nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

remote AntiSMASH fails with duplicate CDS features #876

Open alexweisberg opened 1 year ago

alexweisberg commented 1 year ago

Are you using the latest release?

Version 1.8.14

Describe the bug

Running the tutorial for genome+RNAseq with our own data, we successfully get through most of the annotation stages.

However, when we run the remote antiSMASH step, it runs for a while and then fails with a timeout/no route to host error.

When I manually check the status of the antiSMASH job (fungi-73669e74-7826-4ae1-bc9a-223d2a47caed) on the antiSMASH website, it lists the following error:

Submitted: Mar 7, 2023 13:48:22
Status: failed: Job returned errors: ERROR 08/03 08:07:00 Multiple CDS features have the same location: 615475:616117
Last status change: Mar 8, 2023 00:07:00

What command did you issue?

previously:

funannotate train -i ../RepeatMasker/Poreg_genome_final.fasta.masked -o Poreg_fun \
    --left ../Trim_Galore/Poreg_RNA_1_val_1.fq.gz ../Trim_Galore/Poreg_RNA_2_val_1.fq.gz \
    --right ../Trim_Galore/Poreg_RNA_1_val_2.fq.gz ../Trim_Galore/Poreg_RNA_2_val_2.fq.gz \
    --stranded RF --jaccard_clip --species "Pseudozyma oregonense" \
    --strain UnNamed --cpus 12 --no_trimmomatic

funannotate predict -i ../RepeatMasker/Poreg_genome_final.fasta.masked -o Poreg_fun \
    --species "Pseudozyma oregonense" --strain UnNamed \
    --cpus 12 --protein_evidence ../Related_species/Phubeiensis_GCF_000403515.1_ASM40351v1_protein.faa $FUNANNOTATE_DB/uniprot_sprot.fasta

funannotate update -i Poreg_fun --cpus 12

command with error:

funannotate remote -i Poreg_fun -m antismash -e ouremailaddress@university.edu

Logfiles

sge.funantismash.e134764.log.txt

OS/Install Information

I masked out some of our cluster-specific paths.

Checking dependencies for 1.8.14

You are running Python v 3.8.15. Now checking python packages...
biopython: 1.80
goatools: 1.2.3
matplotlib: 3.4.3
natsort: 8.2.0
numpy: 1.24.1
pandas: 1.5.2
psutil: 5.9.4
requests: 2.28.1
scikit-learn: 1.2.0
scipy: 1.10.0
seaborn: 0.12.2
All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.38
Clone: 0.46
DBD::SQLite: 1.72
DBD::mysql: 4.050
DBI: 1.643
DB_File: 1.855
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.54
Hash::Merge: 0.302
JSON: 4.10
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.14
Tie::File: 1.06
URI::Escape: 5.12
YAML: 1.30
local::lib: 2.000029
threads: 2.25
threads::shared: 1.61
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/LOCATION/databases/funannotate/current
$PASAHOME=/LOCATION/funannotate-1.8.14/opt/pasa-2.5.2
$TRINITY_HOME=/LOCATION/funannotate-1.8.14/opt/trinity-2.8.5
$EVM_HOME=/LOCATION/funannotate-1.8.14/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/LOCATION/Funannotate/augustus/config
$GENEMARK_PATH=/LOCATION/gmes_linux_64
All 6 environmental variables are set

Checking external dependencies...
PASA: 2.5.2
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.5.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
emapper.py: 2.1.9
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2023-02-17
gmes_petap.pl: 4.71_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 17.0.3-internal
kallisto: 0.46.1
mafft: v7.508 (2022/Sep/07)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: 2.7
proteinortho: 6.1.7
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.16.1
signalp: 4.1
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.11 (Oct 2022)
tantan: tantan 40
tbl2asn: 25.8
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
All 37 external dependencies are installed

caioald commented 1 year ago

I have the same problem here.

Running funannotate remote for antismash, I get the following error:

ERROR 22/05 11:48:33 Multiple CDS features have the same location: [144149:144395](-)

I suspect this is caused by alternative splicing: multiple mRNA features are identified for the gene, but every one of them yields the same CDS, predicted at the same location. For example:

     gene            complement(144150..144941)
                     /locus_tag="FUN_020060"
     mRNA            complement(join(144150..144784,144846..144941))
                     /locus_tag="FUN_020060"
                     /product="hypothetical protein"
     mRNA            complement(join(144150..144447,144510..144941))
                     /locus_tag="FUN_020060"
                     /product="hypothetical protein"
     CDS             complement(144150..144395)
                     /locus_tag="FUN_020060"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="ncbi:FUN_020060-T1"
                     /translation="MRSTAYMHNSQCFSTFPSFHVRIPLSCPSPKDLSAFCDSCPCLV
                     SLGYSSISRLGCVITESGDLISSNNGRDMSSPILNQP"
     CDS             complement(144150..144395)
                     /locus_tag="FUN_020060"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="ncbi:FUN_020060-T2"
                     /translation="MRSTAYMHNSQCFSTFPSFHVRIPLSCPSPKDLSAFCDSCPCLV
                     SLGYSSISRLGCVITESGDLISSNNGRDMSSPILNQP"
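
To see how many CDS features collide like this before submitting a job, a minimal sketch along the following lines could scan the GenBank file. It is not part of funannotate or antiSMASH: `find_duplicate_cds` is a hypothetical helper, it reads the feature table as plain text using only the standard library, and it assumes that two CDS features count as duplicates when their location strings are textually identical, which may not be exactly how antiSMASH compares them.

```python
import re
from collections import defaultdict

# Feature line in a GenBank feature table: 5 spaces, a feature key, then the location.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\S.*)$")


def find_duplicate_cds(genbank_text):
    """Return {location: count} for CDS locations that appear more than once.

    Hypothetical helper: locations are compared as plain strings.
    """
    counts = defaultdict(int)
    key, location, in_location = None, "", False

    # The trailing empty line flushes a feature that ends the text.
    for line in genbank_text.splitlines() + [""]:
        stripped = line.strip()
        # A long location may wrap onto indented lines before the first qualifier.
        if in_location and line.startswith(" " * 21) and not stripped.startswith("/"):
            location += stripped
            continue
        # Any other line ends the location of the feature being read.
        if in_location:
            if key == "CDS":
                counts[location] += 1
            in_location = False
        match = FEATURE_LINE.match(line)
        if match:
            key, location = match.group(1), match.group(2).strip()
            in_location = True

    return {loc: n for loc, n in counts.items() if n > 1}


# Shortened version of the excerpt above.
example = "\n".join([
    "     mRNA            complement(join(144150..144784,144846..144941))",
    "     CDS             complement(144150..144395)",
    '                     /protein_id="ncbi:FUN_020060-T1"',
    "     CDS             complement(144150..144395)",
    '                     /protein_id="ncbi:FUN_020060-T2"',
])
print(find_duplicate_cds(example))
# prints {'complement(144150..144395)': 2}
```

The function only reports duplicates; it does not modify the file, so whether removing the reported features is safe for later annotation steps remains an open question.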

Would there be any way to overcome this?

If I remove the duplicate CDS features, would the resulting output still be parseable in the annotation step?

Thanks in advance for the help.