nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Funannotate update: Feature overlapped by 2 identical-length genes but has no cross-reference #962

Open metalichen opened 9 months ago

metalichen commented 9 months ago

Are you using the latest release?
I'm using the latest version available in docker (v1.8.15).

Describe the bug
After I ran funannotate update, I got a message about several gene models that need fixing:

FUN_010869      Feature overlapped by 2 identical-length genes but has no cross-reference
FUN_010870      Feature overlapped by 2 identical-length genes but has no cross-reference

When I checked the tbl file, I saw that genes FUN_010869 and FUN_010870 do span exactly the same interval (780550 to 781621), but have different CDSs:

780550  781621  gene
            locus_tag   FUN_010869
780550  781239  mRNA
781297  781621
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_010869-T1_mrna
            protein_id  gnl|ncbi|FUN_010869-T1
780738  781239  CDS
781297  781484
            codon_start 1
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_010869-T1_mrna
            protein_id  gnl|ncbi|FUN_010869-T1
780550  781243  mRNA
781297  781621
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_010869-T2_mrna
            protein_id  gnl|ncbi|FUN_010869-T2
780738  781243  CDS
781297  781363
            codon_start 1
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_010869-T2_mrna
            protein_id  gnl|ncbi|FUN_010869-T2
780550  781621  gene
            locus_tag   FUN_010870
780550  781621  mRNA
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_010870-T1_mrna
            protein_id  gnl|ncbi|FUN_010870-T1
780738  781247  CDS
            codon_start 1
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_010870-T1_mrna
            protein_id  gnl|ncbi|FUN_010870-T1

How should I fix the file? Should I move mRNA and CDS features from FUN_010870 to FUN_010869, and remove FUN_010870? Would that work? Thank you!
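Before editing by hand, it may help to list every pair of genes that share the same span. The following is a minimal sketch, not part of funannotate: it assumes the standard NCBI feature table layout (`>Feature` header lines, feature lines starting with two coordinates, indented qualifier lines). The function name `genes_with_identical_spans` and the contig name `scaffold_1` are made up for illustration.

```python
from collections import defaultdict


def genes_with_identical_spans(tbl_lines):
    """Group locus_tags by the (contig, start, end) of their gene feature.

    Returns only the spans that are shared by more than one gene.
    Strand is ignored: coordinates are normalized to (low, high).
    """
    spans = defaultdict(list)
    contig = None
    pending = None  # span of the latest gene line still waiting for its locus_tag
    for raw in tbl_lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith(">Feature"):
            parts = line.split()
            contig = parts[1] if len(parts) > 1 else None
            pending = None
            continue
        fields = line.split()
        if line[0].isspace():
            # Indented line: a qualifier such as locus_tag or product.
            if pending and fields[0] == "locus_tag" and len(fields) > 1:
                spans[(contig, *pending)].append(fields[1])
                pending = None
            continue
        if len(fields) >= 3:
            if fields[2] == "gene":
                # Partial features carry < or > in front of the coordinate.
                start = int(fields[0].lstrip("<>"))
                end = int(fields[1].lstrip("<>"))
                pending = (min(start, end), max(start, end))
            else:
                pending = None
    return {span: tags for span, tags in spans.items() if len(tags) > 1}


# Abbreviated copy of the two genes above; "scaffold_1" is a hypothetical contig name.
example = (
    ">Feature scaffold_1\n"
    "780550\t781621\tgene\n"
    "\t\t\tlocus_tag\tFUN_010869\n"
    "780550\t781239\tmRNA\n"
    "781297\t781621\n"
    "\t\t\tproduct\thypothetical protein\n"
    "780550\t781621\tgene\n"
    "\t\t\tlocus_tag\tFUN_010870\n"
    "780550\t781621\tmRNA\n"
    "\t\t\tproduct\thypothetical protein\n"
).splitlines()

dupes = genes_with_identical_spans(example)
for (name, start, end), tags in dupes.items():
    print(name, start, end, ", ".join(tags))
```

For a real file, the same function could be fed `open("your_file.tbl")`. It only reports the pairs; deciding whether to merge or delete one of the genes is still a manual step.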

What command did you issue?
singularity run ../singularity/funannotate.sif funannotate update -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/ --cpus 28

Logfiles

[Sep 15 09:30 PM]: Funannotate update is finished, output files are in the analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred//update_results folder
[Sep 15 09:30 PM]: There are 5 gene models that need to be fixed.
[Sep 15 09:30 PM]: Manually edit the tbl file analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/update_results/Xanthoria_parietina_46-1-SA22.tbl, then run:

funannotate fix -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/update_results/Xanthoria_parietina_46-1-SA22.gbk -t analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_p$

[Sep 15 09:30 PM]: After the problematic gene models are fixed, you can proceed with functional annotation.
[Sep 15 09:30 PM]: Your next step might be functional annotation, suggested commands:
-------------------------------------------------------
Run InterProScan (Docker required):
funannotate iprscan -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/ -m docker -c 28

Run antiSMASH:
funannotate remote -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/ -m antismash -e youremail@server.edu

Annotate Genome:
funannotate annotate -i analysis_and_temp_files/06_annotate_lecanoro/Xp_jgi_pred/ --cpus 28 --sbt yourSBTfile.txt
-------------------------------------------------------

-------------------------------------------------------
-------------------------------------------------------
FUN_000415      Feature begins or ends in gap starting at 1124202
FUN_002683      Feature begins or ends in gap starting at 552774
FUN_010869      Feature overlapped by 2 identical-length genes but has no cross-reference
FUN_010870      Feature overlapped by 2 identical-length genes but has no cross-reference
-------------------------------------------------------

OS/Install Information

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000024
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config
ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...
ERROR: pslDnaFiler found but error running: pslCDnaFilter: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory

PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.2
bamtools: bamtools 2.5.2
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.1.6
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.520 (2023/Mar/22)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: 2.6
proteinortho: 6.0.16
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 40
tbl2asn: 25.8
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: gmes_petap.pl not installed
ERROR: pslCDnaFilter not installed
ERROR: signalp not installed

ruthpg commented 7 months ago

How did you solve this issue? @nextgenusfs I ran into the same issue, and I am also getting several other errors, for example:

FUN_029468      Feature overlapped by 2 identical-length genes but has no cross-reference
FUN_029469      Feature overlapped by 2 identical-length genes but has no cross-reference
FUN_029502      Feature begins or ends in gap starting at 1627056
FUN_029528      CDS not contained within cross-referenced mRNA

When I look at the tbl file, these two genes do indeed span identical coordinates and do not cross-reference each other. How should this be fixed?

686279  688092  gene
            locus_tag   FUN_029468
686279  686935  mRNA
687021  688092
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_029468-T1_mrna
            protein_id  gnl|ncbi|FUN_029468-T1
686582  686935  CDS
687021  687803
            codon_start 1
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_029468-T1_mrna
            protein_id  gnl|ncbi|FUN_029468-T1
686279  686935  mRNA
687021  687376
687540  688092
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_029468-T2_mrna
            protein_id  gnl|ncbi|FUN_029468-T2
686308  686935  CDS
687021  687376
687540  687803
            codon_start 1
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_029468-T2_mrna
            protein_id  gnl|ncbi|FUN_029468-T2
686279  687376  mRNA
687540  688092
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_029468-T3_mrna
            protein_id  gnl|ncbi|FUN_029468-T3
686308  687376  CDS
687540  687688
            codon_start 1
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_029468-T3_mrna
            protein_id  gnl|ncbi|FUN_029468-T3
686279  688092  gene
            locus_tag   FUN_029469
686279  688092  mRNA
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_029469-T1_mrna
            protein_id  gnl|ncbi|FUN_029469-T1
686308  687426  CDS
            codon_start 1
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_029469-T1_mrna
            protein_id  gnl|ncbi|FUN_029469-T1

For this gene, it complains that the feature begins or ends in a gap starting at 1627056:

1627057 1625448 gene
            locus_tag   FUN_029502
1627057 1626623 mRNA
1626464 1626390
1626323 1626248
1626122 1625448
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_029502-T1_mrna
            protein_id  gnl|ncbi|FUN_029502-T1
1626732 1626623 CDS
1626464 1626390
1626323 1626248
1626124 1625822
            codon_start 1
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_029502-T1_mrna
            protein_id  gnl|ncbi|FUN_029502-T1
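To see what the scaffold looks like around the reported position, one option is to list the runs of N in the contig sequence and check which of them lie close to the gene boundary. This is a minimal sketch on a toy string, not funannotate code. The helper names `gap_runs` and `gaps_near` are hypothetical, and how tbl2asn itself decides that a feature "begins or ends in gap" is not reproduced here.

```python
import re


def gap_runs(sequence, min_len=1):
    """Return 1-based inclusive (start, end) coordinates of every run of N/n."""
    return [
        (m.start() + 1, m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_len
    ]


def gaps_near(sequence, position, window=10):
    """Return the gap runs that come within `window` bases of a 1-based position."""
    return [
        (start, end)
        for start, end in gap_runs(sequence)
        if start - window <= position <= end + window
    ]


# Toy sequence: 4 bases, a 5 bp gap, 4 bases.
toy = "ACGT" + "N" * 5 + "ACGT"
print(gap_runs(toy))               # gap occupies positions 5..9
print(gaps_near(toy, 10, window=1))
print(gaps_near(toy, 12, window=1))
```

On real data, `sequence` would be the contig read from the genome FASTA and `position` the gene start or end from the tbl file (for example 1627057 for FUN_029502).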

And for this one it says "CDS not contained within cross-referenced mRNA", but I cannot really see what is wrong with it...

2522815 2523725 gene
            locus_tag   FUN_029528
2522815 2523725 mRNA
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_029528-T1_mrna
            protein_id  gnl|ncbi|FUN_029528-T1
2522901 2523377 CDS
            codon_start 1
            product hypothetical protein
            transcript_id   gnl|ncbi|FUN_029528-T1_mrna
            protein_id  gnl|ncbi|FUN_029528-T1
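A plain coordinate check can confirm whether every CDS segment sits inside an exon of its mRNA. The sketch below is an assumption about what "contained" means (each CDS segment lies fully inside a single mRNA segment) and may not match the validator's exact logic; `cds_outside_mrna` is a hypothetical helper. The coordinates are copied from the tbl excerpts above.

```python
def normalize(segments):
    """Turn (start, end) pairs into sorted (low, high) pairs, ignoring strand."""
    return sorted((min(a, b), max(a, b)) for a, b in segments)


def cds_outside_mrna(mrna_segments, cds_segments):
    """Return the CDS segments that are not fully inside any single mRNA segment."""
    exons = normalize(mrna_segments)
    return [
        (lo, hi)
        for lo, hi in normalize(cds_segments)
        if not any(e_lo <= lo and hi <= e_hi for e_lo, e_hi in exons)
    ]


# FUN_029528-T1 (plus strand, single exon)
fun_029528 = cds_outside_mrna(
    mrna_segments=[(2522815, 2523725)],
    cds_segments=[(2522901, 2523377)],
)

# FUN_029502-T1 (minus strand, four exons)
fun_029502 = cds_outside_mrna(
    mrna_segments=[(1627057, 1626623), (1626464, 1626390),
                   (1626323, 1626248), (1626122, 1625448)],
    cds_segments=[(1626732, 1626623), (1626464, 1626390),
                  (1626323, 1626248), (1626124, 1625822)],
)

print("FUN_029528:", fun_029528)
print("FUN_029502:", fun_029502)
```

With the coordinates as pasted, this check finds nothing wrong for FUN_029528, which matches the observation that the model looks fine. It does flag the last CDS segment of FUN_029502 (1626124..1625822), which starts 2 bp outside the mRNA segment 1626122..1625448. Whether that has anything to do with the reported errors is not established here.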

Emiliania_huxleyi_CCMP1516.models-need-fixing.txt
Emiliania_huxleyi_CCMP1516.zip

I'd appreciate any help to be able to finish this annotation :)