nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

argparse.ArgumentError while running annotate #539

Closed: athulmenon closed this issue 3 years ago

athulmenon commented 3 years ago

Hi,

I am running the latest docker version of funannotate. I ran the annotate module a few days ago without any errors. I have since pulled the latest docker image after one of your fixes, and now when I run the annotate module the error below appears. I tried the test module and the same error persists. Can you please let me know how to fix it?

```
./funannotate-docker test -t annotate --cpus 6

#########################################################
Running funannotate annotate unit testing
Downloading: https://osf.io/97pyn/download?version=1 Bytes: 341476
CMD: funannotate annotate --genbank Genome_one.gbk -o annotate --cpus 6 --iprscan genome_one.iprscan.xml --eggnog genome_one.emapper.annotations
#########################################################
Traceback (most recent call last):
  File "/venv/bin/funannotate", line 713, in <module>
    main()
  File "/venv/bin/funannotate", line 703, in main
    mod.main(arguments)
  File "/venv/lib/python3.7/site-packages/funannotate/annotate.py", line 316, in main
    help='Annotated if genome not masked and skip bad contigs')
  File "/venv/lib/python3.7/argparse.py", line 1373, in add_argument
    return self._add_action(action)
  File "/venv/lib/python3.7/argparse.py", line 1736, in _add_action
    self._optionals._add_action(action)
  File "/venv/lib/python3.7/argparse.py", line 1577, in _add_action
    action = super(_ArgumentGroup, self)._add_action(action)
  File "/venv/lib/python3.7/argparse.py", line 1387, in _add_action
    self._check_conflict(action)
  File "/venv/lib/python3.7/argparse.py", line 1526, in _check_conflict
    conflict_handler(action, confl_optionals)
  File "/venv/lib/python3.7/argparse.py", line 1535, in _handle_conflict_error
    raise ArgumentError(action, message % conflict_string)
argparse.ArgumentError: argument --force: conflicting option string: --force
#########################################################
ERROR: funannotate annotate test failed - check logfiles
#########################################################
```

OS

Checking dependencies for 1.8.4

You are running Python v 3.7.9. Now checking python packages... biopython: 1.78 goatools: 1.0.15 matplotlib: 3.3.3 natsort: 7.1.0 numpy: 1.19.5 pandas: 1.2.1 psutil: 5.8.0 requests: 2.25.1 scikit-learn: 0.24.1 scipy: 1.5.3 seaborn: 0.11.1 All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules... Bio::Perl: 1.007002 Carp: 1.38 Clone: 0.42 DBD::SQLite: 1.64 DBD::mysql: 4.046 DBI: 1.642 DB_File: 1.855 Data::Dumper: 2.173 File::Basename: 2.85 File::Which: 1.23 Getopt::Long: 2.5 Hash::Merge: 0.300 JSON: 4.02 LWP::UserAgent: 6.39 Logger::Simple: 2.0 POSIX: 1.76 Parallel::ForkManager: 2.02 Pod::Usage: 1.69 Scalar::Util::Numeric: 0.40 Storable: 3.15 Text::Soundex: 3.05 Thread::Queue: 3.12 Tie::File: 1.02 URI::Escape: 3.31 YAML: 1.29 threads: 2.15 threads::shared: 1.56 All 27 Perl modules installed

Checking Environmental Variables... $FUNANNOTATE_DB=/opt/databases $PASAHOME=/venv/opt/pasa-2.4.1 $TRINITYHOME=/venv/opt/trinity-2.8.5 $EVM_HOME=/venv/opt/evidencemodeler-1.1.1 $AUGUSTUS_CONFIG_PATH=/venv/config ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir

Checking external dependencies...

Traceback (most recent call last):
  File "/venv/bin/ete3", line 6, in <module>
    from ete3.tools.ete import main
  File "/venv/lib/python3.7/site-packages/ete3/tools/ete.py", line 55, in <module>
    from . import (ete_split, ete_expand, ete_annotate, ete_ncbiquery, ete_view,
  File "/venv/lib/python3.7/site-packages/ete3/tools/ete_view.py", line 48, in <module>
    from .. import (Tree, PhyloTree, TextFace, RectFace, faces, TreeStyle, CircleFace, AttrFace,
ImportError: cannot import name 'TextFace' from 'ete3' (/venv/lib/python3.7/site-packages/ete3/__init__.py)

PASA: 2.4.1 CodingQuarry: 2.0 Trinity: 2.8.5 augustus: 3.3.3 bamtools: bamtools 2.5.1 bedtools: bedtools v2.30.0 blat: BLAT v36 diamond: 2.0.6 exonerate: exonerate 2.4.0 fasta: no way to determine glimmerhmm: 3.0.4 gmap: 2017-11-15 hisat2: 2.2.1 hmmscan: HMMER 3.3.1 (Jul 2020) hmmsearch: HMMER 3.3.1 (Jul 2020) java: 11.0.8-internal kallisto: 0.46.1 mafft: v7.475 (2020/Nov/23) makeblastdb: makeblastdb 2.2.31+ minimap2: 2.17-r941 proteinortho: 6.0.16 pslCDnaFilter: no way to determine salmon: salmon 0.14.1 samtools: samtools 1.10 snap: 2006-07-28 stringtie: 2.1.4 tRNAscan-SE: 2.0.7 (Oct 2020) tantan: tantan 13 tbl2asn: no way to determine, likely 25.X tblastn: tblastn 2.2.31+ trimal: trimAl v1.4.rev15 build[2013-12-17] trimmomatic: 0.39

ERROR: emapper.py not installed
ERROR: ete3 not installed
ERROR: gmes_petap.pl not installed
ERROR: signalp not installed

Thanks for the tool. Athul
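The `argparse.ArgumentError` in the traceback above is what argparse raises when the same option string is registered twice on one parser: argument groups share the parser's table of option strings, so `_check_conflict` rejects the second `--force`. The sketch below is a hypothetical minimal reproduction (the `annotate-demo` program name, group title and help strings are made up), not funannotate's actual code. It also shows that `conflict_handler="resolve"` would make the error go away by keeping only the last definition, which hides a duplicate rather than fixing it.

```python
import argparse


def build_parser(conflict_handler: str = "error") -> argparse.ArgumentParser:
    """Hypothetical parser that registers --force twice, mimicking the bug."""
    parser = argparse.ArgumentParser(prog="annotate-demo",
                                     conflict_handler=conflict_handler)
    # Groups inherit the parser's conflict_handler and share its option table.
    group = parser.add_argument_group("Optional arguments")
    group.add_argument("--force", action="store_true",
                       help="first definition")
    # Second definition of the same option string: checked against the
    # options already registered on the parser.
    group.add_argument("--force", action="store_true",
                       help="duplicate definition")
    return parser


def describe_conflict() -> str:
    """Return the message argparse raises for the duplicated option."""
    try:
        build_parser()
    except argparse.ArgumentError as err:
        return str(err)
    return "no conflict"


if __name__ == "__main__":
    # Prints: argument --force: conflicting option string: --force
    print(describe_conflict())
    # "resolve" drops the earlier --force and keeps the last one.
    print(build_parser("resolve").parse_args(["--force"]).force)
```

The real remedy for this kind of error is to delete the duplicated `add_argument` call; `"resolve"` is mainly useful when a subclass or plugin is meant to override an inherited option on purpose.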

nextgenusfs commented 3 years ago

Sorry -- that was a dumb mistake. The image is rebuilding now; it should be ready in about an hour.

nextgenusfs commented 3 years ago

Okay, it should be up on Docker Hub now; pull the image to get the update.

athulmenon commented 3 years ago

Hi,

Thanks for the fix. The test ran successfully, but when I ran it on my own data, an antiSMASH error popped up. I am not sure whether it is an issue with the update or with my antiSMASH results. The antiSMASH results were generated using the funannotate "remote" module. Can you please check?

```
./funannotate-docker annotate --genbank FusariumGraminearum_GCF_000240135.3_ASM24013v3_genomic.gbk -o fusarium_graminearum_annotation --eggnog /FusariumGraminearum/eggnog_results/query_seqs.fa.emapper.annotations --antismash /FusariumGraminearum/antismash_results/fungi-69ba57f6-ccf0-4f8f-b3d6-d306e2ac70a7/FusariumGraminearum_GCF_000240135.3_ASM24013v3_genomic.gbk --iprscan /FusariumGraminearum/ipr_results.xml --cpus 10
logname: no login name
logname: no login name

[Jan 29 04:37 PM]: OS: Debian GNU/Linux 10, 12 cores, ~ 74 GB RAM. Python: 3.7.9
[Jan 29 04:37 PM]: Running 1.8.4
[Jan 29 04:37 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Jan 29 04:37 PM]: Checking GenBank file for annotation
Skipped 11 annotations: 11 pseudo genes; 0 no CDS; 0 duplicated features
[Jan 29 04:38 PM]: Adding Functional Annotation to Fusarium graminearum PH-1, NCBI accession: WGS:DS23
[Jan 29 04:38 PM]: Annotation consists of: 13,726 gene models
[Jan 29 04:38 PM]: 13,313 protein records loaded
[Jan 29 04:38 PM]: Running HMMer search of PFAM version 33.1
[Jan 29 04:46 PM]: 13,595 annotations added
[Jan 29 04:46 PM]: Running Diamond blastp search of UniProt DB version 2020_06
[Jan 29 04:48 PM]: 1,002 valid gene/product annotations from 1,377 total
[Jan 29 04:48 PM]: Existing Eggnog-mapper results found: fusarium_graminearum_annotation/annotate_misc/eggnog.emapper.annotations
[Jan 29 04:48 PM]: Parsing EggNog Annotations
[Jan 29 04:48 PM]: 7,569 COG and EggNog annotations added
[Jan 29 04:48 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.65
[Jan 29 04:48 PM]: 1,002 gene name and product description annotations added
[Jan 29 04:48 PM]: Running Diamond blastp search of MEROPS version 12.0
[Jan 29 04:48 PM]: 404 annotations added
[Jan 29 04:48 PM]: Annotating CAZYmes using HMMer search of dbCAN version 9.0
[Jan 29 04:50 PM]: 523 annotations added
[Jan 29 04:50 PM]: Annotating proteins with BUSCO dikarya models
[Jan 29 04:52 PM]: 1,272 annotations added
[Jan 29 04:52 PM]: Skipping phobius predictions, try funannotate remote -m phobius
[Jan 29 04:52 PM]: Skipping secretome: neither SignalP nor Phobius searches were run
[Jan 29 04:52 PM]: 0 secretome and 0 transmembane annotations added
[Jan 29 04:53 PM]: Parsing InterProScan5 XML file
[Jan 29 04:56 PM]: Now parsing antiSMASH v6 results, finding SM clusters
Traceback (most recent call last):
  File "/venv/bin/funannotate", line 713, in <module>
    main()
  File "/venv/bin/funannotate", line 703, in main
    mod.main(arguments)
  File "/venv/lib/python3.7/site-packages/funannotate/annotate.py", line 989, in main
    AntiSmashannotations)
  File "/venv/lib/python3.7/site-packages/funannotate/library.py", line 6935, in ParseAntiSmash
    numericalContig = int(record.id.rsplit('_', 1)[-1])
ValueError: invalid literal for int() with base 10: '026474.1'
```

Thank you. Athul

nextgenusfs commented 3 years ago

Would you be able to send me a link to the antiSMASH results or the antiSMASH GBK file? It seems related to how it expects the FASTA headers to be named. What is your header naming scheme?

athulmenon commented 3 years ago

Please find the .gbk file from the link. https://drive.google.com/file/d/1XKLk5Q615Iptk5EeJFhg0aUH9djy3bk0/view?usp=sharing

The FASTA header naming inside the antiSMASH results folder is ">FGSG_00001-T1 FGSG_00001".

The FASTA header of the protein sequences downloaded from NCBI is ">XP_011315562.1 hypothetical protein FGSG_11579 [Fusarium graminearum PH-1]". Hope this helps.

nextgenusfs commented 3 years ago

Thanks -- it's due to the NCBI scaffold names. Did you also get a link to the web results from antiSMASH? I'd like to see how they are enumerating the clusters; it might have changed since this is now a new version (6). Previously, if your contig was named chr1, the first cluster would have been named 1.1, the second 1.2, etc. Since Biopython is apparently parsing the record.id for this scaffold as NC_026474.1, the parser chokes on the .1: it takes everything after the last underscore (026474.1) and tries to convert it to an integer. So if I can see how antiSMASH is presenting this, I can do the same.
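The failure can be reproduced outside funannotate with the expression from the traceback. The sketch below contrasts it with a version-tolerant variant that drops a trailing `.N` accession version before converting. `contig_number_naive` and `contig_number` are hypothetical helper names, and this is only an illustration under that assumption; the fix that actually went into funannotate may work differently.

```python
def contig_number_naive(record_id: str) -> int:
    """Mirror of the failing expression: int(record.id.rsplit('_', 1)[-1])."""
    # Everything after the last underscore is converted directly to an int.
    return int(record_id.rsplit("_", 1)[-1])


def contig_number(record_id: str) -> int:
    """Hypothetical variant that ignores a trailing '.<version>' suffix."""
    suffix = record_id.rsplit("_", 1)[-1]       # "NC_026474.1" -> "026474.1"
    accession_number = suffix.split(".", 1)[0]  # "026474.1" -> "026474"
    return int(accession_number)                # int() accepts leading zeros


if __name__ == "__main__":
    try:
        contig_number_naive("NC_026474.1")
    except ValueError as err:
        print(err)  # invalid literal for int() with base 10: '026474.1'
    print(contig_number("NC_026474.1"))  # 26474
    print(contig_number("scaffold_12"))  # 12
```

Splitting on the first `.` of the suffix only assumes that the contig number itself contains no dots, which holds for RefSeq-style accessions like `NC_026474.1`.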

nextgenusfs commented 3 years ago

So if I simply strip off the .1, which in these old assemblies corresponds to a version number, you'd get a result like this:

NC_026474.1  160289    217648    Cluster_26474.1   0  +
NC_026474.1  1436186   1451304   Cluster_26474.2   0  +
NC_026474.1  5548489   5588101   Cluster_26474.3   0  +
NC_026474.1  5573182   5609251   Cluster_26474.4   0  +
NC_026474.1  5856809   5873522   Cluster_26474.5   0  +
NC_026474.1  5861438   5902658   Cluster_26474.6   0  +
NC_026474.1  7285342   7327399   Cluster_26474.7   0  +
NC_026474.1  7452834   7514009   Cluster_26474.8   0  +
NC_026474.1  7503245   7544494   Cluster_26474.9   0  +
NC_026474.1  7683951   7728756   Cluster_26474.10  0  +
NC_026474.1  7695583   7746953   Cluster_26474.11  0  +
NC_026474.1  9783754   9803449   Cluster_26474.12  0  +
NC_026474.1  10707528  10728727  Cluster_26474.13  0  +
NC_026474.1  10863841  10906818  Cluster_26474.14  0  +
NC_026474.1  10875350  10920861  Cluster_26474.15  0  +
NC_026474.1  11089281  11130019  Cluster_26474.16  0  +
NC_026474.1  11159537  11224335  Cluster_26474.17  0  +
NC_026474.1  11196940  11244849  Cluster_26474.18  0  +
NC_026474.1  11406232  11449541  Cluster_26474.19  0  +
NC_026475.1  573352    613635    Cluster_26475.1   0  +
NC_026475.1  2059829   2101313   Cluster_26475.2   0  +
NC_026475.1  2245996   2280024   Cluster_26475.3   0  +
NC_026475.1  2561027   2612669   Cluster_26475.4   0  +
NC_026475.1  2575165   2621917   Cluster_26475.5   0  +
NC_026475.1  2644785   2686385   Cluster_26475.6   0  +
NC_026475.1  2669744   2686385   Cluster_26475.7   0  +
NC_026475.1  3484779   3505484   Cluster_26475.8   0  +
NC_026475.1  3538428   3582546   Cluster_26475.9   0  +
NC_026475.1  3542871   3555980   Cluster_26475.10  0  +
NC_026475.1  5508387   5553529   Cluster_26475.11  0  +
NC_026475.1  6060982   6107254   Cluster_26475.12  0  +
NC_026475.1  6097187   6140276   Cluster_26475.13  0  +
NC_026475.1  6638479   6658431   Cluster_26475.14  0  +
NC_026475.1  6753340   6773689   Cluster_26475.15  0  +
NC_026475.1  7152432   7200957   Cluster_26475.16  0  +
NC_026475.1  7380657   7423933   Cluster_26475.17  0  +
NC_026475.1  7915743   7937760   Cluster_26475.18  0  +
NC_026476.1  9801      45452     Cluster_26476.1   0  +
NC_026476.1  2024300   2076412   Cluster_26476.2   0  +
NC_026476.1  3384770   3434831   Cluster_26476.3   0  +
NC_026476.1  4098102   4136177   Cluster_26476.4   0  +
NC_026476.1  5906380   5956888   Cluster_26476.5   0  +
NC_026476.1  6019166   6099885   Cluster_26476.6   0  +
NC_026476.1  7255232   7302481   Cluster_26476.7   0  +
NC_026476.1  7456514   7478171   Cluster_26476.8   0  +
NC_026476.1  7682802   7694587   Cluster_26476.9   0  +
NC_026477.1  49741     70044     Cluster_26477.1   0  +
NC_026477.1  86520     130134    Cluster_26477.2   0  +
NC_026477.1  260243    300046    Cluster_26477.3   0  +
NC_026477.1  630873    663475    Cluster_26477.4   0  +
NC_026477.1  4151999   4171516   Cluster_26477.5   0  +
NC_026477.1  4497434   4549293   Cluster_26477.6   0  +
NC_026477.1  4498286   4546990   Cluster_26477.7   0  +
NC_026477.1  6668266   6689813   Cluster_26477.8   0  +
NC_026477.1  7275310   7321978   Cluster_26477.9   0  +

But I'm not sure whether this is how the cluster names appear in the antiSMASH v6 HTML output, i.e., is Cluster_26477.9 what the last cluster is called?
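The naming pattern in the listing above (numeric part of the accession without its version, then a 1-based counter per contig) can be sketched as follows. `name_clusters` and `contig_number` are hypothetical helpers written only to reproduce the pattern shown; the input tuples are three rows copied from the listing, and the output columns (contig, start, end, name, score, strand) follow the same layout. This is not funannotate's implementation, and as discussed further down the thread, antiSMASH v6 presents regions rather than these cluster names on its website.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (contig id, start, end)


def contig_number(record_id: str) -> int:
    """Hypothetical helper: 'NC_026474.1' -> 26474 (version suffix dropped)."""
    return int(record_id.rsplit("_", 1)[-1].split(".", 1)[0])


def name_clusters(regions: Iterable[Region]) -> List[str]:
    """Return tab-separated lines, numbering clusters 1, 2, ... per contig."""
    counters = defaultdict(int)
    lines = []
    # Sort by contig, then start, so numbering follows position on the contig.
    for contig, start, end in sorted(regions, key=lambda r: (r[0], r[1])):
        counters[contig] += 1
        name = f"Cluster_{contig_number(contig)}.{counters[contig]}"
        # Columns: contig, start, end, name, score, strand
        lines.append("\t".join([contig, str(start), str(end), name, "0", "+"]))
    return lines


if __name__ == "__main__":
    demo = [
        ("NC_026475.1", 573352, 613635),
        ("NC_026474.1", 1436186, 1451304),
        ("NC_026474.1", 160289, 217648),
    ]
    for line in name_clusters(demo):
        print(line)
```

Numbering after sorting keeps names stable regardless of the order in which regions are read from the GenBank file, at the cost of renumbering everything if a new region is inserted upstream.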

athulmenon commented 3 years ago

Hi Jon, Thanks for looking into the issue. Please find the link to the antiSMASH HTML output: https://fungismash.secondarymetabolites.org/upload/fungi-69ba57f6-ccf0-4f8f-b3d6-d306e2ac70a7/index.html

Hope this helps. Athul

athulmenon commented 3 years ago

Hi Jon, Please let me know when I can pull the updated image, once you have fixed the issue. I will run it and let you know.

I have one more question: I want to add SignalP to my annotation. How can I include it in the present docker wrapper? Thanks for the support. Athul

nextgenusfs commented 3 years ago

It should be up now. Because antiSMASH has changed how results are displayed on the website (as regions that are composed of multiple clusters), I'm no longer going to try to match those names. But I think the parsing error is fixed.

You will need to build a new docker image with SignalP installed in it; I can't include it for licensing reasons.

athulmenon commented 3 years ago

Hi Jon,

Thanks for the fix. It worked without any errors. I then tried running the compare module with the .gbk files. It ran without any errors, but there are some warnings that I would like to bring to your attention.

[Feb 03 06:41 PM]: OS: Debian GNU/Linux 10, 12 cores, ~ 74 GB RAM. Python: 3.7.9
[Feb 03 06:41 PM]: Running 1.8.4
[Feb 03 06:41 PM]: Now parsing 2 genomes
[Feb 03 06:41 PM]: working on Fusarium equiseti
[Feb 03 06:42 PM]: working on Fusarium oxysporum f. sp. lycopersici 4287
[Feb 03 06:42 PM]: Summarizing secondary metabolism gene clusters
[Feb 03 06:43 PM]: Summarizing PFAM domain results
[Feb 03 06:43 PM]: Summarizing InterProScan results
[Feb 03 06:43 PM]: Loading InterPro descriptions
[Feb 03 06:43 PM]: Summarizing MEROPS protease results
[Feb 03 06:43 PM]: found 41/96 MEROPS familes with stdev >= 1.000000
/venv/lib/python3.7/site-packages/funannotate/library.py:7865: MatplotlibDeprecationWarning: Calling add_axes() without argument is deprecated since 3.3 and will be removed two minor releases later. You may want to use add_subplot() instead.
  cbar_ax = fig.add_axes(shrink=0.4)
[Feb 03 06:43 PM]: Summarizing CAZyme results
[Feb 03 06:43 PM]: found 59/144 CAZy familes with stdev >= 1.000000
/venv/lib/python3.7/site-packages/funannotate/library.py:7865: MatplotlibDeprecationWarning: Calling add_axes() without argument is deprecated since 3.3 and will be removed two minor releases later. You may want to use add_subplot() instead.
  cbar_ax = fig.add_axes(shrink=0.4)
[Feb 03 06:43 PM]: No COG annotations found
[Feb 03 06:43 PM]: No SignalP annotations found
[Feb 03 06:43 PM]: Summarizing fungal transcription factors
/venv/lib/python3.7/site-packages/funannotate/library.py:7865: MatplotlibDeprecationWarning: Calling add_axes() without argument is deprecated since 3.3 and will be removed two minor releases later. You may want to use add_subplot() instead.
  cbar_ax = fig.add_axes(shrink=0.4)
[Feb 03 06:43 PM]: Running GO enrichment for each genome
WARNING: skipping Fusarium_oxysporum_f._sp._lycopersici_4287.txt as no GO terms
/venv/lib/python3.7/site-packages/funannotate/compare.py:803: FutureWarning: The default value of regex will change from True to False in a future version.
  df.columns = df.columns.str.replace(r'^# ', '')
[Feb 03 06:45 PM]: Running orthologous clustering tool, ProteinOrtho. This may take awhile...
[Feb 03 06:51 PM]: Compiling all annotations for each genome
[Feb 03 06:51 PM]: Skipping RAxML phylogeny as at least 4 taxa are required
[Feb 03 06:51 PM]: Compressing results to output file: compare_out.tar.gz
[Feb 03 06:52 PM]: Funannotate compare completed successfully!

Thanks for the support. Regards, Athul
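The pandas `FutureWarning` in the compare log comes from calling `.str.replace(r'^# ', '')` without an explicit `regex` argument. The default changed from `regex=True` to `regex=False` in pandas 2.0, and with `regex=False` the `^` would be matched literally, so passing `regex=True` explicitly keeps the anchor working and silences the warning. The sketch below assumes hypothetical column names (`# Gene`, `GO terms`); `strip_comment_prefix` is an illustrative helper, not funannotate's code.

```python
import warnings

import pandas as pd


def strip_comment_prefix(df: pd.DataFrame) -> pd.DataFrame:
    """Remove a leading '# ' from column names, treating the pattern as a regex."""
    out = df.copy()
    # Explicit regex=True: '^' stays an anchor on every pandas version and
    # no FutureWarning about the changing default is emitted.
    out.columns = out.columns.str.replace(r"^# ", "", regex=True)
    return out


if __name__ == "__main__":
    # Hypothetical table whose first column name carries a '# ' prefix.
    table = pd.DataFrame([["g1", "GO:0008150"]], columns=["# Gene", "GO terms"])
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)  # turn the warning into an error
        cleaned = strip_comment_prefix(table)
    print(list(cleaned.columns))  # ['Gene', 'GO terms']
```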