ncbi / egapx

Eukaryotic Genome Annotation Pipeline-External caller scripts and documentation

ERROR ~ Error executing process > 'egapx:annot_proc_plane:gnomon_biotype:run_gnomon_biotype' #61

Open bismarck1008 opened 9 hours ago

bismarck1008 commented 9 hours ago

python3 /data/bio-software/egapx/ui/egapx.py ../data/input_D_farinae_small.local5.yaml -e docker -o GCA_040085125.1_ASM4008512v1_out2 -w ./temp

[6f/72e38f] process > egapx:annot_proc_plane:gnomon_biotype:run_gnomon_biotype [100%] 4 of 4, failed: 4, retries: 3 ✘

ERROR ~ Error executing process > 'egapx:annot_proc_plane:gnomon_biotype:run_gnomon_biotype'

Caused by: Process egapx:annot_proc_plane:gnomon_biotype:run_gnomon_biotype terminated with an error exit status (3)

Command executed:

mkdir -p output
mkdir -p ./asncache/
prime_cache -cache ./asncache/ -ifmt asnb-seq-entry -i swissprot.asnb -oseq-ids spids -split-sequences
prime_cache -cache ./asncache/ -ifmt asnb-seq-entry -i gnomon_wnode.out -oseq-ids gnids -split-sequences
lds2_indexer -source genome/ -db LDS2
echo "hits.diamond.asn" > raw_blastp_hits.mft
merge_blastp_hits -asn-cache ./asncache/ -nogenbank -lds2 LDS2 -input-manifest raw_blastp_hits.mft -o prot_hits.asn
echo "gnomon_wnode.out" > models.mft
echo "prot_hits.asn" > prot_hits.mft
echo "" > splices.mft
if [ -z "" ]
then
    gnomon_biotype -gc gencoll.asn -asn-cache ./asncache/ -lds2 ./LDS2 -nogenbank -gnomon_models models.mft -o output/biotypes.tsv -o_prots_rpt output/prots_rpt.tsv -prot_hits prot_hits.mft -prot_splices splices.mft -reftrack-server 'NONE' -allow_lt631 true
else
    gnomon_biotype -gc gencoll.asn -asn-cache ./asncache/ -lds2 ./LDS2 -nogenbank -gnomon_models models.mft -o output/biotypes.tsv -o_prots_rpt output/prots_rpt.tsv -prot_denylist -prot_hits prot_hits.mft -prot_splices splices.mft -reftrack-server 'NONE' -allow_lt631 true
fi

Command exit status: 3

Command output: (empty)

Command error:
Prefetching 4358 bioseqs
Prefetching 4605 bioseqs
Prefetching 5119 bioseqs
Prefetching 5021 bioseqs
Prefetching 4387 bioseqs
Prefetching 4726 bioseqs
Prefetching 4029 bioseqs
Prefetching 4442 bioseqs
Prefetching 4865 bioseqs
Prefetching 3001 bioseqs
Prefetching 4831 bioseqs
Prefetching 4484 bioseqs
Prefetching 2711 bioseqs
Prefetching 1272 bioseqs
Prefetching 1170 bioseqs
Second-pass: computing bestness scores

Starting.
Fetching Gnomon model data.
Loading GC-Assembly.
Taxon is invertebrate or plant - will allow more coding models
Loading protein hits
Skipped 19932 protein hits without corresponding CDS features
Processed 274631 hits; accepted 141526; 24500 are RBPH
Loading protein data.
Retrieving attributes for 43024 prots
Fetching next batch of 10000
Fetching next batch of 10000
Fetching next batch of 10000
Fetching next batch of 10000
Creating classifier.

Classifier internal state for EGAPx Test Assembly:
0: 907233/264=3436.49 907233/1442=629.149
1: 1.06183e+06/3514=302.172 1.06183e+06/7267=146.117
M=[730 326; 398 2826]; PPV=0.64659; NPV=0.896289; ACC=0.830647

Allowing locusType-631 models: true
Initialized 10 patterns for attr_rule=538.
Initialized 36 patterns for attr_rule=489.
Initialized 6 patterns for attr_rule=989.
Initialized 11 patterns for attr_rule=986.
Initialized 6 patterns for attr_rule=987.
Initialized 5 patterns for attr_rule=988.
Outputting.
Initialized 70 patterns for attr_rule=869.
BPH to proks: 5.88253%
Error: (CException::eUnknown) Too many protein hits to proks (GP-23178)
Error: (106.16) Application's execution failed (CException::eUnknown) Too many protein hits to proks (GP-23178)

Work dir: /data/dell/CNI.2024.10.5/2.anotation/test/temp/6f/72e38fd113038b2726553d1e7b22e0

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check 'GCA_040085125.1_ASM4008512v1_out2/nextflow.log' file for details

pstrope commented 9 hours ago

Hi, This error

Error: (CException::eUnknown) Too many protein hits to proks (GP-23178)
Error: (106.16) Application's execution failed (CException::eUnknown) Too many protein hits to proks (GP-23178)

means your genome assembly is contaminated with prokaryote sequences. Please run FCS on it first to clean it up. See https://github.com/ncbi/fcs/wiki

Pooja
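For anyone who hits the same failure, it can help to pull the relevant figures out of the task's stderr before deciding what to do next. The sketch below is only an illustration: `summarize_prok_hits` is a hypothetical helper, not part of EGAPx, and it assumes the log keeps the `BPH to proks: N%` and `Too many protein hits to proks` wording quoted above. The cutoff that `gnomon_biotype` applies is not stated in this thread, so the helper reports the percentage without judging it.

```python
import re

# Pattern for the percentage line as it appears in the log quoted in this issue.
BPH_RE = re.compile(r"BPH to proks:\s*([0-9]*\.?[0-9]+)\s*%")

# Error text as quoted in this issue (assumed to be stable wording).
ERR_MARKER = "Too many protein hits to proks"


def summarize_prok_hits(log_text):
    """Return the reported 'BPH to proks' percentage and whether the error fired.

    The percentage is None when the log has no 'BPH to proks' line.
    """
    match = BPH_RE.search(log_text)
    pct = float(match.group(1)) if match else None
    return {"bph_to_proks_pct": pct, "prok_error": ERR_MARKER in log_text}


if __name__ == "__main__":
    # Excerpt copied from the stderr in the original report.
    sample = (
        "Initialized 70 patterns for attr_rule=869. "
        "BPH to proks: 5.88253% "
        "Error: (CException::eUnknown) Too many protein hits to proks (GP-23178)"
    )
    print(summarize_prok_hits(sample))
```

In practice the text would come from the `.command.err` file that Nextflow writes in the task work dir, for example by reading it with `pathlib.Path(work_dir, ".command.err").read_text()`.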

bismarck1008 commented 6 hours ago

Thank you very much. But if I type in the class number of the odd-footed species, it works. And the genome uses data from the NCBI database, so there should be no contamination.
