Open bismarck1008 opened 9 hours ago
Hi, This error
Error: (CException::eUnknown) Too many protein hits to proks (GP-23178)
Error: (106.16) Application's execution failed (CException::eUnknown) Too many protein hits to proks (GP-23178)
means your genome assembly is contaminated with prokaryote sequences. Please run FCS on it first to clean it up. See https://github.com/ncbi/fcs/wiki
Pooja
Thank you very much. However, if I enter the class number of the odd-footed species instead, the run works. The genome also comes from the NCBI database, so it should not be contaminated.
python3 /data/bio-software/egapx/ui/egapx.py ../data/input_D_farinae_small.local5.yaml -e docker -o GCA_040085125.1_ASM4008512v1_out2 -w ./temp
[6f/72e38f] process > egapx:annot_proc_plane:gnomon_biotype:run_gnomon_biotype [100%] 4 of 4, failed: 4, retries: 3 ✘
ERROR ~ Error executing process > 'egapx:annot_proc_plane:gnomon_biotype:run_gnomon_biotype'
Caused by: Process egapx:annot_proc_plane:gnomon_biotype:run_gnomon_biotype terminated with an error exit status (3)
Command executed:
  mkdir -p output
  mkdir -p ./asncache/
  prime_cache -cache ./asncache/ -ifmt asnb-seq-entry -i swissprot.asnb -oseq-ids spids -split-sequences
  prime_cache -cache ./asncache/ -ifmt asnb-seq-entry -i gnomon_wnode.out -oseq-ids gnids -split-sequences
  lds2_indexer -source genome/ -db LDS2
  echo "hits.diamond.asn" > raw_blastp_hits.mft
  merge_blastp_hits -asn-cache ./asncache/ -nogenbank -lds2 LDS2 -input-manifest raw_blastp_hits.mft -o prot_hits.asn
  echo "gnomon_wnode.out" > models.mft
  echo "prot_hits.asn" > prot_hits.mft
  echo "" > splices.mft
  if [ -z "" ]
  then
      gnomon_biotype -gc gencoll.asn -asn-cache ./asncache/ -lds2 ./LDS2 -nogenbank -gnomon_models models.mft -o output/biotypes.tsv -o_prots_rpt output/prots_rpt.tsv -prot_hits prot_hits.mft -prot_splices splices.mft -reftrack-server 'NONE' -allow_lt631 true
  else
      gnomon_biotype -gc gencoll.asn -asn-cache ./asncache/ -lds2 ./LDS2 -nogenbank -gnomon_models models.mft -o output/biotypes.tsv -o_prots_rpt output/prots_rpt.tsv -prot_denylist -prot_hits prot_hits.mft -prot_splices splices.mft -reftrack-server 'NONE' -allow_lt631 true
  fi
Command exit status: 3
Command output: (empty)
Command error:
  Prefetching 4358 bioseqs
  Prefetching 4605 bioseqs
  Prefetching 5119 bioseqs
  Prefetching 5021 bioseqs
  Prefetching 4387 bioseqs
  Prefetching 4726 bioseqs
  Prefetching 4029 bioseqs
  Prefetching 4442 bioseqs
  Prefetching 4865 bioseqs
  Prefetching 3001 bioseqs
  Prefetching 4831 bioseqs
  Prefetching 4484 bioseqs
  Prefetching 2711 bioseqs
  Prefetching 1272 bioseqs
  Prefetching 1170 bioseqs
  Second-pass: computing bestness scores
  Starting.
  Fetching Gnomon model data.
  Loading GC-Assembly.
  Taxon is invertebrate or plant - will allow more coding models
  Loading protein hits
  Skipped 19932 protein hits without corresponding CDS features
  Processed 274631 hits; accepted 141526; 24500 are RBPH
  Loading protein data.
  Retrieving attributes for 43024 prots
  Fetching next batch of 10000
  Fetching next batch of 10000
  Fetching next batch of 10000
  Fetching next batch of 10000
  Creating classifier.
  Classifier internal state for EGAPx Test Assembly: 0: 907233/264=3436.49 907233/1442=629.149 1: 1.06183e+06/3514=302.172 1.06183e+06/7267=146.117 M=[730 326; 398 2826]; PPV=0.64659; NPV=0.896289; ACC=0.830647
  Allowing locusType-631 models: true
  Initialized 10 patterns for attr_rule=538.
  Initialized 36 patterns for attr_rule=489.
  Initialized 6 patterns for attr_rule=989.
  Initialized 11 patterns for attr_rule=986.
  Initialized 6 patterns for attr_rule=987.
  Initialized 5 patterns for attr_rule=988.
  Outputting.
  Initialized 70 patterns for attr_rule=869.
  BPH to proks: 5.88253%
  Error: (CException::eUnknown) Too many protein hits to proks (GP-23178)
  Error: (106.16) Application's execution failed (CException::eUnknown) Too many protein hits to proks (GP-23178)
Work dir: /data/dell/CNI.2024.10.5/2.anotation/test/temp/6f/72e38fd113038b2726553d1e7b22e0
Tip: you can replicate the issue by changing to the process work dir and entering the command
bash .command.run
-- Check 'GCA_040085125.1_ASM4008512v1_out2/nextflow.log' file for details
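
For anyone comparing runs that fail with this error (for example, the same assembly with different taxids), the figure worth tracking is the `BPH to proks: 5.88253%` line that gnomon_biotype prints just before it aborts. Below is a minimal Python sketch that pulls that percentage out of captured log text. The function name `bph_to_proks` and the regular expression are illustrative and not part of EGAPx, and the sketch assumes the line keeps the format shown in the log above. The cutoff that triggers GP-23178 is not stated anywhere in this thread, so the script only reports the value and does not judge it.

```python
import re
import sys
from pathlib import Path
from typing import Optional

# Pattern for the diagnostic line seen in the log above,
# e.g. "BPH to proks: 5.88253%". The exact format is an assumption
# based on that single log; adjust if your output differs.
_BPH_RE = re.compile(r"BPH to proks:\s*([0-9]+(?:\.[0-9]+)?)%")


def bph_to_proks(log_text: str) -> Optional[float]:
    """Return the last reported 'BPH to proks' percentage, or None if absent."""
    matches = _BPH_RE.findall(log_text)
    return float(matches[-1]) if matches else None


if __name__ == "__main__":
    # Usage: python3 bph.py <log file> [<log file> ...]
    for path in sys.argv[1:]:
        text = Path(path).read_text(errors="replace")
        print(path, bph_to_proks(text))
```

One way to use it is to point the script at the `.command.err` file inside the process work dir reported above, once per run, and compare the reported values.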