ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

[BUG] The length of the gene exceeds the length of the contig #282

Closed Moo-cow closed 5 months ago

Moo-cow commented 7 months ago

Describe the bug

The length of the gene exceeds the length of the contig.

NODE_952_length_259_cov_0.802198 Local region 1 259 . + . ID=NODE_952_length_259_cov_0.802198:1..259;Dbxref=taxon:562;Is_circular=true;Name=ANONYMOUS;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=IW
NODE_952_length_259_cov_0.802198 . gene 71 319 . - . ID=gene-pgaptmp_004397;Name=pgaptmp_004397;gbkey=Gene;gene_biotype=protein_coding;locus_tag=pgaptmp_004397
NODE_952_length_259_cov_0.802198 GeneMarkS-2+ CDS 71 319 . - 0 ID=cds-pgaptmp_004397;Parent=gene-pgaptmp_004397;Name=extdb:pgaptmp_004397;gbkey=CDS;inference=COORDINATES: ab initio prediction:GeneMarkS-2+;locus_tag=pgaptmp_004397;product=hypothetical protein;protein_id=extdb:pgaptmp_004397;transl_table=11

To Reproduce

Is the genome too fragmented to be annotated with PGAP?

Software versions (please complete the following information):

azat-badretdin commented 7 months ago

Thank you for your report, user @Moo-cow !

Would you like to try to reproduce this with the latest version of PGAP? We had a release in October.

Moo-cow commented 7 months ago

Hi~ I updated the software and database to 2023-10-03.build7061. The problem remains. It looks like a problem with small contigs.

azat-badretdin commented 7 months ago

It looks like a problem with small contigs.

We do have a check for those. Small contigs can either be removed, or you can push through using ./pgap.py --ignore-all-errors

Moo-cow commented 7 months ago

Hi~ I tried adding --ignore-all-errors, but it doesn't work. The gene still exceeds the length of the contig; the errors are still there.

azat-badretdin commented 7 months ago

Could you please post cwltool.log?

If it mentions a contig name, could you please post the FASTA header for this contig?

Please post your submol.yaml file as well if you ran it with a YAML file as a parameter.

Thanks

Moo-cow commented 7 months ago

The contig:

>NODE_67_length_271_cov_0.752577
CTCGGCGGTGACCTGATAGACCATGCCGACCAGCAGGATGGCCAGGGTGAAACAGCCGAT
CAGCAGCAGATAGAACTGGATAAACAGGCGGCGCATTGGGCTCTCCTCTTGCCTGGATTA
CGTCTGGGCATTGCCTTGCTGCCCCCTCATCCCCAACCCTTCTCCCGCAAGGGGAGAAGG
GAGCAGAGAGCATCAAGGTCATTCGGCGGGGCTTATTCCCAGGCCTGGGGTACTAACAGG
TAGCCCTTCTGGCGCACTGTCTTGATGCGGG

The anno:

##sequence-region 1 271
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=244366
NODE_67_length_271_cov_0.752577 Local region 1 271 . + . ID=NODE_67_length_271_cov_0.752577:1..271;Dbxref=taxon:244366;Is_circular=true;Name=ANONYMOUS;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=None
NODE_67_length_271_cov_0.752577 . gene 233 367 . - . ID=gene-LWNFPK02_4_005574;Name=LWNFPK02_4_005574;gbkey=Gene;gene_biotype=protein_coding;locus_tag=LWNFPK02_4_005574
NODE_67_length_271_cov_0.752577 GeneMarkS-2+ CDS 233 367 . - 0 ID=cds-LWNFPK02_4_005574;Parent=gene-LWNFPK02_4_005574;Name=extdb:LWNFPK02_4_005574;gbkey=CDS;inference=COORDINATES: ab initio prediction:GeneMarkS-2+;locus_tag=LWNFPK02_4_005574;product=hypothetical protein;protein_id=extdb:LWNFPK02_4_005574;transl_table=11
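For anyone who wants to list every affected feature in their own output, here is a minimal sketch of a checker for records like the ones above. It is not part of PGAP; the function names are hypothetical, and it assumes tab-separated GFF3 with each `region` record appearing before the features on that contig. The GFF3 specification allows a feature on a circular landmark to cross the origin by writing its end as the true end plus the landmark length, so for regions tagged `Is_circular=true` the sketch also reports the wrapped end coordinate. Whether that convention explains the coordinates in this thread is an assumption, not something confirmed here.

```python
def parse_attributes(field):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def find_overhanging_features(gff_lines):
    """Return features whose end coordinate exceeds their contig's length.

    Each finding is (seqid, type, start, end, contig_length, wrapped_end).
    wrapped_end is end - contig_length for circular regions, else None.
    Assumes the 'region' record precedes the features on the same contig.
    """
    regions = {}  # seqid -> (length, is_circular)
    findings = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a standard nine-column GFF3 record
        seqid, ftype = cols[0], cols[2]
        start, end = int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if ftype == "region":
            regions[seqid] = (end, attrs.get("Is_circular") == "true")
            continue
        if seqid not in regions:
            continue
        length, circular = regions[seqid]
        if end > length:
            wrapped_end = end - length if circular else None
            findings.append((seqid, ftype, start, end, length, wrapped_end))
    return findings


# Shortened versions of the region and gene records quoted in this thread.
example = [
    "NODE_67_length_271_cov_0.752577\tLocal\tregion\t1\t271\t.\t+\t.\t"
    "ID=NODE_67_length_271_cov_0.752577:1..271;Is_circular=true",
    "NODE_67_length_271_cov_0.752577\t.\tgene\t233\t367\t.\t-\t.\t"
    "ID=gene-LWNFPK02_4_005574",
]
for finding in find_overhanging_features(example):
    print(finding)
```

On the sample records the gene at 233..367 is reported against a contig length of 271, with a wrapped end of 96 because the region is tagged circular.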

submol.yaml file:

topology: 'circular'
organism:
    genus_species: 'Klebsiella variicola'
    strain: 'None'
locus_tag_prefix: 'LWNFPK024'

cwltool.log in the attachment.

Thanks for your time.

Large attachment sent from NetEase 163 Mail: cwltool.log (54.87 MB, expires 2024-01-04 14:00)

arunprasanna83 commented 6 months ago

I have the same issue too!

azat-badretdin commented 6 months ago

cwltool.log

It has not been attached.

Also please make sure that your small contigs are not marked circular by accident.

arunprasanna83 commented 5 months ago

Can you please give the cut-off or threshold to define the size of contigs? i.e. which length should be classified as small or large?

azat-badretdin commented 5 months ago

200 bases
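If you take the "remove small contigs" route mentioned earlier in the thread, a minimal sketch of a length filter could look like the following. The function names and the `min_length` parameter are hypothetical, not PGAP options, and whether PGAP treats a contig of exactly 200 bases as small is not stated here, so the `>=` comparison is an assumption. Note that both contigs quoted in this thread (259 and 271 bases) are longer than 200 bases, so a larger `min_length` would be needed to drop them.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_short_contigs(lines, min_length=200):
    """Keep only contigs with at least min_length bases (assumed cutoff)."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) >= min_length]


# Synthetic input: one 40-base contig and one 240-base contig.
sample = [">short", "ACGT" * 10, ">long", "ACGT" * 50, "ACGT" * 10]
for header, seq in drop_short_contigs(sample):
    print(f">{header} ({len(seq)} bases)")
```

To filter a real assembly, pass an open file handle instead of the `sample` list and write the surviving records back out as FASTA before running PGAP.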