ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

errors in annotation of bacterial whole genome #134

Closed Biomedinformatics closed 3 years ago

Biomedinformatics commented 3 years ago

I have paired-end FASTQ files for a bacterial genome. I trimmed them with trim_galore to remove adaptors and low-quality reads, then assembled the reads with SPAdes. When I tried to annotate the assembly with PGAP, it failed with an error, and the calls.tab file shows:

lcl|NODE_192_length_376_cov_0.943144 M 138..299 adaptor:multiple Adaptor
lcl|NODE_1390_length_253_cov_0.823864 M 204..253 adaptor:NGB01064.1 Adaptor
lcl|NODE_1563_length_248_cov_0.812865 M 88..121 adaptor:NGB00749.1 Adaptor
lcl|NODE_1590_length_247_cov_0.852941 M 216..247 adaptor:NGB00749.1 Adaptor
lcl|NODE_2102_length_234_cov_0.929936 X - adaptor:multiple Adaptor

Why is it still showing adaptors when I already trimmed the reads, and how can I remove the contaminated reads from the FASTQ files?

azat-badretdin commented 3 years ago

Thank you for your report, user Biomedinformatics.

> why is it showing adaptor when I already trimmed it and how can I get rid of the contaminated reads from fastq files?

The expectation is that you either remove the flagged contigs from your input FASTA file or edit the specified regions out of them.
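As a rough illustration of that cleanup step, here is a minimal Python sketch that applies calls.tab-style lines to an assembly FASTA. It rests on several assumptions that the thread does not confirm: that the columns are contig ID, action, and 1-based inclusive span; that `X` (or a missing span) means the whole contig should be dropped; that `M` spans can be replaced with Ns; and that the `lcl|` prefix is not part of the FASTA header. The function names are hypothetical and are not part of PGAP.

```python
import re

SPAN = re.compile(r"(\d+)\.\.(\d+)")


def parse_calls(text):
    """Parse calls.tab-style lines into {contig_id: [(action, span_or_None), ...]}."""
    calls = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        # Assumption: "lcl|" is a local-ID prefix absent from the FASTA headers.
        contig = fields[0].split("|", 1)[-1]
        match = SPAN.fullmatch(fields[2])
        span = (int(match.group(1)), int(match.group(2))) if match else None
        calls.setdefault(contig, []).append((fields[1], span))
    return calls


def read_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def clean_assembly(records, calls):
    """Drop contigs flagged as a whole and mask flagged spans with Ns."""
    kept = []
    for header, seq in records:
        parts = header.split()
        contig = parts[0] if parts else ""
        contig_calls = calls.get(contig, [])
        # Assumption: action "X", or a call without a span, flags the whole contig.
        if any(action == "X" or span is None for action, span in contig_calls):
            continue
        for _action, (start, end) in contig_calls:
            start, end = max(start, 1), min(end, len(seq))  # clamp to the contig
            if start <= end:
                # Spans are treated as 1-based and inclusive (assumption).
                seq = seq[:start - 1] + "N" * (end - start + 1) + seq[end:]
        kept.append((header, seq))
    return kept


def write_fasta(records, width=60):
    """Format (header, sequence) tuples as FASTA text with wrapped lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy data only; real input would come from calls.tab and the assembly FASTA.
    demo_calls = "lcl|contig_a\tM\t3..5\tadaptor:multiple\tAdaptor\n" \
                 "lcl|contig_b\tX\t-\tadaptor:multiple\tAdaptor\n"
    demo_fasta = ">contig_a\nACGTACGTAC\n>contig_b\nAAAA\n>contig_c\nGGGG\n"
    cleaned = clean_assembly(read_fasta(demo_fasta), parse_calls(demo_calls))
    print(write_fasta(cleaned), end="")
```

On real data the two inputs would be read from disk, for example with `pathlib.Path(...).read_text()`, and the result written to a new FASTA file so that the original assembly stays untouched. Dropping every flagged contig instead of masking is the simpler alternative and is what the reporter ends up doing later in this thread.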

If you would like to contest the choice of adaptor or contaminant sequences, feel free to examine the adaptor and contaminant databases that we provide, as a BLAST database and as a FASTA file, as part of the reference package.

Your working directory should contain a subdirectory named something like input*. It should contain two things: the directory contam_in_prok_blastdb_dir, which holds the BLAST database indexes, and the file adaptor_fasta.fna.
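To check which adaptor sequences the calls refer to, one option is to search that FASTA file for the accessions reported in calls.tab (for example NGB00749.1). The sketch below is hypothetical: the header layout of adaptor_fasta.fna is not shown in this thread, so it simply assumes that the accession appears somewhere in the header line and matches by substring.

```python
def find_adaptors(fasta_text, accessions):
    """Return {accession: [(header, sequence), ...]} for headers that mention it."""
    hits = {acc: [] for acc in accessions}
    header, chunks = None, []

    def flush():
        # Store the record collected so far under every accession it mentions.
        if header is None:
            return
        for acc in accessions:
            if acc in header:  # assumption: the accession is part of the header text
                hits[acc].append((header, "".join(chunks)))

    for line in fasta_text.splitlines():
        if line.startswith(">"):
            flush()
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.strip())
    flush()
    return hits
```

The text of the file can be obtained with `pathlib.Path(...).read_text()` once the actual location of the input* subdirectory is known. An accession that maps to an empty list means no header mentioned it, which may only indicate that the header format differs from what is assumed here.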

Please let me know if this helps.

Biomedinformatics commented 3 years ago

@azat-badretdin Thank you for your reply. I have removed the contigs flagged as contaminated, and PGAP is now running (not yet complete). Can this assembly be submitted to NCBI as it is, or do I need to go back to the FASTQ files and reassemble? Is removing the contaminated contigs enough?

azat-badretdin commented 3 years ago

> I just want to know if this Assembly can be submitted to NCBI

If it completes successfully, I do not see why not. Disclaimer: I know little about the SOPs of the GenBank submission unit.

thibaudnis commented 3 years ago

Azat is correct. If you have removed the contaminated spans or replaced them with Ns, you can submit the assembly FASTA and the .sqn file that PGAP produces to GenBank.

Biomedinformatics commented 3 years ago

Thank you for all your support. PGAP completed successfully.

azat-badretdin commented 3 years ago

You are very welcome! Thank you for reporting the issue!