ncbi / rapt

Read Assembly and Annotation Pipeline Tool

PGAP LTP issue and --auto-correct-tax #19

Open arunprasanna83 opened 11 months ago

arunprasanna83 commented 11 months ago

Hello, I assembled a genome with SPAdes and checked it with CheckM. CheckM classified it as Rhizobiales with 99.6% completeness, but ANI identified it as Alteromonas.

I then tried running RAPT end to end on a local workstation:

./run_rapt.py -q R1.fq.gz,R2.fq.gz --organism "Alteromonas" --strain "AK250" -o RAFT_testout --auto-correct-tax -c 64

The number of contigs and the N50 are the same as in the SPAdes assembly. However, the ANI report shows the following:

Submitted organism: Alteromonas (taxid = 226, rank = genus, lineage = Bacteria; Pseudomonadota; Gammaproteobacteria; Alteromonadales; Alteromonadaceae; Alteromonas/Salinimonas group)
Predicted organism: Martelella lutilitoris (taxid = 2583532, rank = species, lineage = Bacteria; Pseudomonadota; Alphaproteobacteria; Hyphomicrobiales; Aurantimonadaceae; Martelella)

Also, --auto-correct-tax did not override Alteromonas with Martelella in the final output.

The completeness is also low, at 49.6%.

Another question: how can I provide a locus tag prefix (LTP) for PGAP?

Error: no LTP specified, locus tag prefix 'pgaptmp' will be used

I would like to provide a desired prefix.

Thanks in advance.

thibaudnis commented 11 months ago

Hi Arun - thank you for providing this report.

Also, auto-correct-tax did not override the Alteromonas to Martelella in the final output.

The ANI report you provided is truncated. What are the two following lines (Status and Confidence)? If the Confidence is LOW, the logic doesn't reassign the genome to the predicted organism.
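The rule described above can be sketched in a few lines of Python. This is only an illustration of the stated behavior (no reassignment when Confidence is LOW), not RAPT's actual implementation; the function names are hypothetical, the report layout is assumed from the lines quoted in this thread, and the real logic may also consider Status or other fields.

```python
import re


def parse_ani_fields(report_text):
    """Pull the Status and Confidence values out of an ANI report.

    Assumption: the report contains 'Status: <WORD>' and
    'Confidence: <WORD>' somewhere in its text.
    """
    fields = {}
    for key in ("Status", "Confidence"):
        match = re.search(rf"{key}:\s*(\S+)", report_text)
        if match:
            fields[key.lower()] = match.group(1).upper()
    return fields


def should_reassign(fields):
    """Return True only when a confidence value is present and is not LOW.

    Assumption: a missing confidence is treated like LOW (no reassignment).
    """
    return fields.get("confidence") not in (None, "LOW")


if __name__ == "__main__":
    fields = parse_ani_fields("Status: INCONCLUSIVE Confidence: LOW")
    print(fields, should_reassign(fields))
```

With a LOW confidence, the sketch keeps the submitted organism, which matches the behavior reported in this issue.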

There is no option to provide a locus tag prefix to RAPT. However, since you have already assembled the genome, you can run PGAP (https://github.com/ncbi/pgap/wiki) which will allow you to provide an LTP in the input yaml file.
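As a rough sketch of what those PGAP input files might look like, the Python snippet below renders the two YAML documents as strings. The field names (`fasta`, `submol`, `organism`, `genus_species`, `strain`, `locus_tag_prefix`) are written from memory of the PGAP wiki's input-file description and should be checked against it; the FASTA file name `contigs.fasta` and the prefix `AK250` are placeholders, and the function name is hypothetical.

```python
def render_pgap_inputs(fasta_name, genus_species, strain, locus_tag_prefix,
                       submol_name="submol.yaml"):
    """Return (input_yaml, submol_yaml) as text.

    Assumption: PGAP reads a top-level YAML pointing at the FASTA and at a
    metadata YAML, and the metadata YAML accepts 'locus_tag_prefix'.
    """
    input_yaml = (
        "fasta:\n"
        "    class: File\n"
        f"    location: {fasta_name}\n"
        "submol:\n"
        "    class: File\n"
        f"    location: {submol_name}\n"
    )
    submol_yaml = (
        "organism:\n"
        f"    genus_species: '{genus_species}'\n"
        f"    strain: '{strain}'\n"
        f"locus_tag_prefix: '{locus_tag_prefix}'\n"
    )
    return input_yaml, submol_yaml


if __name__ == "__main__":
    # Placeholder values; organism and strain are taken from this thread.
    input_yaml, submol_yaml = render_pgap_inputs(
        "contigs.fasta", "Martelella lutilitoris", "AK250", "AK250"
    )
    print(input_yaml)
    print(submol_yaml)
```

The rendered text can be saved as `input.yaml` and `submol.yaml` next to the assembly and passed to PGAP as described in its wiki.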

arunprasanna83 commented 11 months ago

Yes, you are right. The confidence was low:

Status: INCONCLUSIVE
Confidence: LOW

I also observed that the --organism "Alteromonas" option makes CheckM report only the number of markers assigned to Alteromonas. If I am unsure of the organism, or had only assumed it to be Alteromonas, is it possible to specify --organism "Bacteria" so that ANI predicts the organism and CheckM assesses completeness?

Thanks.

thibaudnis commented 11 months ago

Based on the data you report here, there is more evidence that the genome belongs to Martelella lutilitoris (a Rhizobiales) rather than Alteromonas, but ANI is not able to provide confirmation. I would try running PGAP with --organism "Martelella lutilitoris".