MostafaYA closed this issue 5 months ago.
Could you post the files CD_21S0467-D02.faa, CD_21S0467-D02.fasta and CD_21S0467-D02.gff3? Or at least the cfr(C) protein in CD_21S0467-D02.faa that maps to contig00013 from 30975 to 32063?
Hi,
Here is the annotation info from Bakta:
contig00013 Prodigal CDS 30975 32225 . + 0 ID=ILDIMB_14890;Name=23S rRNA (adenine(2503)-C(8))-methyltransferase Cfr;locus_tag=ILDIMB_14890;product=23S rRNA (adenine(2503)-C(8))-methyltransferase Cfr;Dbxref=RefSeq:WP_021434980.1,SO:0001217,UniParc:UPI00038CBF9B,UniRef:UniRef100_A0A417SUK8,UniRef:UniRef50_A0A3B0CKZ1,UniRef:UniRef90_A0A1Q1PTQ2;gene=cfr
Would you mind if I sent the files to you by email (e.g., to pd-help@ncbi.nlm.nih.gov)? The reason is that the data is not mine!
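The Bakta record above is a single GFF3 feature line: nine columns, the last holding `key=value` attributes separated by semicolons. A minimal Python sketch of pulling out the coordinates and the `gene` attribute follows. The function names are hypothetical, the whitespace fallback exists only because copies like the one in this thread lose their tabs, and the amino-acid length assumes the coordinates are 1-based, inclusive and include the stop codon (as Prodigal-called CDSs do).

```python
from urllib.parse import unquote


def parse_gff3_line(line: str) -> dict:
    """Parse one GFF3 feature line into named columns and an attribute dict."""
    line = line.rstrip("\n")
    cols = line.split("\t")
    if len(cols) != 9:
        # Fallback for copies whose tabs were flattened to spaces: the first
        # eight columns contain no spaces here, so split at most 8 times.
        cols = line.split(None, 8)
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attr_text = cols
    attributes = {}
    for pair in attr_text.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # GFF3 percent-encodes reserved characters such as ';', '=' and ','.
        attributes[key] = unquote(value)
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end), "score": score,
        "strand": strand, "phase": phase, "attributes": attributes,
    }


def cds_lengths(record: dict) -> tuple:
    """Return (nucleotide length, amino-acid length) of a CDS record,
    assuming inclusive 1-based coordinates that include the stop codon."""
    nt = record["end"] - record["start"] + 1
    return nt, nt // 3 - 1


bakta_line = (
    "contig00013 Prodigal CDS 30975 32225 . + 0 "
    "ID=ILDIMB_14890;Name=23S rRNA (adenine(2503)-C(8))-methyltransferase Cfr;"
    "locus_tag=ILDIMB_14890;product=23S rRNA (adenine(2503)-C(8))-methyltransferase Cfr;"
    "Dbxref=RefSeq:WP_021434980.1,SO:0001217,UniParc:UPI00038CBF9B,"
    "UniRef:UniRef100_A0A417SUK8,UniRef:UniRef50_A0A3B0CKZ1,UniRef:UniRef90_A0A1Q1PTQ2;gene=cfr"
)
rec = parse_gff3_line(bakta_line)
print(rec["seqid"], rec["start"], rec["end"], rec["attributes"]["gene"])  # contig00013 30975 32225 cfr
print(cds_lengths(rec))  # (1251, 416)
```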
Hi Mostafa,
Yes, please send the files to pd-help@ncbi.nlm.nih.gov, and mention GitHub issue 140 in the email so I can make sure I see it. We can check them there. My guess is that there is an HMM hit that is suppressing the reporting of the gene once AMRFinderPlus is able to run HMMER to match the proteins, but that's just a guess.
Thanks, Arjun
I just sent the files. Let me know if anything else is needed! Thanks in advance.
Hi Mostafa,
Hopefully you saw my response to your email, but I will post it publicly here as well for the record.
The missing cfr(C) call was caused by a somewhat obscure rule in the extensive ruleset of AMRFinderPlus: if a query protein hits a reference protein by BLAST at less than 98% identity, it must also hit, above the cutoff, an HMM (if any) at a higher level in the hierarchy. We have described this in our papers, but sometimes even we forget all the details.
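As a rough illustration of the rule described above, here is a minimal Python sketch. It is not AMRFinderPlus code: the `Node` structure, checking only the nearest ancestor that carries an HMM, the `>=` comparison, and the cutoff and score values are all hypothetical. Treating a missing HMM result as "check not applied" is an assumption based on the nucleotide-only behavior discussed in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional

IDENTITY_THRESHOLD = 98.0  # percent identity, from the rule described above


@dataclass
class Node:
    """A node in a toy gene hierarchy (hypothetical structure)."""
    name: str
    parent: Optional[str] = None
    hmm_cutoff: Optional[float] = None  # None means no HMM at this node


def nearest_ancestor_hmm(name: str, hierarchy: Dict[str, Node]) -> Optional[Node]:
    """Walk up from the node's parent; return the first ancestor carrying an HMM."""
    current = hierarchy[name].parent
    while current is not None:
        node = hierarchy[current]
        if node.hmm_cutoff is not None:
            return node
        current = node.parent
    return None


def report_blast_hit(name: str, identity: float, hierarchy: Dict[str, Node],
                     hmm_scores: Optional[Dict[str, float]]) -> bool:
    """Decide whether a BLAST hit to node `name` is reported.

    hmm_scores maps HMM node names to the query's score; None models a run
    in which HMMER was not executed (assumed for nucleotide-only input).
    """
    if identity >= IDENTITY_THRESHOLD:
        return True  # close hits stand on BLAST evidence alone
    if hmm_scores is None:
        return True  # no HMM results, so the extra check is not applied
    ancestor = nearest_ancestor_hmm(name, hierarchy)
    if ancestor is None:
        return True  # no HMM higher up in the hierarchy
    # The divergent hit must also score at or above the ancestor HMM's cutoff.
    return hmm_scores.get(ancestor.name, float("-inf")) >= ancestor.hmm_cutoff


# Hypothetical numbers: a divergent cfr(C)-like query under a narrow parent HMM.
hierarchy = {
    "cfr_gen": Node("cfr_gen", hmm_cutoff=500.0),
    "cfr(C)": Node("cfr(C)", parent="cfr_gen"),
}
print(report_blast_hit("cfr(C)", 90.0, hierarchy, {"cfr_gen": 350.0}))  # False
print(report_blast_hit("cfr(C)", 90.0, hierarchy, None))                # True
```

In this toy setup the same BLAST hit is suppressed when HMM scores are available and reported when they are not, which mirrors the combined versus nucleotide-only difference in this issue. Broadening the parent HMM (a lower cutoff or a higher score for the divergent member) or removing it lets the call through.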
This particular rule was created early in AMRFinder development to avoid false calls in edge cases. We are reviewing whether we should change or drop the rule.
However, what happened in your specific case is that a new, divergent sequence and node (cfr(C)) with a curated BLAST rule was later added as a child of the cfr_gen node, and the new cfr(C) sequence is different enough that the parent node's HMM should have been reviewed and updated to broaden its scope. See https://www.ncbi.nlm.nih.gov/pathogens/genehierarchy/#cfr*%20OR%20cipA%20OR%20clb* for the current hierarchy and the HMM at the cfr_gen node. The HMM has now been reviewed and modified, and with the next AMRFinderPlus database release the cfr(C) protein you saw will be reported in combined mode as well as in nucleotide-only mode.
Sorry again for the delayed responses, and thank you very much for pointing this out. It's an edge case we hadn't caught, and the particular set of circumstances required to reveal it means we might not have noticed it for quite some time without your report.
I will close this ticket once we release a new AMRFinderPlus database that fixes this issue.
Arjun
Thanks a lot
Hi Mostafa,
We just released AMRFinderPlus database version 2024-05-02.2. We also reviewed a QC check we built to help identify these kinds of cases and made several changes to the hierarchy structure and HMMs to hopefully prevent this from happening again for other genes. We ended up removing the abc-f HMM that was causing this issue because we couldn't tune a single HMM at this level of the hierarchy to be both sensitive and specific.
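The sensitivity/specificity trade-off mentioned above can be seen with a simple cutoff sweep. This is a generic Python sketch, not how NCBI tunes its HMM cutoffs. The scores are made-up toy values, and the lowest-scoring "member" stands in for a divergent sequence like cfr(C). Because that member scores below an unrelated sequence, no single cutoff reaches 100% on both measures.

```python
from typing import List, Tuple


def sweep_cutoffs(members: List[float],
                  non_members: List[float]) -> List[Tuple[float, float, float]]:
    """Return (cutoff, sensitivity, specificity) for every observed score.

    A sequence counts as an HMM hit when its score is >= cutoff. Trying only
    observed scores is enough: a perfect cutoff exists exactly when the
    lowest member score exceeds the highest non-member score.
    """
    results = []
    for cutoff in sorted(set(members) | set(non_members)):
        tp = sum(score >= cutoff for score in members)
        tn = sum(score < cutoff for score in non_members)
        results.append((cutoff, tp / len(members), tn / len(non_members)))
    return results


def perfect_cutoff_exists(members: List[float], non_members: List[float]) -> bool:
    """True if some cutoff gives both sensitivity and specificity of 1.0."""
    return any(sens == 1.0 and spec == 1.0
               for _, sens, spec in sweep_cutoffs(members, non_members))


# Toy scores (made up): the 310.0 member plays the role of a divergent sequence.
members = [820.0, 790.0, 760.0, 310.0]
non_members = [450.0, 400.0, 120.0]
for cutoff, sens, spec in sweep_cutoffs(members, non_members):
    print(f"cutoff {cutoff:6.1f}  sensitivity {sens:.2f}  specificity {spec:.2f}")
print(perfect_cutoff_exists(members, non_members))  # False
```

Moving the cutoff here only trades missed members for false hits, which is one plausible reason to drop an HMM at that level rather than keep retuning it.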
Anyway, you should now find the gene in both combined and nucleotide-only modes in this case, and going forward as well, since we've added it to the suite of tests we run before every release.
Thanks again for reporting, Arjun
Hi, I am using AMRFinderPlus on a C. diff genome. I ran the same sample with and without prior annotation, but the results were different (please see below). I cannot understand why cfr(C) is absent from the results of the first command line.
Thanks for your help
Software version: 3.12.8
Database version: 2024-01-31.1
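One way to pin down exactly which genes differ between the two runs is to diff the gene symbols in the two output tables. The sketch below assumes each run wrote its table with `-o`, for example with commands along the lines of `amrfinder -n contigs.fasta -o nuc_only.tsv` and `amrfinder -p proteins.faa -n contigs.fasta -g annotation.gff3 -a bakta -o combined.tsv` (check `amrfinder --help` for the flags in your version). It also assumes the symbol column is headed `Gene symbol`, as in 3.x output. The file names and toy table contents are hypothetical.

```python
import csv
import io


def gene_symbols(tsv_text: str, column: str = "Gene symbol") -> set:
    """Return the set of gene symbols in AMRFinderPlus tab-delimited output."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header")
    return {row[column] for row in reader if row.get(column)}


def compare_runs(first_tsv: str, second_tsv: str,
                 column: str = "Gene symbol") -> dict:
    """Split gene symbols into those unique to each run and those shared."""
    first = gene_symbols(first_tsv, column)
    second = gene_symbols(second_tsv, column)
    return {
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
        "shared": sorted(first & second),
    }


# Toy, trimmed tables (hypothetical content, two columns only).
combined = "Contig id\tGene symbol\ncontig00001\tgeneX\n"
nuc_only = ("Contig id\tGene symbol\ncontig00001\tgeneX\n"
            "contig00013\tcfr(C)\n")
print(compare_runs(combined, nuc_only))
# With real files (hypothetical names):
# compare_runs(open("combined.tsv").read(), open("nuc_only.tsv").read())
```

Running it on the two real tables lists any gene reported by only one run, which is a convenient way to attach the exact difference to a report like this one.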