Open lerminin opened 5 months ago
We have a process that explicitly looks for complex mutations (as defined in the complex mutations file) and, if there's an appropriate match, replaces a set of individually reported point mutations with a single complex mutation.
In your example with 18 Enterococcus faecium pbp5 point mutations, there aren't enough mutations to perform one of the replacements defined in the complex mutations file. So the 18 pbp5 point mutations remain listed individually, and information associated with each of them is pulled from the CGE Pointfinder Enterococcus faecium database and provided to the user of StarAMR.
The CGE Pointfinder database treats these 18 mutations the same as every other point mutation (listing `Ampicillin` as the conferred resistance), but expects the user to read the entry in the `Notes` column for these mutations, which reads: `The nineteen pbp5 mutations must be present simultaneously for resistance phenotype`.
We show this note from CGE in StarAMR's `pointfinder.tsv` output file, which would look something like this (showing 1 non-header row instead of many):
Isolate ID | Gene | Predicted Phenotype | CGE Predicted Phenotype | Type | Position | Mutation | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Pointfinder Position | CGE Notes | CGE Required Mutation | CGE Mechanism | CGE PMID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
pbp5_19_failure | pbp5 (A216S) | unknown[pbp5 (A216S)] | Ampicillin | codon | 216 | GCA -> AGT (A -> S) | 98.38 | 100.00 | 2037/2037 | pbp5_1_AAK43724.1 | 1 | 2037 | A216S | The nineteen pbp5 mutations must be present simultaneously for resistance phenotype | | Target modification | 25182648 |
So in this case we're showing the CGE-predicted phenotype as `ampicillin` because that's what CGE reports for these exact mutations. We show the note that CGE provides alongside these predictions in StarAMR's `pointfinder.tsv` output file, but, I believe, not in `detailed_summary.tsv`, to keep that file less cluttered. The detailed summary could be changed to show the `CGE Notes` column, though.
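Until the notes are surfaced in the detailed summary, a small script can pull them out of `pointfinder.tsv`. The following is only a sketch: it assumes the file is tab-separated with the column names shown in the table above, the helper name `rows_with_cge_notes` is made up for illustration, and the second sample row is invented to show a mutation with no note.

```python
import csv
import io

# Sample content using a subset of the pointfinder.tsv columns shown above.
# The first data row comes from the example in this thread; the second row
# is invented purely to show a mutation without a CGE note.
SAMPLE_TSV = (
    "Isolate ID\tGene\tCGE Predicted Phenotype\tCGE Notes\n"
    "pbp5_19_failure\tpbp5 (A216S)\tAmpicillin\t"
    "The nineteen pbp5 mutations must be present simultaneously for resistance phenotype\n"
    "example_isolate\tgeneX (A1B)\tDrugX\t\n"
)


def rows_with_cge_notes(handle):
    """Return (gene, CGE phenotype, note) for every row that has a CGE note."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["Gene"], row["CGE Predicted Phenotype"], row["CGE Notes"])
        for row in reader
        if (row.get("CGE Notes") or "").strip()
    ]


if __name__ == "__main__":
    for gene, phenotype, note in rows_with_cge_notes(io.StringIO(SAMPLE_TSV)):
        print(f"{gene}: CGE reports {phenotype}, but notes: {note}")
```

To run it against a real result, pass an open file handle for `pointfinder.tsv` instead of the `io.StringIO` sample.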
In your example with 19 Enterococcus faecium pbp5 point mutations, there are enough mutations to perform one of the replacements, so the 19 pbp5 mutations are replaced with a single complex mutation. These complex mutations are defined in the complex mutations file. The resulting complex mutation does not list ampicillin as a `CGE Predicted Phenotype` because CGE does not predict the complex mutation the same way we do. CGE predicts the complex mutations as:
The nineteen pbp5 mutations must be present simultaneously for resistance phenotype
In this case, meaning exactly the following mutations:
With (to my interpretation) V586L being non-essential:
pbp5 pbp5 586 GTA V L Ampicillin 25182648 Target modification Not essential for resistance phenotype
Whereas our complex mutations file defines either:
or
as mandatory and then grabs all other related mutations and collapses them into a single complex mutation.
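The replacement rule described above can be sketched roughly as follows. This is not StarAMR's actual code or file format: the `ComplexRule` class, the `collapse` function, and the contents of `EXAMPLE_RULE` (including which mutation is treated as mandatory) are assumptions chosen only to illustrate the "mandatory set present, then absorb the related mutations" behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexRule:
    """Hypothetical stand-in for one entry of the complex mutations file."""
    name: str
    mandatory: frozenset  # every one of these must be observed
    related: frozenset    # absorbed into the complex mutation when observed


def collapse(observed, rule):
    """Return the sorted mutation names reported after applying one rule.

    If any mandatory mutation is missing, nothing is replaced and the
    point mutations stay listed individually.
    """
    observed = set(observed)
    if not rule.mandatory <= observed:
        return sorted(observed)
    absorbed = rule.mandatory | (rule.related & observed)
    return sorted((observed - absorbed) | {rule.name})


# Illustrative rule only; the real mandatory sets live in the complex mutations file.
EXAMPLE_RULE = ComplexRule(
    name="pbp5 (complex)",
    mandatory=frozenset({"pbp5 (P667S)"}),
    related=frozenset({"pbp5 (A216S)", "pbp5 (V586L)"}),
)

if __name__ == "__main__":
    # Mandatory mutation present: the set collapses into one complex mutation.
    print(collapse({"pbp5 (A216S)", "pbp5 (P667S)"}, EXAMPLE_RULE))
    # Mandatory mutation missing: point mutations remain listed individually.
    print(collapse({"pbp5 (A216S)", "pbp5 (V586L)"}, EXAMPLE_RULE))
```

Under this sketch, a sample that lacks a mandatory mutation keeps its individual point mutations, each with whatever CGE reports for it, which matches the 18-mutation case discussed earlier.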
Importantly, the CGE Pointfinder database does NOT make the same predictions we use in the complex mutations file (3 vs 19), so it would be wrong to list `ampicillin` under the `CGE Predicted Phenotype` column in your example. Instead, we list `ampicillin` in the adjacent `Predicted Phenotype` column in StarAMR's `detailed_summary.tsv` output file.
StarAMR wraps CGE's databases (Pointfinder in this case) and tries to relay their information back to the user of StarAMR. The complex mutation functionality implemented by StarAMR is an attempt to alleviate some issues with how CGE reports this data by default (expecting users to interpret it themselves).
In your example with the complex mutation replacement, CGE is not predicting the mutations we've manually defined in the complex mutations file, so we report the phenotype under the `Predicted Phenotype` column instead of the `CGE Predicted Phenotype` column in the `detailed_summary.tsv` output file. There is arguably a case that, if the exact 19 mutations specified by CGE are present, then in that specific case `ampicillin` should appear under `CGE Predicted Phenotype`, but not in the case of the other predicted complex mutations. However, this is adding an exception to the exception, so I think it would have to be a priority.
In your example with the incomplete complex mutation, CGE reports these mutations like any other, showing `Ampicillin` under the `Resistance` column, but expects the user to read the `Notes` column. We show CGE's notes in StarAMR's `pointfinder.tsv` output file but not in the `detailed_summary.tsv` output file. It would be possible to hard-code these mutations to never be reported unless a complex set of them is found, but again, that means writing yet another exception to CGE's Pointfinder data (which has many), so I think it would have to be considered a priority to make such a change.
There are of course solutions to these problems, but a risk with writing many specific exceptions and special cases is that it makes it harder to update the databases used by StarAMR when CGE updates the databases on their end. It's ultimately a matter of priorities, though.
Hello,
I have two Enterococcus faecium assemblies that I have run through staramr v0.10.0 using the default databases with `--pointfinder-organism enterococcus_faecium --pid-threshold 90`. Assembly1 has all 19 pbp5 mutations that contribute to ampicillin resistance according to the CGE key. Assembly2 has 18/19 mutations, and is missing the pbp5 (P667S) mutation, which from my understanding is a mandatory mutation listed in the complex pbp5 mutations file. It's not clear to me why the CGE phenotype column for assembly1 is left blank when according to the CGE key these 19 mutations are required for ampicillin resistance, and yet the CGE phenotype column is populated "Ampicillin" for assembly2 when one of the essential mutations is missing. It seems the Predicted Phenotype column is calling them as I would expect: "ampicillin" for assembly1, and "unknown[pbp5 (mutation)]" for assembly2.