lgatto closed 1 year ago
Just to add to this that chimera was set to true for that run.
Good catch; the rank column is a late addition and this wasn't considered. As the code is currently written, this is the expected behavior when chimera is set to true - but perhaps not the correct behavior.
For self, easy fix by manually mutating rank here: https://github.com/lazear/sage/blob/0ca2d5d45aeb4748c786d0decbf5a906ccbb3621/crates/sage/src/scoring.rs#L457
I think documentation would go a long way here.

If I understand correctly:

* Different identifications from the same set of matched fragments would be ranked based on their score.
* Chimeric identifications from different sets of fragments would get ranked independently and will thus have the same ranks.

What would happen to the second matches if chimera was set to false? Would that spectrum not be identified at all; or if multiple PSMs were returned, they would simply be ranked together (as in case 1 above)?

It would actually also be helpful to easily spot chimeric spectra. If my understanding above is correct, I should be able to identify those as PSMs that both have rank 1 for the same scan. To help, it would be useful to have a 'chimera' column that identifies these candidates.
This is correct - there are only two scenarios where multiple IDs are returned for the same spectrum:

* `report_psms = N` where N > 1, in which case we report back 1..N IDs against the same set of peaks, each with rank 1..N
* `chimera = True`, in which case we currently report both IDs as rank one (given that they are derived from different sets of peaks - the second/chimeric peptide is IDed against the parent spectrum with matching peaks removed)

If `chimera = true` and `report_psms > 1`, then `report_psms` is ignored, and chimeric search proceeds as normal.
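The two scenarios above can be sketched as a toy model. This is only an illustration of the rules as described in this thread, not Sage's actual implementation: the `Hit` class, its `residual` flag, the `assign_ranks` function, and the peptides and scores are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hit:
    peptide: str
    score: float
    # Hypothetical flag: True if the hit was matched against the spectrum
    # after the first hit's matching peaks were removed.
    residual: bool = False
    rank: int = 0


def assign_ranks(hits, chimera=False, report_psms=1):
    """Toy model of the ranking rules described above (not Sage's code)."""
    by_score = sorted(hits, key=lambda h: h.score, reverse=True)
    full = [h for h in by_score if not h.residual]
    if chimera:
        # report_psms is ignored: keep the best hit per peak set, both rank 1.
        best_residual = [h for h in by_score if h.residual][:1]
        return [replace(h, rank=1) for h in full[:1] + best_residual]
    # Otherwise the top N hits share one peak set and are ranked 1..N.
    return [replace(h, rank=i) for i, h in enumerate(full[:report_psms], start=1)]


# Made-up candidates for a single spectrum.
hits = [
    Hit("PEPTIDEK", 30.0),
    Hit("PEPTLDEK", 25.0),
    Hit("ANOTHERK", 18.0, residual=True),
]
ranked = assign_ranks(hits, report_psms=2)
chimeric = assign_ranks(hits, chimera=True, report_psms=2)
```

In this sketch, `ranked` holds two hits with ranks 1 and 2, while `chimeric` holds two hits that both carry rank 1, which is the situation reported at the start of this issue.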
I am open to adding a chimera column (set to true for all second IDs?), but I wonder if it would be more useful to just report `rank = 2` for the second/chimeric peptide.
I think that a chimeric scan and multiple PSMs for a scan are conceptually different. You enforce reporting a single PSM when setting chimera, but in theory (unlikely?), there could be multiple ranks for the different sets of fragments. I had missed (or forgotten) that report_psms was ignored when searching for chimeric scans, so maybe, if the documentation is clear on that, the current situation is clear enough:
I think it is important to be able to discriminate these two from the results themselves, without the need for the configuration file (which might be missing when reporting data in a paper). If you were to change the rank in the first case, it would be more difficult to distinguish these.
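As a rough sketch of how the two cases could be told apart from the results alone under the current behavior: scans with more than one rank 1 PSM would be chimeric, and scans with ranks 1..N would come from `report_psms > 1`. The column names (`filename`, `scannr`, `peptide`, `rank`), the `classify_scans` helper, and the rows below are assumptions made up for illustration; adjust them to the actual results file.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration; column names are assumed.
TSV = (
    "filename\tscannr\tpeptide\trank\n"
    "run1.mzML\tscan=100\tPEPTIDEK\t1\n"
    "run1.mzML\tscan=100\tANOTHERK\t1\n"
    "run1.mzML\tscan=101\tSAMPLEK\t1\n"
    "run1.mzML\tscan=101\tSAMPLER\t2\n"
    "run1.mzML\tscan=102\tSINGLEK\t1\n"
)


def classify_scans(rows):
    """Label each (file, scan) pair by how its PSMs are ranked."""
    total = Counter()
    rank_one = Counter()
    for row in rows:
        key = (row["filename"], row["scannr"])
        total[key] += 1
        if int(row["rank"]) == 1:
            rank_one[key] += 1
    labels = {}
    for key, n in total.items():
        if rank_one[key] > 1:
            labels[key] = "chimeric"        # several rank 1 PSMs
        elif n > 1:
            labels[key] = "multiple ranks"  # ranks 1..N on one peak set
        else:
            labels[key] = "single"
    return labels


rows = list(csv.DictReader(io.StringIO(TSV), delimiter="\t"))
labels = classify_scans(rows)
```

This only works while chimeric hits keep rank 1; if the second/chimeric peptide were reported with rank 2, both cases would look the same in the results and a separate column would be needed.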
Added `DOCS.md` that explains the current situation.
I might be missing something here, but I was surprised to see two hits of rank 1 for the same scan:
Any idea? Thanks in advance.