Open veitveit opened 5 months ago
This seems to be an issue related to fragannot: the original result returned by fragannot contains several entries for every spectrum, which are sometimes identical and sometimes different. That is why we get these duplicates, which, by the way, also appear in the fragment-centric dataframe.
This is the original result from fragannot result.json
I can add a filter that keeps track of everything that was already added to the dataframes to avoid duplicates, but maybe we should check if everything is right with fragannot (e.g. why we get these duplicate results)?
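A minimal sketch of what such a filter could look like, assuming each fragannot entry is a JSON-serializable dict. The field names in the example (spectrum_id, peptidoform, score) are made up for illustration and are not taken from the real result.json. It only drops entries that are exactly identical, so entries that differ in any field are still kept:

```python
import json
from typing import Any, Dict, Iterable, Iterator


def unique_entries(entries: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield each entry once, skipping exact duplicates, preserving order."""
    seen = set()
    for entry in entries:
        # Canonical JSON string as the identity of an entry (key order ignored).
        key = json.dumps(entry, sort_keys=True, default=str)
        if key in seen:
            continue
        seen.add(key)
        yield entry


# Hypothetical entries; the real structure of result.json may differ.
example = [
    {"spectrum_id": "1082", "peptidoform": "PEPT[+79.966]IDE", "score": 42.0},
    {"spectrum_id": "1082", "peptidoform": "PEPT[+79.966]IDE", "score": 42.0},
    {"spectrum_id": "1082", "peptidoform": "PEPTIDE", "score": 42.0},
]
deduplicated = list(unique_entries(example))
```

This would hide the symptom in both dataframes, but it does not explain why fragannot returns the repeated entries in the first place.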
I think the source of duplication can be the fact that the identification file contains multiple identifications per spectrum.
For example, the SpectrumIdentificationResult for spectrum 1082 (in the screenshot) contains 28 SpectrumIdentificationItem elements corresponding to different peptidoforms of the same primary sequence. These peptidoforms produce partially overlapping annotations.
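To check how widespread this is, one could count the SpectrumIdentificationItem elements per SpectrumIdentificationResult, grouped by rank. The sketch below uses only the Python standard library (Python 3.8+ for the {*} namespace wildcard) and a trimmed, made-up mzIdentML fragment. The function name rank_counts and the sample content are assumptions for illustration, not fragannot code:

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict


def rank_counts(mzid_xml: str) -> Dict[str, Counter]:
    """Map each spectrumID to a Counter of the rank values of its items."""
    root = ET.fromstring(mzid_xml)
    counts: Dict[str, Counter] = {}
    # "{*}" matches the element in any XML namespace.
    for result in root.findall(".//{*}SpectrumIdentificationResult"):
        items = result.findall("{*}SpectrumIdentificationItem")
        counts[result.get("spectrumID")] = Counter(item.get("rank") for item in items)
    return counts


# Trimmed, hypothetical fragment; a real file has many more elements and attributes.
SAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <SpectrumIdentificationResult id="SIR_1" spectrumID="index=1082">
    <SpectrumIdentificationItem id="SII_1" rank="1"/>
    <SpectrumIdentificationItem id="SII_2" rank="1"/>
    <SpectrumIdentificationItem id="SII_3" rank="2"/>
  </SpectrumIdentificationResult>
</MzIdentML>"""

counts = rank_counts(SAMPLE)
# Spectra with more than one rank-1 item are the ones that would be processed repeatedly.
tied = [sid for sid, ranks in counts.items() if ranks["1"] > 1]
```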
Note that, as far as I can see in the code, the idea was to read only the top-ranked identification. In this case, however, some but not all of the identifications have the same score, and six of them are marked as rank 1, so they are all processed.
At least, this is what happens for the example dataset.
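If we do want exactly one identification per spectrum, ties at rank 1 would need an explicit tie-break. A rough sketch, assuming the identifications are available as dicts with hypothetical keys id, rank and score, and assuming a higher score is better (this depends on the search engine):

```python
from typing import Any, Dict, List


def pick_top(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Pick one identification: lowest rank, then highest score, then smallest id."""
    # The id is only used to make the choice deterministic when rank and score tie.
    return min(items, key=lambda it: (it["rank"], -it["score"], it["id"]))


# Hypothetical identifications for a single spectrum.
identifications = [
    {"id": "SII_3", "rank": 1, "score": 30.0},
    {"id": "SII_2", "rank": 1, "score": 35.0},
    {"id": "SII_1", "rank": 1, "score": 35.0},
    {"id": "SII_4", "rank": 2, "score": 50.0},
]
best = pick_top(identifications)
```

Whether collapsing tied peptidoforms to a single one is actually desirable is a separate question, since the tied candidates may be equally plausible.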