lgatto / MSnbase

Base Classes and Functions for Mass Spectrometry and Proteomics
http://lgatto.github.io/MSnbase/

duplicated rows resulted in error #589

Closed ibphuangchen closed 1 year ago

ibphuangchen commented 1 year ago
> Hi lgatto, did you get a chance to look into the problems I described in our previous discussion? Thanks!

Hi @lgatto - your comments make very good sense to me. The problem I had was that when using mzR to open the identification file (.pep.xml), for whatever reason, the result is a data frame in which a peptide with N modifications (e.g. 57.02147 for cysteine, with N cysteines in the peptide) is duplicated across N rows. Each row carries only a single modification, with its own "modMass" and "modLocation", whereas in reality all N modifications belong to the same peptide. The same thing happens with MSnbase::readMzIdData.

Originally posted by @ibphuangchen in https://github.com/lgatto/MSnbase/issues/584#issuecomment-1319310681
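
For illustration, the row-per-modification layout described above can be collapsed back to one record per peptide by grouping on the columns that identify the match. This is a minimal sketch in plain Python, not the mzR or MSnbase API. The `spectrumID` and `sequence` keys, the sample values, and the helper name `collapse_modifications` are assumptions made for the example; only "modMass" and "modLocation" come from the comment above.

```python
# Hypothetical rows mimicking the described shape: one row per modification,
# so a peptide with two modified cysteines appears twice.
rows = [
    {"spectrumID": "scan=1", "sequence": "ACDCK", "modMass": 57.02147, "modLocation": 2},
    {"spectrumID": "scan=1", "sequence": "ACDCK", "modMass": 57.02147, "modLocation": 4},
    {"spectrumID": "scan=2", "sequence": "PEPTIDEK", "modMass": None, "modLocation": None},
]


def collapse_modifications(rows, key_fields=("spectrumID", "sequence")):
    """Merge rows sharing the same key into one record with lists of modifications."""
    collapsed = {}  # dicts keep insertion order, so the first-seen order is preserved
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in collapsed:
            record = {field: row[field] for field in key_fields}
            record["modMass"] = []
            record["modLocation"] = []
            collapsed[key] = record
        # Unmodified peptides carry no modification; leave their lists empty.
        if row["modLocation"] is not None:
            collapsed[key]["modMass"].append(row["modMass"])
            collapsed[key]["modLocation"].append(row["modLocation"])
    return list(collapsed.values())


collapsed = collapse_modifications(rows)
for record in collapsed:
    print(record)
```

In real data the grouping key would probably need more columns (for example the rank of the match) so that distinct matches to the same spectrum are not merged by accident.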

lgatto commented 1 year ago

Yes, this makes sense, and is expected. Several options from here on: