> Hi Igatto, did you get a chance to look into the problems I describe in our previous discussion? Thanks!
Hi @lgatto - your comments make very good sense to me. I found that the problem was that when using mzR to open the identification file (.pep.xml), for whatever reason, it resulted in a data frame in which a peptide with N modifications (e.g. 57.02147 for cysteine, when there are N cysteines in the peptide) is duplicated across N rows; each row carries only a single modification with a specific "modMass" and "modLocation", but in reality all these modifications should be there for this peptide. Same thing with MSnbase::readMzIdData.
Yes, this makes sense, and is expected. Several options from here on:
- you are only interested in looking at identification data, and you can focus on the data.frame with the multiple rows per peptide
- if you are interested in quantitation data, I don't think you will distinguish the different forms, and will thus have a single set of quantitation values, so pick either id result (noting that you have more)
- same for the raw data, pick one to simply get around that error
- or, see PSMatch and how to reduce such an identification result object.
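
To make the "reduce" idea concrete, here is a minimal base R sketch on a toy table. It is not the PSMatch implementation, only an illustration of collapsing the one-row-per-modification layout into one row per spectrum. The `spectrumID` and `sequence` column names and all values are made up for the example; only `modMass` and `modLocation` come from the discussion above, and the real output of mzR or `MSnbase::readMzIdData` has more columns.

```r
# Toy identification table mimicking the layout described above:
# one row per modification, so a peptide with 2 modified cysteines has 2 rows.
# Column names other than modMass/modLocation are illustrative assumptions.
id <- data.frame(
  spectrumID  = c("scan=1", "scan=1", "scan=2"),
  sequence    = c("ACDCK", "ACDCK", "PEPTIDEK"),
  modMass     = c(57.02147, 57.02147, NA),
  modLocation = c(2, 4, NA),
  stringsAsFactors = FALSE
)

# Collapse a vector into one ";"-separated string, dropping missing values.
collapse <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else paste(x, collapse = ";")
}

# Split the table by spectrum, then build a single row for each spectrum
# that keeps all of its modifications together.
reduced <- do.call(rbind, lapply(split(id, id$spectrumID), function(d) {
  data.frame(
    spectrumID  = d$spectrumID[1],
    sequence    = d$sequence[1],
    modMass     = collapse(d$modMass),
    modLocation = collapse(d$modLocation),
    stringsAsFactors = FALSE
  )
}))
rownames(reduced) <- NULL

print(reduced)
```

After this step each spectrum appears once, with all modification masses and locations kept together, which is usually what is needed before matching identifications to raw or quantitation data.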
Originally posted by @ibphuangchen in https://github.com/lgatto/MSnbase/issues/584#issuecomment-1319310681