lgatto / MSnbase

Base Classes and Functions for Mass Spectrometry and Proteomics
http://lgatto.github.io/MSnbase/

extended mzid info not importing #542

Closed BaylorSci closed 3 years ago

BaylorSci commented 3 years ago

I am wondering whether you can help. I have an extended mzid file (generated via MSGF+) which has had percolator information added to it via pout2mzid. However, when I use the following typical workflow, the columns I am interested in are not imported.

emb <- readMSData(file.path(quant.file[10]), mode = "onDisk", msLevel. = 2)
emb <- addIdentificationData(emb, id = id.file[10])

I have confirmed that they are present in the raw mzid files, but when I check fvarLabels, these columns are missing (the columns being percolator:peptide_pep and percolator:peptide_q_value). To be sure that this information is present and importable, I separately read the data in via mzID and found the columns I am interested in. I explored using the flatten function to then add this via addIdentificationData; however, that takes a long time and typically returns the following error.

Error in as.character.default(id[, desc]) :
  R character strings are limited to 2^31-1 bytes
In addition: Warning messages:
1: In split.default(x, g) :
  data length is not a multiple of split variable
2: In split.default(seq_along(x), f, drop = drop, ...) :
  data length is not a multiple of split variable

I am unsure how to proceed, so any help or thoughts would be very much appreciated.

lgatto commented 3 years ago

I would suggest having a look at the new packages under the R for Mass Spectrometry initiative. Then

To then join both, see here. I am happy to help out/debug using these packages.
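As a rough illustration of what such a join amounts to, here is a minimal base R sketch on toy data frames. It does not use the R for Mass Spectrometry packages and is not taken from the linked documentation; the column names (`spectrumID`, `rtime`, `sequence`) and all values are hypothetical stand-ins for per-spectrum metadata and a PSM table.

```r
# Toy stand-in for per-spectrum metadata (hypothetical columns and values).
spectra_df <- data.frame(
  spectrumID = c("scan=1", "scan=2", "scan=3"),
  rtime = c(10.1, 10.5, 11.2),
  stringsAsFactors = FALSE
)

# Toy stand-in for a PSM table; check.names = FALSE keeps the colon in the
# percolator column name instead of rewriting it to a syntactic name.
psm_df <- data.frame(
  spectrumID = c("scan=1", "scan=3"),
  sequence = c("PEPTIDEK", "SAMPLER"),
  `percolator:peptide_q_value` = c(0.004, 0.2),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Left join: keep every spectrum and add identification columns where a
# matching spectrumID exists; unmatched spectra get NA.
joined <- merge(spectra_df, psm_df, by = "spectrumID", all.x = TRUE)
```

Spectra without an identification (here `scan=2`) are retained with `NA` in the identification columns, which is usually what one wants before filtering.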

BaylorSci commented 3 years ago

Many thanks. I am struggling with the PSM package: during the compilation, the error message "namespace 'ProtGenerics' 1.22.0 is being loaded, but >= 1.23.1 is required" appears. However, I am running the latest R version (4.0.5), and Bioconductor does not update to 3.13, which 1.23.1 is part of, until next month. Is there a way around this? I am really interested in using this combination.

lgatto commented 3 years ago

Could you install from GitHub with BiocManager::install("lgatto/ProtGenerics")?
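To confirm afterwards that the installed version satisfies the requirement from the error message, a small version comparison can help. This is only a sketch: `meets_requirement` is a hypothetical helper, and the version strings are the ones quoted in the error above.

```r
# Hypothetical helper: TRUE when the installed version satisfies the minimum.
meets_requirement <- function(installed, required) {
  package_version(installed) >= package_version(required)
}

# Versions quoted in the error message: 1.22.0 installed, >= 1.23.1 required.
before_update <- meets_requirement("1.22.0", "1.23.1")  # FALSE
```

On a real installation, the first argument could be `as.character(packageVersion("ProtGenerics"))`.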

BaylorSci commented 3 years ago

Yes, this worked. I just loaded my mzid file using the PSMs package and the names are still missing. I can forward the mzid file if this will help. I am specifically looking to ensure that when I read in the mzid file, the new columns named percolator:peptide_pep and percolator:peptide_q_value are found, as I intend to filter by these later on. At the moment they are found only if I look at the file using MSnID.
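For reference, the intended check and filter can be sketched in base R on a toy table. The data, the `sequence` column and the 0.01 cutoff are hypothetical; only the two percolator column names come from the thread. Because those names contain a colon, they are not syntactic R names and need `[[ ]]` (or backticks) rather than plain `$` access.

```r
# Toy PSM table (hypothetical values); check.names = FALSE preserves the colons.
psms <- data.frame(
  sequence = c("PEPTIDEK", "SAMPLER", "ANOTHERK"),
  `percolator:peptide_pep` = c(0.001, 0.30, 0.02),
  `percolator:peptide_q_value` = c(0.004, 0.20, 0.009),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Which percolator columns survived the import?
percolator_cols <- grep("^percolator:", names(psms), value = TRUE)

# Keep rows at or below a hypothetical q-value cutoff.
q_cutoff <- 0.01
kept <- psms[psms[["percolator:peptide_q_value"]] <= q_cutoff, , drop = FALSE]
```

If `percolator_cols` comes back empty after an import, the columns were dropped by the reader rather than by any later filtering step.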

lgatto commented 3 years ago

Yes, please send one over.

lgatto commented 3 years ago

I'm closing the issue here. Could you open one here to follow up on the missing variables?