Jokendo-collab commented 3 years ago

I have 72 mzML files and I have done the following:

# Reading the spectral data
files <- list.files(".",pattern = '*.mzML', full.names = T)
sps <- Spectra(files, backend = MsBackendMzR())
sps

df = spectraData(sps, columns = c("msLevel", "precScanNum", "scanIndex")) #extract the variables of interest

write.table(df, "scanNumbers.txt",sep = '\t') #write the scan number dataframe

ddf = read.table("scanNumbers.txt",header = T,sep = '\t')

# Using the subset function to extract the scan numbers associated with MS1 and MS2

ms2 <- subset(ddf, msLevel == 2)

write.table(ms2, "msmsScanumbers.txt",sep = '\t')

ms1 <- subset(ddf, msLevel == 1)

write.table(ms1, "msScanumbers.txt",sep = '\t')

ms1 = read.table("msScanumbers.txt", header=T, sep='\t')

# Create MSnbase object for filtering

msnexp <- readMSData(files)
msnexp

# Filtering with filterPrecursorScan

msms = filterPrecursorScan(object="msnexp", acquisitionNum = ms1$scannumber)

I now want to export the individual spectra to an mzML file, but following the tutorial I have not been able to do that. Could you help in this regard?

jorainer commented 3 years ago

Hi Javan! Firstly I suggest you use either Spectra or MSnbase - mixing the two might be tricky as not all functions work the same.

Could you please describe briefly what exactly you want to do?

Jokendo-collab commented 3 years ago

Hi @jorainer,

I want to separate the identified and unidentified spectra. To give you a little background: we know that only ~20% of the MS/MS spectra get identified when we run sequence search engines such as MaxQuant. So I basically want to extract the unidentified spectra using the scan numbers from the evidence file (MaxQuant output) and the scan numbers from the raw files. From one raw file I need two sets of spectra: one with the scan numbers contained in the evidence file, and one with the scan numbers that are not in the evidence file but are present in the original raw file. In other words, I want to split a single raw file based on the above information. I hope this makes sense?

jorainer commented 3 years ago

I see. So that should be fairly simple with Spectra, assuming sps is a Spectra object that you read from a (single!) mzML file and max_quant_ids is an integer vector with the scan numbers from the MaxQuant output (i.e. the scan numbers from the evidence file):

## Optionally filter to MS2 spectra only - don't know if that's needed/required in your case
sps <- filterMsLevel(sps, 2L)
sps_ident <- sps[sps$scanIndex %in% max_quant_ids]
sps_noident <- sps[!sps$scanIndex %in% max_quant_ids]

sps_ident will be all the already identified MS2 spectra and sps_noident all the not identified MS2 spectra (you lost all MS1 spectra with the filterMsLevel step above).
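For the max_quant_ids vector assumed above, one possible way to build it from the MaxQuant evidence file is sketched below in base R. The column names "Raw file" and "MS/MS scan number", the run name "run01" and the toy table contents are all assumptions for illustration; check them against the header of your own evidence.txt.

```r
# Toy stand-in for MaxQuant's evidence.txt. The column names "Raw file" and
# "MS/MS scan number" are assumptions; check them against your own file.
evidence_txt <- paste(
  "Raw file\tMS/MS scan number",
  "run01\t11",
  "run01\t14",
  "run02\t11",
  "run01\t14",
  "run01\t",
  sep = "\n"
)

# check.names = FALSE keeps the header names (with spaces and slashes) as they are.
# For a real file use read.delim("evidence.txt", check.names = FALSE) instead.
evidence <- read.delim(text = evidence_txt, check.names = FALSE)

# Keep only the rows of the raw file that corresponds to the mzML file being split
ev_run <- evidence[evidence[["Raw file"]] == "run01", ]

# Unique, non-missing scan numbers as an integer vector
max_quant_ids <- unique(as.integer(ev_run[["MS/MS scan number"]]))
max_quant_ids <- max_quant_ids[!is.na(max_quant_ids)]
max_quant_ids
```

Restricting the table to one raw file first matters because scan numbers restart in every run, so the same number can refer to different spectra in different files.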

You could then also export the spectra to an mzML file:

export(sps_noident, backend = MsBackendMzR(), file = "not-identified.mzML")

The tricky thing will be to understand what the scan numbers from MaxQuant actually are, if they are the index of the spectrum in the mzML file or a number extracted from the spectrum ID. To explain:

scanIndex is the index of the scan (spectrum) in the original mzML file. It will be a number from 1 to the total number of spectra.
acquisitionNum is an integer ID extracted (by the proteowizard code within mzR) from the spectrum ID in the mzML file. This can be the same number as scanIndex but does not have to be. If, for example, spectra were removed from the mzML file beforehand, the two numbers will differ.

It would be best to compare sps$scanIndex with sps$acquisitionNum. If they are the same there should be no problem. If they differ, make sure you pick the right one (based on what MaxQuant returns as a scan number).
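The comparison described above can be sketched in base R as follows. The data frame sv is a toy stand-in for the spectra variables (with a real Spectra object you would use sps$scanIndex and sps$acquisitionNum instead), and max_quant_ids and match_var are hypothetical names with made-up values. The hit-rate heuristic is only one way to guess which variable MaxQuant reports, not a rule from the Spectra documentation.

```r
# Toy stand-in for the spectra variables of one mzML file (made-up values)
sv <- data.frame(
  msLevel        = c(1L, 2L, 2L, 1L, 2L),
  scanIndex      = 1:5,
  acquisitionNum = c(10L, 11L, 12L, 13L, 14L)
)
max_quant_ids <- c(11L, 14L)  # hypothetical scan numbers from MaxQuant

# Are both numberings the same? If TRUE, either variable can be used.
same_numbering <- identical(sv$scanIndex, sv$acquisitionNum)

# Otherwise, check which variable the MaxQuant scan numbers actually match
# among the MS2 spectra (heuristic: fraction of IDs found in each variable)
ms2 <- sv[sv$msLevel == 2L, ]
hit_rate <- c(
  scanIndex      = mean(max_quant_ids %in% ms2$scanIndex),
  acquisitionNum = mean(max_quant_ids %in% ms2$acquisitionNum)
)
match_var <- names(hit_rate)[which.max(hit_rate)]

# Split the MS2 spectra using the better-matching variable
identified   <- ms2[ms2[[match_var]] %in% max_quant_ids, ]
unidentified <- ms2[!ms2[[match_var]] %in% max_quant_ids, ]
```

If neither variable matches most of the MaxQuant scan numbers, that is a hint that the evidence rows belong to a different raw file or that the mzML file was converted with different settings.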

Jokendo-collab commented 3 years ago

@jorainer this worked well for me. Thanks for the detailed response.

Jokendo-collab commented 3 years ago

@jorainer I would like to calculate the pairwise similarity between spectra and visualize it. I used the following code but it gives an error:

fls = dir(".",pattern = "mzML$",full.names = TRUE)
sps_all = Spectra(fls,backend = MsBackendMzR())
cormat <- compareSpectra(sps, ppm = 20, FUN = ndotproduct)
hm <- pheatmap(cormat, cutree_rows = 3)

When I run the compareSpectra function I get the following error:

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘compareSpectra’ for signature ‘"Spectra", "missing"’

Could you guide me on this? I would like to get a correlation plot like the one shown here

jorainer commented 3 years ago

I guess you get the error because the sps object/variable is not defined. You load all spectra with sps_all <- ... but then you call compareSpectra(sps, ... - so there is some code missing where you filter/reduce your data set from sps_all to sps (a possibility could be to focus on only MS2 spectra using sps <- filterMsLevel(sps_all, 2), or, even better, to filter based on the precursor m/z you're interested in).
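A minimal sketch of that suggestion, assuming the Spectra and MsCoreUtils packages are installed. To keep it self-contained it builds a tiny in-memory Spectra object from made-up peaks instead of reading mzML files; with real data sps_all would come from Spectra(fls, backend = MsBackendMzR()) as in the question.

```r
library(Spectra)

# Made-up peak data: one MS1 and three MS2 spectra (m/z values must be increasing)
spd <- DataFrame(msLevel = c(1L, 2L, 2L, 2L), rtime = c(1, 2, 3, 4))
spd$mz <- list(
  c(100, 200),
  c(100.1, 150.2, 200.3),
  c(100.1, 150.2, 200.3),
  c(300.5, 400.7)
)
spd$intensity <- list(
  c(10, 20),
  c(50, 100, 25),
  c(25, 50, 12.5),
  c(80, 40)
)
sps_all <- Spectra(spd)

# Define sps explicitly before using it: here, MS2 spectra only
sps <- filterMsLevel(sps_all, 2L)

# Pairwise similarity of all spectra in sps (square matrix, one row per spectrum)
cormat <- compareSpectra(sps, ppm = 20, FUN = MsCoreUtils::ndotproduct)

# The matrix could then be passed to a heatmap function, e.g.
# pheatmap::pheatmap(cormat)
```

With many MS2 spectra the similarity matrix grows quadratically, so filtering to the precursor m/z of interest first, as suggested above, keeps the comparison manageable.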

Maybe also have a look at the Spectra tutorials for more/other examples.

rformassspectrometry / Spectra

Exporting individual MS spectra from msnexp object #196