sgibb / MALDIquant

Quantitative Analysis of Mass Spectrometry Data
https://strimmerlab.github.io/software/maldiquant/

Extracting intensities from a file from a predefined m/z list #68

Closed mgorkaq closed 3 years ago

mgorkaq commented 3 years ago

Hi everyone,

I recently had an issue using the MALDIquant package.

For my research I have multiple MALDI MSI analyses.

To process the data for statistical analysis, I would like to produce one global data frame in which every column is an m/z value.

I have already managed to produce a global m/z list from all the files in my folder, as follows:

library("MALDIquant")
library("MALDIquantForeign")
library("dplyr")

folderName <- "/Users/mariegorka/Desktop/StandardsCopie"
# full paths of all imzML files in the folder
nameFiles <- list.files(folderName, pattern = "\\.imzML$", full.names = TRUE)

# import each file as a list of MassPeaks (centroided data) and concatenate them
dataTemp.Merge <- list()
for (f in nameFiles) {
  dataTemp <- importImzMl(f, verbose = TRUE, centroided = TRUE)
  dataTemp.Merge <- c(dataTemp.Merge, dataTemp)
}

# align the peaks across all files, then drop peaks found in fewer than 5% of spectra
dataTemp.Peaks <- binPeaks(dataTemp.Merge, method = "strict", tolerance = 0.05)
dataTemp.Peaks2 <- filterPeaks(dataTemp.Peaks, minFrequency = 0.05)

In a second step, I would like to use intensityMatrix() to extract, for each imzML file in my folder, the intensities corresponding to each m/z on that list. The main issue is that when I try, I get an error saying "length of peaks is different from length of spectra". I have no problem extracting a data frame based on the m/z values of one file. However, the m/z values of different files are sometimes very close without being identical, and combining the per-file data frames takes too long for the size of my data. The only solution I see is to do the peak picking against a precomputed list of m/z values calculated over all the files in my folder.

Thanks a lot in advance for your help.

Best regards.

Marie

sgibb commented 3 years ago

Dear Marie,

To be honest, I don't understand what you are trying to do. As far as I understand, you want an intensity matrix covering all your files. That should be doable with a slightly simplified version of your current code:

pcks <- c( "MALDIquant", "MALDIquantForeign", "dplyr" )
lapply( pcks, require, character.only = TRUE )
dataTemp <- importImzMl("/Users/mariegorka/Desktop/StandardsCopie", verbose = TRUE, centroided = TRUE )
dataTemp.Peaks <- binPeaks(dataTemp, method = "strict", tolerance = 0.05 )
dataTemp.Peaks2 <- filterPeaks(dataTemp.Peaks, minFrequency = 0.05)
im <- intensityMatrix(dataTemp.Peaks2)

The error "length of peaks is different from length of spectra" should only occur if you call intensityMatrix(peaks, spectra) and spectra isn't of the same length as peaks. Because your data are already centroided, the spectra argument shouldn't be given at all.
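To illustrate the point about leaving out the spectra argument, here is a minimal sketch with made-up toy peaks (the masses and intensities are invented, not taken from your data). It bins two small MassPeaks objects and builds one shared intensity matrix from the peaks alone. As far as I know, binPeaks treats tolerance as a relative deviation, so the toy example uses 0.002 rather than 0.05; that value is an assumption chosen for this example only.

```r
library(MALDIquant)

# two toy centroided spectra standing in for pixels from different files
p1 <- createMassPeaks(mass = c(100.00, 200.00, 300.00),
                      intensity = c(10, 20, 30))
p2 <- createMassPeaks(mass = c(100.02, 200.01, 400.00),
                      intensity = c(11, 21, 41))

# align nearly identical masses onto common positions
# (tolerance is relative here; 0.002 is an illustrative choice)
peaks <- binPeaks(list(p1, p2), method = "strict", tolerance = 0.002)

# only the peaks are passed; no spectra argument
im <- intensityMatrix(peaks)

# one row per spectrum, one column per aligned m/z, NA where a peak is missing
print(im)
```

Because all spectra are binned together, every row shares the same columns, so no per-file matrices have to be merged afterwards.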

mgorkaq commented 3 years ago

Dear Sebastian,

Thanks for your reply. So far I have managed to get an intensity matrix for each of my files through a loop.

However, to perform statistical analysis on my data, I need a combined data frame that groups all my intensity matrices. The main issue is that the columns of the intensity matrices differ slightly from file to file. To overcome this, I wanted to extract the intensities for each file based on a single reference list, so that all the generated intensity matrices have the same column names. But I haven't been able to do it yet.

sgibb commented 3 years ago

I assume the code I mentioned above should work. Otherwise you could try match.closest to match the nearly identical m/z values, but I don't think this is needed.
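In case it helps, a rough sketch of the match.closest route could look like the following. The reference list, the toy peaks, the 0.05 tolerance and the helper name extractRef are all hypothetical and only serve the illustration. Unlike binPeaks, match.closest compares absolute differences, and it expects the table to be sorted increasingly (which the masses of a MassPeaks object are).

```r
library(MALDIquant)

# hypothetical reference m/z list; in practice this could be the
# masses left after binPeaks/filterPeaks on all files
ref <- c(100.00, 200.00, 300.00)

# toy centroided spectra standing in for the pixels of one imzML file
peaks <- list(
  createMassPeaks(mass = c(100.02, 250.00, 299.99), intensity = c(5, 7, 9)),
  createMassPeaks(mass = c(199.97, 300.01), intensity = c(2, 4))
)

# hypothetical helper: for one spectrum, take the intensity of the peak
# closest to each reference m/z, or NA if none lies within the tolerance
extractRef <- function(p, ref, tolerance) {
  idx <- match.closest(ref, mass(p), tolerance = tolerance)
  intensity(p)[idx]
}

# one row per spectrum, one column per reference m/z
im <- t(vapply(peaks, extractRef, numeric(length(ref)),
               ref = ref, tolerance = 0.05))
colnames(im) <- ref
print(im)
```

Every matrix built this way has the same columns, so the per-file results can simply be stacked with rbind.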

sgibb commented 3 years ago

@mgorkaq maybe you are interested in the feature suggested in #69. I am closing this issue. Feel free to reopen it if you still have questions.