Intrinsic Peak Analysis (IPA) by the Integrated Data Science Laboratory for Metabolomics and Exposomics (IDSL.ME) is a lightweight R package that extracts peaks for organic small molecules from untargeted liquid chromatography high-resolution mass spectrometry (LC/HRMS) data in population-scale projects. IDSL.IPA is a suite of new algorithms covering extracted ion chromatogram (EIC) candidate generation, peak detection, peak property evaluation, recursive mass correction, retention time correction across multiple batches, and peak annotation. It generates comprehensive, high-quality datasets from untargeted analysis of organic small molecules for population-size studies. Our publication [1] shows that IDSL.IPA can outperform similar peak-picking tools such as MZmine 2, xcms, and MS-DIAL in sensitivity, specificity, and speed.
1) Parameter selection through a user-friendly and well-described parameter spreadsheet
2) Analyzing population size untargeted studies (n > 500)
3) Calculating 19 chromatographic peak properties such as peak area, nIsoPair, RCS, cumulated intensity, R13C, peak width, RPW, number of separation trays, asymmetry factor, USP tailing factor, skewness using derivative method, symmetry using pseudo-moments, skewness using pseudo-moments, gaussianity, S/N using baseline, S/N using the xcms method, S/N using the RMS method, and sharpness
4) Mass spectra scan level Ion Pairing to only select relevant ions
5) Flexibility to screen for any ion mass difference in addition to natural carbon signatures (12C/13C isotopologues) mass difference
6) Retention time correction using endogenous reference markers for multi-batch large scale studies
7) Generating batch untargeted extracted ion chromatograms (EICs)
8) Generating pairwise correlations list for aligned peak height and its gap-filled tables to detect potential recurring adducts, in-source products and fragment peaks
9) Aggregating untargeted EICs after (m/z-RT) annotation for each compound
10) Parallel processing in Windows and Linux environments
11) Integration with molecular formula annotation tools IDSL.UFA and IDSL.UFAx
12) Integration with IDSL.CSA workflow to cluster recurring ions to generate composite spectra
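To give a feel for a few of the peak properties named in item 3, the sketch below computes peak area, asymmetry factor, USP tailing factor, and the number of separation trays (theoretical plates) for a synthetic Gaussian peak. It uses common textbook definitions (asymmetry at 10% of apex height, tailing at 5%, plates from the half-height width) and is not the IDSL.IPA implementation. The helper names `cross_time`, `edge_times`, and `peak_shape` are hypothetical and exist only for this illustration.

```r
# Textbook peak-shape metrics on a synthetic chromatographic peak.
# NOTE: illustrative only; this is NOT how IDSL.IPA computes its properties.

# Linearly interpolate the time at which the signal crosses `level`
# between two neighbouring data points.
cross_time <- function(t1, y1, t2, y2, level) {
  t1 + (level - y1) * (t2 - t1) / (y2 - y1)
}

# Times at which the peak crosses a fraction of its apex height
# on the leading (front) and trailing (back) edges.
edge_times <- function(rt, int, frac) {
  apex <- which.max(int)
  level <- frac * int[apex]
  # Last point before the apex that lies below the level
  i <- max(which(int[seq_len(apex)] < level))
  # First point after the apex that lies below the level
  j <- apex - 1 + min(which(int[apex:length(int)] < level))
  c(front = cross_time(rt[i], int[i], rt[i + 1], int[i + 1], level),
    back  = cross_time(rt[j - 1], int[j - 1], rt[j], int[j], level))
}

peak_shape <- function(rt, int) {
  n <- length(rt)
  apex_rt <- rt[which.max(int)]
  e05 <- edge_times(rt, int, 0.05)
  e10 <- edge_times(rt, int, 0.10)
  e50 <- edge_times(rt, int, 0.50)
  list(
    # Trapezoidal integration of intensity over retention time
    area = sum(diff(rt) * (int[-1] + int[-n]) / 2),
    # Back half-width divided by front half-width at 10% height
    asymmetry = unname((e10["back"] - apex_rt) / (apex_rt - e10["front"])),
    # Width at 5% height divided by twice the front half-width at 5% height
    usp_tailing = unname((e05["back"] - e05["front"]) /
                           (2 * (apex_rt - e05["front"]))),
    # Half-height formula for the number of theoretical plates
    plates = unname(5.54 * (apex_rt / (e50["back"] - e50["front"]))^2)
  )
}

# Synthetic Gaussian peak: apex at 5 min, standard deviation 0.1 min
rt <- seq(4, 6, by = 0.01)
int <- 1e6 * exp(-(rt - 5)^2 / (2 * 0.1^2))
shape <- peak_shape(rt, int)
str(shape)
```

For a symmetric Gaussian peak like this one, the asymmetry and tailing factors come out close to 1. Real peaks with tailing give values above 1.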
install.packages("IDSL.IPA")
Note: To process netCDF/CDF mass spectrometry data with IDSL.IPA, you should also install the RNetCDF package separately using the command below.
install.packages("RNetCDF")
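If you set up several machines, it can help to check which packages are missing before installing. The following is a minimal sketch using base R's `requireNamespace`. The helper name `missing_packages` is hypothetical and not part of IDSL.IPA.

```r
# Return the subset of `pkgs` that is not installed on this machine.
# `missing_packages` is a hypothetical helper, not part of IDSL.IPA.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("IDSL.IPA", "RNetCDF"))
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```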
To process your mass spectrometry data (mzXML, mzML, netCDF), download the IPA parameter spreadsheet, select the parameters for your study, and then pass the spreadsheet to the IPA_workflow function as shown below:
library(IDSL.IPA)
IPA_workflow("Address of the IPA parameter spreadsheet")
Follow these steps for a quick case study using ST002263 (n = 33), which contains Thermo Q Exactive HF hybrid Orbitrap data collected in the HILIC-ESI-POS/NEG modes.
1. Download the file "ST002263_Rawdata.zip (1.6G)".
2. Separate the positive and negative mode .mzXML data into different folders. The two modes must be processed separately.
3. Open the IPA parameter spreadsheet and use default values for all parameters, except:
3.1. Input data location: In PARAM0007, specify the location of the MS1 level HRMS data.
3.2. Output location: In PARAM0010, specify the location where you want the processed data to be stored.
3.3. Number of processing threads: In PARAM0006, increase the number of processing threads based on your computer's computational power.
4. Open the R/RStudio console or terminal and run the following commands:
library(IDSL.IPA)
IPA_workflow("Address of the IPA parameter spreadsheet")
5. The results will be available in the output location specified in PARAM0010 and will include:
5.1. Individual peaklists for each HRMS file in .Rdata and .csv formats in the "peaklists" directory.
5.2. Peak alignment tables in the "peak_alignment" directory.
5.3. (Optional) Untargeted EICs in the "IPA_EIC" directory for each HRMS file, if YES is selected in PARAM0009.
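Once the workflow has finished, the per-file peaklists from step 5.1 can be loaded back into R for a quick check. The sketch below assumes only what is stated above: a "peaklists" directory containing .csv files inside the output location. It also assumes each .csv has a header row and one peak per row, which you should verify against your own output. The helpers `read_peaklists` and `count_peaks` are hypothetical and not part of IDSL.IPA.

```r
# Read every per-file peaklist CSV from an IDSL.IPA output folder.
# `read_peaklists` and `count_peaks` are hypothetical helpers.
read_peaklists <- function(output_dir) {
  peaklist_dir <- file.path(output_dir, "peaklists")
  if (!dir.exists(peaklist_dir)) {
    stop("No 'peaklists' directory found in: ", output_dir)
  }
  csv_files <- list.files(peaklist_dir, pattern = "\\.csv$", full.names = TRUE)
  # Assumption: each CSV has a header row; keep column names unchanged
  peaklists <- lapply(csv_files, read.csv, check.names = FALSE)
  names(peaklists) <- basename(csv_files)
  peaklists
}

# Number of rows per file (assumed to be one row per detected peak)
count_peaks <- function(peaklists) {
  vapply(peaklists, nrow, integer(1))
}

# Example usage (replace the path with the location set in PARAM0010):
# peaklists <- read_peaklists("path/to/output")
# count_peaks(peaklists)
```

A file with far fewer rows than the others is often a good first hint that an injection or a conversion step went wrong.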
IPA_targeted function for a large number of peaks (m/z-RT pairs), with an example for targeted IDSL.IPA
We post major changes in the IDSL.IPA workflow here.
[1] Fakouri Baygi, S., Kumar, Y., Barupal, D.K. IDSL.IPA characterizes the organic chemical space in untargeted LC/HRMS datasets. Journal of Proteome Research, 2022, 21(6), 1485-1494.