sneumann / mzR

This is the git repository matching the Bioconductor package mzR: parser for netCDF, mzXML, mzData and mzML files (mass spectrometry data)

Support for TIMS-TOF Ion Mobility Data #275

Open RogerGinBer opened 1 year ago

RogerGinBer commented 1 year ago

Hi there,

I have some LC-IM-MS data acquired with a Bruker TIMS-TOF instrument and I was wondering how we could import it with mzR. I looked into a previous issue #44 and its corresponding fix #176, but the CV term added back then was "ion mobility drift time" (1002476) and not "inverse reduced ion mobility" (1002815), which is the one I have.

Using MSConvert, an 82 MB TDF file exploded into a 5.4 GB mzML, since it split each RT-IM combination into a separate scan (which creates a huge overhead). Still, it could work if the 1002815 CV term was added. Could there be a way to programmatically specify which CV terms to look for in the scan headers?

Alternatively, I tried using TIMSCONVERT to convert from TDF to mzML, which resulted in a much more manageable 293 MB file. In this case, the ion mobility data is encoded as a binaryDataArray, just like the m/z values and the intensities (see their figure). Could a function similar to those used to read m/z values and intensities be used for reading this information?

It'd be great to reach an mz-RT-IM-Intensity table format, as that would be a first step towards extending current peak-picking and annotation software to ion mobility data.
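To make the target format concrete, here is a minimal sketch of flattening per-frame arrays into one mz-RT-IM-Intensity row per peak. The `Frame` and `PeakRow` types and the `toLongTable` function are hypothetical names made up for illustration; they are not part of mzR or ProteoWizard, and the sketch assumes the three arrays of a frame are parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for one frame/spectrum; not an mzR or ProteoWizard type.
struct Frame {
    double rt;                      // retention time of the frame
    std::vector<double> mz;         // m/z values
    std::vector<double> intensity;  // intensities, parallel to mz
    std::vector<double> mobility;   // ion mobility values, parallel to mz
};

// One row of the long-format table.
struct PeakRow {
    double mz;
    double rt;
    double mobility;
    double intensity;
};

// Flatten frames into one row per peak, repeating the frame's RT on each row.
std::vector<PeakRow> toLongTable(const std::vector<Frame>& frames) {
    std::vector<PeakRow> rows;
    for (const Frame& f : frames) {
        // Assumption: the three arrays of a frame have the same length.
        assert(f.mz.size() == f.intensity.size());
        assert(f.mz.size() == f.mobility.size());
        for (std::size_t i = 0; i < f.mz.size(); ++i) {
            rows.push_back({f.mz[i], f.rt, f.mobility[i], f.intensity[i]});
        }
    }
    return rows;
}
```

The long format repeats the RT value for every peak, which costs memory but keeps each row self-describing for downstream peak picking.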

Thanks a lot! Roger

sneumann commented 1 year ago

Hi, thanks for the interest. Indeed, the specification of that third binaryDataArray is underway at the PSI-MS mzML team. The first things we'd need are 1) an example mzML file (the smaller the better), 2) a mockup of how we'd like the result to look in mzR and 3) to know which code in proteowizard we need to call in this place: https://github.com/sneumann/mzR/blob/48029f236c90e66992d8835e174b22d235c34c2d/src/RcppPwiz.cpp#L414 This also needs checking with the downstream packages to make sure no regressions are introduced there. Yours, Steffen

RogerGinBer commented 1 year ago

Hi Steffen, I've prepared two small example files with the header format from Proteowizard (65 scans, corresponding to only two frames) and the binary format from TIMSConvert (5 full MS1 scans): TIMS_examples.zip.

Regarding how the result should look in mzR, perhaps the most natural way would be to expand the matrices generated in RcppPwiz::getPeaklist with an additional IM column, only if the ion mobility binary array is found in the data.
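As a rough mockup of that idea, the sketch below builds a column-major peak matrix with "mz" and "intensity" columns and appends a third column only when an ion mobility array is supplied. `PeakMatrix`, `buildPeakMatrix` and the column name "im" are assumptions made for illustration; this is not the actual code of RcppPwiz::getPeaklist.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an R numeric matrix (column-major storage).
struct PeakMatrix {
    std::size_t nrow;
    std::vector<std::string> colnames;
    std::vector<double> values;  // values[col * nrow + row]
};

// Build a two-column matrix, plus an "im" column only if a mobility array
// of matching length is passed in. A null pointer means "array not found".
PeakMatrix buildPeakMatrix(const std::vector<double>& mz,
                           const std::vector<double>& intensity,
                           const std::vector<double>* mobility = nullptr) {
    assert(mz.size() == intensity.size());
    PeakMatrix m;
    m.nrow = mz.size();
    m.colnames = {"mz", "intensity"};
    m.values = mz;  // first column
    m.values.insert(m.values.end(), intensity.begin(), intensity.end());
    if (mobility != nullptr && mobility->size() == mz.size()) {
        m.colnames.push_back("im");  // hypothetical column name
        m.values.insert(m.values.end(), mobility->begin(), mobility->end());
    }
    return m;
}
```

Files without ion mobility data would still yield the familiar two-column matrix, so downstream packages that index columns by name should be unaffected under this assumption.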

Looking into the Proteowizard C++ library, I've found the definitions of the Spectrum object and getMZArray methods: https://github.com/sneumann/mzR/blob/48029f236c90e66992d8835e174b22d235c34c2d/src/pwiz/data/msdata/MSData.hpp#L570 https://github.com/sneumann/mzR/blob/48029f236c90e66992d8835e174b22d235c34c2d/src/pwiz/data/msdata/MSData.cpp#L741 Perhaps we could use something like the method Spectrum::getArrayByCVID (defined just before getMZArray) with the corresponding CV value (1002815, or whichever the user has)? From what I understand, Proteowizard saves in each Spectrum the pointers to all BinaryDataArrays, so we could extract the array we want.
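The lookup idea can be sketched with simplified mock types. `MockBinaryDataArray`, `MockSpectrum` and `findArrayByAccession` below are assumptions made for illustration and do not reproduce the real ProteoWizard classes or the signature of Spectrum::getArrayByCVID; they only show the pattern of scanning a spectrum's array pointers for a user-chosen CV accession.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Simplified mock of a binary data array tagged with one CV accession number.
struct MockBinaryDataArray {
    int cvAccession;           // e.g. 1002815 for inverse reduced ion mobility
    std::vector<double> data;  // decoded values
};

// Simplified mock of a spectrum holding pointers to all of its arrays.
struct MockSpectrum {
    std::vector<std::shared_ptr<MockBinaryDataArray>> binaryDataArrayPtrs;
};

// Return the first array whose CV accession matches, or nullptr if none does.
std::shared_ptr<MockBinaryDataArray> findArrayByAccession(const MockSpectrum& s,
                                                          int accession) {
    assert(accession > 0);  // CV accession numbers are positive
    for (const auto& ptr : s.binaryDataArrayPtrs) {
        if (ptr && ptr->cvAccession == accession) {
            return ptr;
        }
    }
    return nullptr;
}
```

Passing the accession as a parameter, rather than hard-coding it, would let a caller ask for 1002815, 1002476 or any other term, and a null result signals that the file carries no such array.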

Cheers, Roger

jorainer commented 1 year ago

Hi Roger!

An alternative (if you don't necessarily need to stick to mzR) would be to use Spectra with the MsBackendTimsTof. It would also be nice if you could have a look into that. Generally, we should be more flexible with Spectra, as we now allow additional columns in the peaksData matrix (in addition to "mz" and "intensity"). You are welcome to provide feedback in that repo or, even better, pull requests.

RogerGinBer commented 1 year ago

Hi Johannes,

That's a very convenient package that flew under my radar, thanks a lot! :+1: I just tried it and it extracts all the info I needed. Now I'm thinking about extending ROI detection and centWave peak picking to IM data, but that would be something for xcms. Have any of you worked on that previously?

Still, I think mzR would benefit from being able to read binary IM data from the standard mzML file, since it'd be the most vendor-agnostic way. I'll give it a try and see what we can do.

jorainer commented 1 year ago

Updating/adapting xcms to use Spectra instead of MSnbase is on my TODO list. With that, we would be able to run peak detection using any backend, in addition to mzR.