sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

Make 2D separation possible #86

Open stanstrup opened 7 years ago

stanstrup commented 7 years ago

Just putting as a reminder here since a lot of re-modelling seems to be going on.

While you consider how to re-organize XCMS please consider making it possible to use LCxLC and GCxGC data. AFAIK we don't have a peak picker but at least the design should allow it at some point.

jorainer commented 7 years ago

Good point! Could you briefly describe what such data would look like?

kamakamadaun commented 7 years ago

Don't know. The raw data (I should be able to get you a GCxGC sample) or in xcms?

What about making xcmsRaw@scantime a matrix? This way you could have any number of dimensions (e.g. LC x LC x IMS).

xcmsSet@peaks would then need to auto "expand" to rt1, rt2 and so on. The xsaFA@xcmsSet@rt matrices too.
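
To make the matrix idea concrete, here is a rough sketch (in Python, purely for illustration, and not anything xcms does). In comprehensive GCxGC or LCxLC the detector records one running time axis, and the two retention times can be derived from it when the modulation period is known. The function name `fold_scantime` and the `modulation_period` parameter are hypothetical, and the sketch assumes a constant period with the first cycle starting at time 0.

```python
from typing import List, Tuple


def fold_scantime(scantime: List[float],
                  modulation_period: float) -> List[Tuple[float, float]]:
    """Split a running scan time into (rt1, rt2) pairs.

    rt1 is the start time of the modulation cycle a scan falls in,
    rt2 is the time elapsed since that cycle started.
    Assumption: constant modulation period, first cycle starts at 0.
    """
    if modulation_period <= 0:
        raise ValueError("modulation_period must be positive")
    pairs = []
    for t in scantime:
        cycle = int(t // modulation_period)  # index of the modulation cycle
        rt1 = cycle * modulation_period      # first-dimension retention time
        rt2 = t - rt1                        # second-dimension retention time
        pairs.append((rt1, rt2))
    return pairs
```

Each row of the result plays the role of one row in a two-column scantime matrix; a third separation dimension such as IMS drift time would simply add another column.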

jorainer commented 7 years ago

I meant the raw data - is it one mzML file per sample with two retention times per spectrum?

I am aiming to use the MSnExp or, even better, the OnDiskMSnExp objects from the MSnbase package as the container for the raw data (instead of xcmsRaw). There the data is organized in spectra, one spectrum per scan time (retention time), each with its corresponding mz-intensity pairs.

stanstrup commented 7 years ago

I looked into it. Apparently you can only get a netCDF from LECO's GCxGC peg files. Do you want such a file (~1GB) or is there a way to get the relevant info from the netCDF?

I also have an LC-QTOF-IMS file converted to mzML from Waters format (~1GB) if you want it. I left the normal MS1, an MSe (I think) and the lockmass scans in there. A spectrum looks like this:

        <spectrum index="4" id="function=3 process=0 scan=5" defaultArrayLength="50">
          <cvParam cvRef="MS" accession="MS:1000579" name="MS1 spectrum" value=""/>
          <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
          <cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/>
          <cvParam cvRef="MS" accession="MS:1000128" name="profile spectrum" value=""/>
          <cvParam cvRef="MS" accession="MS:1000796" name="spectrum title" value="5410_OLD_037.5.5. File:&quot;5410_OLD_037.raw&quot;, NativeID:&quot;function=3 process=0 scan=5&quot;"/>
          <scanList count="1">
            <cvParam cvRef="MS" accession="MS:1000795" name="no combination" value=""/>
            <scan>
              <cvParam cvRef="MS" accession="MS:1000616" name="preset scan configuration" value="3"/>
              <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="0.0270000007" unitCvRef="UO" unitAccession="UO:0000031" unitName="minute"/>
              <cvParam cvRef="MS" accession="MS:1002476" name="ion mobility drift time" value="0.423998304007" unitCvRef="UO" unitAccession="UO:0000028" unitName="millisecond"/>
              <scanWindowList count="1">
                <scanWindow>
                  <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="50.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                  <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="1200.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                </scanWindow>
              </scanWindowList>
            </scan>
          </scanList>
          <binaryDataArrayList count="2">
            <binaryDataArray encodedLength="308">
              <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" value=""/>
              <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" value=""/>
              <cvParam cvRef="MS" accession="MS:1000514" name="m/z array" value="" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
              <binary>eJxjYGBQiHOJdWBgYHjQAqU3HokH0Q63ILSC8O8EEM2gAaEPLFBMBNELtkLpG4opYPXvIfSDtU5ZYPo4lOZXzgHr04LQDFfXgGmFDxDa4aBSLohuuAOhGQ4sg/BvQ+gD1oeLwOq8IfSBAqZiEJ3QAKEPbDAB0w0HIDSDWHUJWL0ahF5wlrkMzOe5WgnWJwehHd6YVIHl/0Fohtxv9WDxRgidoJnUCBY3gdANdhBawRNCLzDJgvDtITRDgm8bWN20++1gfy+C0Am637rB7rSC0ApZ9j1g/eUQmqFQeiJYXQWQBgAUw2ai</binary>
            </binaryDataArray>
            <binaryDataArray encodedLength="156">
              <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" value=""/>
              <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" value=""/>
              <cvParam cvRef="MS" accession="MS:1000515" name="intensity array" value="" unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
              <binary>eJxjYAABLQcwxfDBHkJbQPkaUNoSTZ4ByofpE4HSAg6o8kpQWg1NXgdKy+BQpwKlOdDUw/gw8yXQzIXxYeYYQmkTNH0weQU0GiYPNa/BE8p3hPKd0dwBczcsHGD2o7sPZr8ZlIb5BxZusHA1dAAACOAWFA==</binary>
            </binaryDataArray>
          </binaryDataArrayList>
        </spectrum>
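
For readers who want to inspect such a spectrum outside R, the following is a minimal Python sketch using only the standard library. It pulls the two separation coordinates (scan start time and ion mobility drift time) from the `<scan>` element and decodes a `<binary>` payload that is flagged as 64-bit float with zlib compression, as above. The function names are hypothetical, and the sketch assumes an un-namespaced fragment like the one pasted here; in a complete mzML file the tags carry the mzML XML namespace and the lookups would need adjusting.

```python
import base64
import struct
import zlib
import xml.etree.ElementTree as ET
from typing import Dict, List


def decode_binary_array(text: str, compressed: bool = True) -> List[float]:
    """Decode an mzML <binary> payload that holds 64-bit floats."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    count = len(raw) // 8  # 8 bytes per 64-bit float
    # mzML stores binary values in little-endian byte order
    return list(struct.unpack("<%dd" % count, raw))


def scan_times(spectrum_xml: str) -> Dict[str, float]:
    """Return the time-like cvParam values found directly under <scan>."""
    wanted = {"scan start time", "ion mobility drift time"}
    root = ET.fromstring(spectrum_xml)
    out: Dict[str, float] = {}
    for scan in root.iter("scan"):
        for cv in scan.findall("cvParam"):  # direct children only
            if cv.get("name") in wanted:
                out[cv.get("name")] = float(cv.get("value"))
    return out
```

Applied to the spectrum above, `scan_times` would report the retention time (in minutes) and the drift time (in milliseconds), which are the two coordinates a 2D-aware container would need to keep per spectrum.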
jorainer commented 7 years ago

OK, so this will be something for mzR I believe - the representation of the data in R shouldn't be too difficult. xcmsRaw would have to be tweaked quite considerably, but the OnDiskMSnExp and the potential new objects should be OK. They are organized similarly to the mzML files, by spectrum, so there should be no problem if one spectrum has multiple rts or similar.
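
As an illustration of why a per-spectrum layout copes with extra separation dimensions, here is a small Python sketch. It is not the MSnbase data model; the `Spectrum` class, its `separation` field and the `separation_matrix` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Spectrum:
    """One spectrum: m/z-intensity pairs plus any number of separation coordinates."""
    mz: List[float]
    intensity: List[float]
    # e.g. {"rt1": 12.0, "rt2": 1.5} or {"rt": 0.027, "drift": 0.424}
    separation: Dict[str, float] = field(default_factory=dict)


def separation_matrix(spectra: Sequence[Spectrum],
                      dims: Sequence[str]) -> List[List[float]]:
    """Build one row per spectrum and one column per requested dimension."""
    return [[s.separation[d] for d in dims] for s in spectra]
```

Because the coordinates live with each spectrum, adding a dimension means adding a key, not restructuring the container.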

jmorim commented 4 years ago

I think this could be done at the XCMS level. Is anyone working on this? The rest of my PhD thesis will be on LCxLC work using MS1 and MS2 data. I'm trying to cobble some functions together but it's slow trying to catch up with how XCMS and CAMERA work. Steffen, if this functionality were added, do you think it could produce a publication?

stanstrup commented 4 years ago

I am not aware of anyone working on this. I am not @sneumann but I am quite sure this would easily be a publication if someone did this. In fact I have been part of applications where doing this was part of a PhD project. Those applications were not successful though.

michaelwitting commented 4 years ago

I'm also not aware of people working on it. Would be a great addition. I also tried to read LC-IMS-MS data, which is some kind of 2D separation. MSnbase reads the data perfectly and I can access it, so I think it should also be fine for LCxLC or GCxGC.

jorainer commented 4 years ago

Sounds like a good idea - and it shouldn't be too difficult. Let me know if you need some help or something is unclear, @jmorim.

jmorim commented 4 years ago

I've started an addon for xcms and CAMERA to handle comprehensive 2D data: https://github.com/jmorim/twoDxc

The goal for this package is to group CAMERA data by the 2D rt dimension along with 1D rt and MS data. Here's an example 2D EIC generated with the plot2D function from the package:

[image: 2dplot_181]

sneumann commented 4 years ago

Hi, thanks for sharing! Could this also be of interest for ion mobility data? @michaelwitting might be interested to try ... Yours, Steffen

jorainer commented 4 years ago

Regarding the addon for CAMERA - if possible I would like to keep this 2D grouping independent of the classes in CAMERA. I would like to have a more general, modular way of feature grouping (we started implementing some feature grouping functions in our small utility package, look for the group* functions - I would discuss this with @sneumann some time anyway).

The goal in the long run would be to end up using (or adding functionality to) the Features package, which at present supports proteomics feature grouping.

jmorim commented 4 years ago

I agree. As I said above, I originally wanted this to depend only on XCMS, but I'm trying to graduate within the next six months, and implementing 2D grouping based on already grouped pseudospectra from CAMERA has been easier for me right now. As a chemist, I'd also like to stray from "features," assuming you mean rt/mz pairs, since to identify a compound, I need a spectrum of at least a few ions.

jorainer commented 4 years ago

Totally understand that. I suggest you create a grouping function that takes standard R data types as input (such as numeric, etc) instead of a specific S4 class. That way you keep your code base flexible enough to be reused by other packages or be integrated into a revamped CAMERA package (or xcms?).
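
To illustrate the suggestion, here is a minimal Python sketch of a grouping function that takes only plain numeric sequences. It is not the twoDxc or CAMERA algorithm; the function name, the tolerance parameters and the single-linkage rule (peaks are grouped when they are connected by a chain of neighbours within both tolerances) are assumptions made for the example.

```python
from typing import Dict, List, Sequence


def group_by_retention(rt1: Sequence[float], rt2: Sequence[float],
                       tol1: float, tol2: float) -> List[int]:
    """Return one group id per peak, using plain numeric inputs only.

    Two peaks are linked when they differ by at most tol1 in the first
    dimension and at most tol2 in the second; groups are the connected
    components of those links (single linkage, O(n^2) comparisons).
    """
    if len(rt1) != len(rt2):
        raise ValueError("rt1 and rt2 must have the same length")
    n = len(rt1)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(rt1[i] - rt1[j]) <= tol1 and abs(rt2[i] - rt2[j]) <= tol2:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive ids in order of first appearance
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

Since the inputs and output are ordinary vectors, a wrapper for any specific class only has to extract the retention time columns and attach the returned ids.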