michalsta / opentims

Open-source C++ and Python module for opening binary timsTOF data files.

Possible bug on data from PASEF Lipidomics paper #10

Open liquidcarbon opened 2 years ago

liquidcarbon commented 2 years ago

Hi, I'm working with the data from this paper. Raw data is available here: https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?accession=MSV000083858

I checked several samples from the dataset, and every frame in every sample looks like this: nonsensical, repeating values, with 2^32-1 for every tof.

D.query(frames=[1], columns=all_columns)

{'frame': array([1, 1, 1, ..., 1, 1, 1], dtype=uint32),
 'scan': array([962, 962, 962, ..., 962, 962, 962], dtype=uint32),
 'tof': array([4294967295, 4294967295, 4294967295, ..., 4294967295, 4294967295,
        4294967295], dtype=uint32),
 'intensity': array([0, 0, 0, ..., 0, 0, 0], dtype=uint32),
 'mz': array([1.16998969e+11, 1.16998969e+11, 1.16998969e+11, ...,
        1.16998969e+11, 1.16998969e+11, 1.16998969e+11]),
 'inv_ion_mobility': array([0.60334394, 0.60334394, 0.60334394, ..., 0.60334394, 0.60334394,
        0.60334394]),
 'retention_time': array([1.00014505, 1.00014505, 1.00014505, ..., 1.00014505, 1.00014505,
        1.00014505])}
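
A quick way to flag frames like the one above programmatically is to test for the exact symptom: every tof equal to 2^32-1 and every intensity equal to 0. The sketch below is only an illustration; `looks_like_garbage` is a hypothetical helper, not part of opentims, and it assumes the query result is a dict of array-like columns as printed above.

```python
UINT32_MAX = 2**32 - 1  # 4294967295, the value seen in every 'tof' entry above


def looks_like_garbage(columns):
    """Return True if a query result shows the symptom from this report:
    every 'tof' is 2^32-1 and every 'intensity' is 0.

    Hypothetical helper; `columns` is assumed to be a dict of array-likes
    (plain lists or NumPy arrays both work, since values are cast to int).
    """
    tof = columns["tof"]
    intensity = columns["intensity"]
    if len(tof) == 0:
        return False  # an empty frame is not evidence of a parsing problem
    all_tof_max = all(int(t) == UINT32_MAX for t in tof)
    all_intensity_zero = all(int(i) == 0 for i in intensity)
    return all_tof_max and all_intensity_zero
```

With the output above, `looks_like_garbage(D.query(frames=[1], columns=all_columns))` would be expected to return True, assuming the elided entries match the visible ones.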

Adding global metadata:

{'SchemaType': 'TDF',
 'SchemaVersionMajor': '3',
 'SchemaVersionMinor': '0',
 'AcquisitionSoftwareVendor': 'Bruker',
 'InstrumentVendor': 'Bruker',
 'TimsCompressionType': '1',
 'ClosedProperly': '1',
 'MaxNumPeaksPerScan': '374',
 'AnalysisId': '00000000-0000-0000-0000-000000000000',
 'DigitizerNumSamples': '410976',
 'PeakListIndexScaleFactor': '1',
 'MzAcqRangeLower': '50.000000',
 'MzAcqRangeUpper': '1550.000000',
 'OneOverK0AcqRangeLower': '0.417355',
 'OneOverK0AcqRangeUpper': '1.888302',
 'AcquisitionSoftware': 'Bruker otofControl',
 'AcquisitionSoftwareVersion': '5.1.81.740-13575-vc110',
 'AcquisitionFirmwareVersion': '<unknown>',
 'AcquisitionDateTime': '2019-02-15T12:59:09.028+01:00',
 'InstrumentName': 'timsTOF Pro',
 'InstrumentFamily': '9',
 'InstrumentRevision': '1',
 'InstrumentSourceType': '11',
 'InstrumentSerialNumber': '1838271.22',
 'OperatorName': 'Demo User',
 'Description': '',
 'SampleName': '201902014_TIMS1_LC6_CaVa_lipidom_plasmaNIST_MTBE_1in20_90min_TIMS-MSMS',
 'MethodName': 'metabolomics+lipidomics PASEF_stepping_10000_1500.m'}

michalsta commented 2 years ago

Hi,

Sorry for the belated response, but you know, vacation ;)

Anyway, what you have is (IIRC) not a TimsTofPro file format but just TimsTof, and we don't support that. You can tell by TimsCompressionType being 1 instead of 2. The dataset has a somewhat similar structure, but the format is slightly different and our parser fails to interpret it. It's definitely a bug that it returns random stuff from uninitialized memory (that's what those 2^32-1s are), but for now the fix is going to be to just throw a proper exception.
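
Until that exception exists, a user-side guard can check TimsCompressionType before querying. The following is a minimal sketch, not opentims code: it assumes the .d folder contains an `analysis.tdf` SQLite file with a `GlobalMetadata` table of `Key`/`Value` pairs, and `check_compression_type` is a hypothetical helper name.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Per the comment above: 2 is the supported format, 1 is not.
SUPPORTED_COMPRESSION_TYPE = "2"


def read_global_metadata(tdf_path):
    """Return the GlobalMetadata table as a dict of strings.

    Assumes `tdf_path` is an SQLite file with a GlobalMetadata(Key, Value) table.
    """
    with closing(sqlite3.connect(str(tdf_path))) as conn:
        rows = conn.execute("SELECT Key, Value FROM GlobalMetadata").fetchall()
    return {str(key): str(value) for key, value in rows}


def check_compression_type(d_folder):
    """Raise ValueError if the dataset is not TimsCompressionType=2.

    Hypothetical helper; returns the metadata dict when the check passes.
    """
    metadata = read_global_metadata(Path(d_folder) / "analysis.tdf")
    found = metadata.get("TimsCompressionType")
    if found != SUPPORTED_COMPRESSION_TYPE:
        raise ValueError(
            f"Unsupported TimsCompressionType={found!r} in {d_folder}; "
            f"expected {SUPPORTED_COMPRESSION_TYPE!r}"
        )
    return metadata
```

Calling `check_compression_type(path_to_d_folder)` before opening the dataset would raise for the files in this report (TimsCompressionType is '1') instead of letting a query return garbage.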

We are considering adding proper support for that format, but the best ETA I can give is "sometime in the future" ;)