michalsta / opentims

Open-source C++ and Python module for opening binary timsTOF data files.

Frame decompression error #14

Open ackagel opened 2 years ago

ackagel commented 2 years ago

I keep encountering this error:

Error uncompressing frame, error code: 18446744073709551606. File is either corrupted, or in a (yet) unsupported variant of the format.

when querying frames beyond some frame id in some tdf files (e.g. D.query(frames=[bad_frame], columns=('intensity', ...))). The error code reads like a u64 underflow: it equals 2^64 - 10, i.e. -10 as a signed 64-bit value. Here's some of the tdf metadata for context:

In [1]: tdf_tables.table2dict(os.path.join(path, 'analysis.tdf'), 'GlobalMetadata')

Out [1]: {'Key': array(['SchemaType', 'SchemaVersionMajor', 'SchemaVersionMinor',
        'AcquisitionSoftwareVendor', 'InstrumentVendor', 'ClosedProperly',
        'TimsCompressionType', 'MaxNumPeaksPerScan', 'AnalysisId',
        'DigitizerNumSamples', 'MzAcqRangeLower', 'MzAcqRangeUpper',
        'AcquisitionSoftware', 'AcquisitionSoftwareVersion',
        'AcquisitionFirmwareVersion', 'AcquisitionDateTime',
        'InstrumentName', 'InstrumentFamily', 'InstrumentRevision',
        'InstrumentSourceType', 'InstrumentSerialNumber', 'OperatorName',
        'Description', 'SampleName', 'MethodName', 'DenoisingEnabled',
        'PeakWidthEstimateValue', 'PeakWidthEstimateType',
        'PeakListIndexScaleFactor', 'OneOverK0AcqRangeLower',
        'OneOverK0AcqRangeUpper', 'MaldiApplicationType', 'RunId',
        'TargetId', 'Geometry', 'ImagingAreaMinXIndexPos',
        'ImagingAreaMaxXIndexPos', 'ImagingAreaMinYIndexPos',
        'ImagingAreaMaxYIndexPos'], dtype='<U26'),
 'Value': array(['TDF', '3', '6', 'Bruker', 'Bruker', '1', '2', '1014',
        '00000000-0000-0000-0000-000000000000', '447844', '800.000000',
        '4000.000000', 'timsTOF', '3.0.20',
        'I4IP-12.67.1.179; IPPT-12.67.1.179; IPET-12.67.1.179; FXM3-0.0.1.6; MXMC-0.0.4.2; MXIF-0.0.2.0; MXI2-NOT_PRESENT; MXRF-0.0.1.1; RFXS-0.1.3.1; RFXE-NOT_PRESENT',
        '2022-02-15T14:47:39.295-08:00', 'timsTOF fleX MALDI 2', '9', '2',
        '1', '1877407.00348', 'Admin', '', 'Glycans First',
        'timsTOFflex_startup TIMS ON MALDI.m', '0', '0.000025', '1', '1',
        '0.800000', '2.950000', 'Imaging', '', 'T_0235380_1002526_1',
        'Imaging_Run', '702', '1222', '103', '360'], dtype='<U158')}

michalsta commented 2 years ago

Hi,

The error code hasn't overflowed; libzstd actually reports errors as high unsigned ints. Having said that, at first glance there doesn't seem to be anything obviously wrong with the metadata, so it's hard to tell what's happening without getting a look at the file itself. Can you upload it somewhere, or do you need to keep it confidential?
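
For readers puzzled by the number: libzstd returns errors through the same size_t value it uses for sizes, encoded as the negated error number, so on a 64-bit build an error prints as a value just below 2^64. Below is a minimal sketch of decoding the value from this report. The helper name is made up for illustration, and the 64-bit width is an assumption about the build. As far as I can tell from zstd's public zstd_errors.h header, error number 10 corresponds to ZSTD_error_prefix_unknown, but that mapping should be checked against the zstd version bundled with opentims.

```python
SIZE_T_BITS = 64  # assumption: 64-bit build, so size_t is 64 bits wide


def zstd_error_number(ret: int, bits: int = SIZE_T_BITS) -> int:
    """Recover the positive error number from a size_t-style return value.

    libzstd encodes an error as (size_t)-N, which reads as 2**bits - N
    when printed as an unsigned integer.
    """
    return (1 << bits) - ret


reported = 18446744073709551606  # value from the error message in this issue
print(zstd_error_number(reported))  # prints 10
```

In C or C++ code, the equivalent check would normally go through ZSTD_isError() and ZSTD_getErrorName() rather than manual arithmetic.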

MatteoLacki commented 2 years ago

Hello,

And what is the value of bad_frame, and what are the minimum and maximum frame ids in the dataset? Just want to eliminate the usual suspects first :)

Best wishes

ackagel commented 2 years ago

@michalsta This particular run should probably stay confidential, but it sounds like our team can put together a run that we can share sometime next week. For context, this same error code pops up on pretty much all our tdf files so far, so it's likely the issue will be replicated.

@MatteoLacki The minimum frame, in terms of id, is 1, while the maximum ranges between 70k and 110k. A bad frame typically pops up around 3,000-8,000, and then all the subsequent frames throw this error. The intensities/ion-mobilities that come from the prior 'error-free' frames look correct and don't seem to be corrupted.

ackagel commented 2 years ago

@michalsta Sorry for the delay, I can share some offending tdf/tdf_bin files now. Is sharing via Google Drive alright?

michalsta commented 2 years ago

Yes, absolutely. Just post/send me the link

ackagel commented 2 years ago

Great! https://drive.google.com/drive/folders/1z3McVakioNDHRzeTQCG9sFpigwKK5zf1?usp=sharing

michalsta commented 2 years ago

ok, that's weird. I downloaded these and... works for me ;) So. Let's take a few shots in the dark:

Maybe there's some zstd version mismatch, and somehow it's using the system-wide zstd instead of the built-in one? What do you get when you run:

$ python -c "import zstd; print(zstd.version())"

I get 1.5.1.0.

What's your system (not Python) zstd version? like:

$ ls /usr/lib/libzstd.so -l
lrwxrwxrwx 1 root root 16 Feb 19 19:04 /usr/lib/libzstd.so -> libzstd.so.1.5.2

Maybe something got changed in transit? What's the md5sum of the two files you sent? For me it's

2c41f1053df0315db5c331d3979f37d5  analysis.tdf
3af802550b9c7f3f2643899257a68f04  analysis.tdf_bin

Could you check out the newest opentims and install the Python version from the devel branch? It'll install a script, "opentims_verify.py". Could you run it on your machine, both with and without the -c option?
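
The md5sum comparison above can also be done without the md5sum tool, which is usually not available on a stock Windows install. This is a small sketch using only Python's standard library; the function name and chunk size are arbitrary choices made for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    # Binary mode is essential: text mode on Windows would translate line
    # endings and change the bytes being hashed.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file("analysis.tdf") and md5_of_file("analysis.tdf_bin") from the data folder should give the two digests listed above if the files arrived intact.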

michalsta commented 2 years ago

Also, can you check which frame number is the crashy one?
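
Since the report earlier in this thread says that every frame after the first bad one also fails, the first crashy frame can be located by bisection rather than by reading every frame. The sketch below is generic: read_frame is a hypothetical callable supplied by the caller, and the code assumes a decompression failure surfaces as a Python exception, which I have not verified for opentimspy.

```python
def first_bad_frame(read_frame, min_frame, max_frame):
    """Return the lowest frame id for which read_frame raises, or None.

    Assumes failures are monotone: once one frame fails, all later
    frames fail too. If that does not hold, use a linear scan instead.
    """
    def fails(frame_id):
        try:
            read_frame(frame_id)
            return False
        except Exception:
            return True

    if not fails(max_frame):
        return None  # nothing fails at the top end, so nothing to bisect

    lo, hi = min_frame, max_frame  # invariant: frame hi is known to fail
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid        # first failure is at mid or earlier
        else:
            lo = mid + 1    # first failure is after mid
    return lo
```

With the D object from the original report, read_frame could be something like lambda i: D.query(frames=[i], columns=('intensity',)), with the frame id range taken from the dataset.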

ackagel commented 2 years ago

Good to hear it works on your end! I'll try extracting on a different system for now, probably a Linux machine, since this is sounding more and more like a "Windows is cranky about something ambiguous" problem.

As for the zstd version, it looks like no zstd module was installed for Python, and pip installing zstd didn't improve anything. I'm pretty sure I don't have a system zstd installed (if I do, I'm not sure where that DLL would be on Windows). The md5sums match, and opentims_verify.py also crashes on the problem frame. It looks like the first crashy frame is 8622.

michalsta commented 2 years ago

Ah, it's on Windows. Okay, now I can reproduce it. Will have a look.

simondoer commented 1 year ago

Hello, is there any progress on this problem? I am getting the same error for my files from DIA experiments. From a certain frame on, so far always at around 42000 of 67700, all subsequent frames cause the error. I also run opentimspy on Windows.