tbeu / matio

MATLAB MAT File I/O Library
https://matio.sourceforge.io
BSD 2-Clause "Simplified" License

Graceful handling of corrupt files #119

Closed cdbrkfxrpt closed 5 years ago

cdbrkfxrpt commented 5 years ago

In the tree I'm scanning, there is a damaged .mat v5 file: MATLAB itself cannot open it and reports "File might be corrupt." scipy.io.loadmat also refuses to load it because it could not read bytes.

matio, however, segfaults when trying to iterate over the file. What's worse: the program that produced this file also uses matio.

I would love to share the file, but I can't for legal reasons. I also can't reproduce the behaviour with anything I could share; I only know that the segfault happens in Mat_VarReadNextInfo.

Maybe this method can be made safe?

tbeu commented 5 years ago

I am afraid I need your specific v5 MAT file to reproduce and fix the segfault.

tbeu commented 5 years ago

Which version of matio are you actually using? In recent versions I tried to fix a lot of issues regarding reading corrupted files.

cdbrkfxrpt commented 5 years ago

I can't post it here, but I'm emailing you a link. I'm using v1.5.14.

tbeu commented 5 years ago

Got it, thanks. I can see that the MAT file contains 22 variables and was created by matio v1.5.2 (from 31 July 2013). I assume that the corruption is due to that old version, since after the 22nd variable (starting at offset 0x001bc7d0) the file contains another 48 bytes of an (incomplete) 23rd variable:

[Screenshot: segfault]

When iterating over all variables with matio v1.5.14 (and newer), the read failure at EOF is detected (since commit a0539135c9b1ab7613aa7953279da9224da88775), and matio calls Mat_Critical with the message "An error occurred in reading the MAT file".
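To make the failure mode concrete: a Level 5 MAT file starts with a 128-byte header, followed by top-level data elements, each made of an 8-byte tag (4-byte data type, 4-byte byte count) plus as many payload bytes as the tag declares. A truncated trailing variable like the 48 bytes above shows up as a tag whose declared size runs past the end of the file. The sketch below is a hypothetical, stdlib-only helper (`count_complete_elements` is not part of matio and is not how matio implements its check). It works on an in-memory buffer and assumes every top-level element uses the full 8-byte tag, as miMATRIX and miCOMPRESSED elements do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of matio: walks the top-level data elements
 * of an in-memory Level 5 MAT file and stops at the first element whose
 * 8-byte tag or declared payload would run past the end of the buffer. */

#define MAT5_HEADER_LEN 128u
#define MAT5_TAG_LEN 8u

/* Read a 32-bit unsigned value in the file's byte order. */
static uint32_t read_u32(const unsigned char *p, int little_endian)
{
    if (little_endian)
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns the number of complete top-level elements, or -1 if the header is
 * missing or has an unknown endian indicator. The number of bytes left over
 * after the last complete element is stored in *trailing. */
static int count_complete_elements(const unsigned char *buf, size_t len,
                                   size_t *trailing)
{
    int little_endian;
    size_t pos = MAT5_HEADER_LEN;
    int count = 0;

    assert(trailing != NULL);
    *trailing = 0;
    if (buf == NULL || len < MAT5_HEADER_LEN)
        return -1;

    /* Endian indicator at bytes 126..127: "IM" on disk means the file was
     * written little-endian, "MI" means big-endian. */
    if (buf[126] == 'I' && buf[127] == 'M')
        little_endian = 1;
    else if (buf[126] == 'M' && buf[127] == 'I')
        little_endian = 0;
    else
        return -1;

    while (pos < len) {
        size_t remaining = len - pos;
        uint32_t nbytes;

        if (remaining < MAT5_TAG_LEN)
            break;                      /* truncated tag */
        nbytes = read_u32(buf + pos + 4, little_endian);
        if ((size_t)nbytes > remaining - MAT5_TAG_LEN)
            break;                      /* declared payload exceeds EOF */
        pos += MAT5_TAG_LEN + (size_t)nbytes;
        count++;
    }
    *trailing = len - pos;
    return count;
}
```

Applied to a file like the one described above, such a walk would presumably stop after the 22nd element and report the leftover 48 bytes instead of reading past the end of the data. Checking the declared size against the remaining length before touching the payload is what keeps the loop from overrunning the buffer.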

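In your writing application you should

* update to a newer version of matio such that no corrupted file is written.

In your reading application you can

* avoid iterating over variables in the file using Mat_VarReadNextInfo, but prefer directly reading a variable (info) by Mat_VarRead/Mat_VarReadInfo if you know the variable name.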

From my side there is nothing else to do here.

tbeu commented 5 years ago

Found a small issue to be improved.

tbeu commented 5 years ago

Handling of read errors should be improved by commit ae82712960ed5c0b44edbafce23c9f2f29f097d8, which will be part of v1.5.17, to be released soon.

cdbrkfxrpt commented 5 years ago

> In your writing application you should
>
> * update to a newer version of matio such that no corrupted file is written.

I have no control over the writing application; it's not ours.

> In your reading application you can
>
> * avoid iterating over variables in the file using Mat_VarReadNextInfo, but prefer directly reading a variable (info) by Mat_VarRead/Mat_VarReadInfo if you know the variable name.

I can't do that either, because the variable names change from file to file: they are prefixed differently or are simply different.

Thank you anyway. I'm already happy that this is receiving attention and being improved, meaning there will be a fix for this (from my perspective) edge case in the foreseeable future.

tbeu commented 5 years ago

Try v1.5.17. You will not get any errors (or messages) when iterating over the variables.

cdbrkfxrpt commented 5 years ago

Okay perfect, thanks!