holgern / pyedflib

pyedflib is a python library to read/write EDF+/BDF+ files based on EDFlib.
http://pyedflib.readthedocs.org/
BSD 3-Clause "New" or "Revised" License
209 stars 121 forks source link

EDF open fails when physical max and min are both 0 #256

Open adammj opened 3 weeks ago

adammj commented 3 weeks ago

Line 214 of edflib.c tests for the following condition edfhdr->edfparam[i].phys_max==edfhdr->edfparam[i].phys_min), which then throws an error.

I understand in a technical sense that this shouldn't be possible. However, this is the real world, and I have thousands of EDF files (recorded by various groups) that forgot to set the physical max+min for 1 or more channels (they are both set to 0). This leads to the condition that this EDF library will refuse to open those files, even though only 1 or 2 channels are "bad". However, some others, like the native one in MATLAB and some EDF viewers, will gladly open these same files.

I'd like to propose that instead the file still be opened (maybe in all cases, but certainly in the case that both physical min and max are 0). Looking over the code elsewhere, I don't see any instances where this would cause new issues. It's just that reading back the physical values for the "bad" channel would all be 0s. Whereas reading the digital values will still report whatever was actually stored.

Another option, which I think would require more work, is to modify the initial load to act like those channels don't exist.

If this is acceptable, I can certainly make the changes.

skjerns commented 2 weeks ago

That's a good idea, thanks!

Yes, there are quite a couple of things that the original C library is quite strict on, not only this, see e.g. https://github.com/holgern/pyedflib/pull/114 , however we are only a wrapper for edflib

However, the more additions we include and monkey-patch in edflib.c, the harder it get's to keep up with upstream changes. It was already a pain to update to the most recent version, which is already a few years out of date.

On the long run it would be great to keep the edflib.c as is from upstream and insert all additions as patches elsewhere, however, I haven't looked into what would be the best approach for that.

Do you have suggestions on how to solve this?

adammj commented 2 weeks ago

I noticed the upstream library after I submitted this, and have already heard back from the developer that they won't allow these sorts of compromises. I guess they haven't heard of Postel's law ("be conservative in what you do, be liberal in what you accept from others").

Regardless, now that I understand your desire to minimize work in keeping up with upstream changes, I'll mull it over and see if I can come up with a better solution than the one I currently implemented on my local copy.

I'll get back with you.

skjerns commented 2 weeks ago

Yes, teuniz is quite strict with implementation specifics, but I guess it has it's advantages.

If you find a way to implement changes without altering the C code that would be great! I'm not very proficient at C unfortunately and I haven't gotten around to look into options on how to monkey-patch or overload certain functions. That would be very useful for other things as well and for long-term maintainability.

adammj commented 2 weeks ago

Unfortunately, I don't see a way around altering the C code (which I have experience with). Looking at the teuniz's C and Python code closer, and testing it against my collection of 30k EDF files, I've found an additional half dozen+ cases distributed over 4k files which also fail. It's obvious that the code is way too strict in what it allows (and their "solution" of providing helper utilities to rewrite the EDF files is not acceptable where strict provenance is required—which should be all of science).

skjerns commented 2 weeks ago

Thanks!

As far as my research goes, using macros to redirect the function calls could work?

Eg


#define foo my_foo  // Redirect foo to my_foo

#include "edflib.h"  // Include the original header

#undef foo  // Remove the macro redirection to avoid pollution

// Your code that calls foo()
int main() {
    int result = foo(10);  // This will call my_foo(10)
    return 0;
}