holgern / pyedflib

pyedflib is a python library to read/write EDF+/BDF+ files based on EDFlib.
http://pyedflib.readthedocs.org/
BSD 3-Clause "New" or "Revised" License
209 stars 121 forks source link

Bug in start-time convertion #236

Closed burnpanck closed 10 months ago

burnpanck commented 10 months ago

While trying to understand timezone information, I stumbled upon the following subtle bug: https://github.com/holgern/pyedflib/blob/1af9ced84921b304fecdc994e7e8b9a4c1674245/pyedflib/edfreader.py#L371-L372

Obviously, the conversion factor between nanoseconds and microseconds is 1000, not 100.

skjerns commented 10 months ago

Did you confirm this actually leads to a bug?

I remember something that edflib stores the time in units of 10ns, so the conversion is x100. I think this should be somewhere noted in the source code. Sorry I'm on mobile until next week, can't check it well

skjerns commented 10 months ago

See here

https://github.com/holgern/pyedflib/blob/1af9ced84921b304fecdc994e7e8b9a4c1674245/pyedflib/_extensions/_pyedflib.pyx#L634-L650

burnpanck commented 10 months ago

Did you confirm this actually leads to a bug?

No, I didn't. I kind-of did an "in-promptu code-review" here 😃. From a code-review perspective, those two lines are a clear issue in any case: either the code is wrong, or the comment is wrong.

The code that you have pointed out is not directly related to the start timestamp, but it does hint at a third unit which is neither nanoseconds nor microseconds. If self.starttime_subsecond were in units of 100 ns though as hinted at in the comment you linked, then the correct division factor would in fact be /10 and not /100 to get to microseconds, so the code would again be wrong.

In general, my recommendation would be to try to avoid spreading magic numbers like those 100 around in the code, and instead:

If that is not feasible, then every magic number should be loudly and clearly documented (and the comment updated whenever the code is modified!). If there is a mismatch, it should be considered a bug.

That said, the whole EDF format family is too limited in it's capabilities for my taste: e.g. it seems to lack time-zone information. The actual EDF spec is very clear that starttime should be represented in the patient's local time, unfortunately, our design rules disallow naive timestamps entirely. We do need to ingest EDF files from the periphery sometimes, but we're trying to avoid it as much as we can.

Thus, feel free to proceed with this issue report as you see fit. The mismatch between the comment and the code caught my eyes, and I though you might want to know. If you don't, feel free to close.

skjerns commented 10 months ago

thanks! I fully agree with you :) I'll try to improve it in another cycle