Add highlevel.crop_edf function

holgern / pyedflib

pyedflib is a python library to read/write EDF+/BDF+ files based on EDFlib.

http://pyedflib.readthedocs.org/

BSD 3-Clause "New" or "Revised" License

209 stars 121 forks source link

Add highlevel.crop_edf function #196

Closed raphaelvallat closed 10 months ago

raphaelvallat commented 1 year ago

As discussed in https://github.com/holgern/pyedflib/issues/195, this adds a new function to crop the EDF file.

I'm still working on the unit tests. The following works like a charm:

from pyedflib import highlevel
from datetime import datetime
new_start = datetime(2011, 4, 4, 12, 58, 0)
new_stop = datetime(2011, 4, 4, 13, 0, 0)
highlevel.crop_edf("data/test_generator.edf", new_start=new_start, new_stop=new_stop, verbose=True)

However, when I set both new_start and new_stop to None, I should get True with compare_edf(original, cropped), but instead got the following error:

Do you have any idea why that might be? I'm not changing any of the signal headers 🤔

PS: I would recommend using black for code formatting. I usually set a max line length to 100 instead of the default 80 because I find it more readable. If that's something you'd be interested in let me know and I can submit a second PR after this one.

Thanks, Raphael

skjerns commented 1 year ago

Sorry for the late reply, much to do right now

This might be due to rounding errors that get introduced by converting forth and back from digital to analog signal space. EDF saves files in digital space (i.e. amplifier integers), and then converts to floats (i.e. Voltage) as output, depending on the digital range dmin/dmax, there can be loss of information.

Try loading and saving the data with digital=True, the signal should then stay the same

I'm flying to a conference to the US soon, so will only have time to look at it in May. If you want, you can also wait for my further input until then, but knowing that you're a good dev (we'll be citing YASA in our upcoming publication!😊), I assume your code will be good anyway :)

skjerns commented 1 year ago

Using black or any other formatting tool/guideline would be great. Until now, the code base has no enforcement of any contribution guidelines or style guides. However, @holgern is the person that has to decide that in the end.

raphaelvallat commented 1 year ago

Hi @skjerns!

Haha, that's kind of you but I would not trust myself too much as I've never dive into the pyedflib code before 😅 Anyway it's pretty straightforward so hopefully it's not too much work to review.

Try loading and saving the data with digital=True, the signal should then stay the same

Yep works perfectly, thanks!

I've added unit tests and everything is running smoothly on my machine ✅

skjerns commented 10 months ago

My apologies that this took so long.

The code looks good. thank you a lot for this addition!

One more request: It would be great if cropping could be done with relative times as well. My suggestion would be to have three parameters start_dt, start_sec, start_smp (which are mutually exclusive) and their corresponding stop_x. This way, cropping could be done without knowing the RecordingStartdatetime of the EDF. Think of a scenario where you want to remove e.g. the first 15 minutes of the recording or have a very specific exact number of samples after midnight (that means start_x and stop_x can be of different type, e.g. start_dt together with stop_smp). Internally we transform them to a common unit (as is done right now already) and apply the cropping.

What are your thoughts on this?

raphaelvallat commented 10 months ago

Thanks for the review @skjerns. Sounds good about adding the possibility to crop based on samples or seconds. For simplicity, what about:

highlevel.crop_edf(
"data/test_generator.edf", 
start=new_start, stop=new_stop, 
start_format="datetime", stop_format="datetime", 
verbose=True
)

where start/stop_format can be one of "datetime", "seconds", or "samples"?

skjerns commented 10 months ago

Yes that sounds like a good solution!

raphaelvallat commented 10 months ago

Ready for re-review @skjerns !

Of note, I did not add support for "samples" since I think it could lead to errors when the signals have different sampling frequencies. The only supported options are "datetime" and "seconds". I have also updated the unit tests, which now compare that the signal values and headers are the same.

skjerns commented 10 months ago

Ready for re-review @skjerns !

Of note, I did not add support for "samples" since I think it could lead to errors when the signals have different sampling frequencies. The only supported options are "datetime" and "seconds". I have also updated the unit tests, which now compare that the signal values and headers are the same.

ah, yes that makes total sense. Thanks for thinking of this! I'll review it soon.

raphaelvallat commented 10 months ago

All suggestions merged. Thanks!

skjerns commented 10 months ago

Thanks a lot for the contribution @raphaelvallat !