holgern / pyedflib

pyedflib is a python library to read/write EDF+/BDF+ files based on EDFlib.
http://pyedflib.readthedocs.org/
BSD 3-Clause "New" or "Revised" License

Add high-level function for truncating to specified start/stop times #195

Open raphaelvallat opened 1 year ago

raphaelvallat commented 1 year ago

Hi,

It is often useful to crop/truncate an EDF file to specified start and/or stop times (e.g. time in bed). I suggest adding a new function, pyedflib.highlevel.truncate_edf (or "crop_edf").

This is an initial working implementation, tested on pyedflib 0.1.30:

"""Function to crop an EDF file to specified start/end times."""
from pathlib import Path
import pyedflib as pedf
import datetime as dt

def truncate_edf(path_edf, new_start=None, new_stop=None):
    """Truncate an EDF file to desired start/stop times.

    Parameters
    ----------
    path_edf : str
        The path to the EDF file
    new_start : datetime.datetime
        The new start timestamp
    new_stop : datetime.datetime
        The new stop timestamp
    """
    assert isinstance(new_start, (dt.datetime, type(None)))
    assert isinstance(new_stop, (dt.datetime, type(None)))
    path_edf = Path(path_edf)
    assert path_edf.exists(), "File does not exist."

    # Open the original EDF file
    edf = pedf.EdfReader(str(path_edf))
    signals_headers = edf.getSignalHeaders()
    header = edf.getHeader()

    # Define start/stop in samples
    current_start = edf.getStartdatetime()
    if new_start is None:
        new_start = current_start
    start_diff_seconds = (new_start - current_start).total_seconds()
    assert current_start <= new_start

    current_stop = current_start + dt.timedelta(seconds=edf.getFileDuration())
    current_duration = current_stop - current_start
    if new_stop is None:
        new_stop = current_stop
    assert new_stop <= current_stop
    stop_diff_from_start = (new_stop - current_start).total_seconds()

    # Crop each signal
    signals = []
    for i in range(len(edf.getSignalHeaders())):
        sf = edf.getSampleFrequency(i)
        start_idx = int(start_diff_seconds * sf)
        stop_idx = int(stop_diff_from_start * sf)
        signals.append(edf.readSignal(i, start=start_idx, n=stop_idx - start_idx))
    edf.close()

    # Update header startdate and save file
    header["startdate"] = new_start
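    # Build the output name from stem + suffix so ".EDF" files are not overwritten
    outpath = str(path_edf.with_name(f"{path_edf.stem}cropped{path_edf.suffix}"))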
    pedf.highlevel.write_edf(outpath, signals, signals_headers, header)

    # Get new EDF start, stop and duration
    edf = pedf.EdfReader(outpath)
    start = edf.getStartdatetime()
    stop = start + dt.timedelta(seconds=edf.getFileDuration())
    duration = stop - start
    edf.close()

    # Verbose
    print(f"Original: {current_start} to {current_stop} ({current_duration})")
    print(f"Truncated: {start} to {stop} ({duration})")
    print(f"Successfully written file: {outpath}")
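
The timestamp-to-sample conversion in the "Crop each signal" step can be checked on its own, without an EDF file. The following is only a minimal sketch of that arithmetic: the helper name crop_sample_range and its parameters are hypothetical (not part of pyedflib), and it raises ValueError where the function above uses assert.

```python
import datetime as dt


def crop_sample_range(file_start, duration_s, sf, new_start=None, new_stop=None):
    """Return (start_idx, n_samples) for one signal sampled at `sf` Hz.

    Hypothetical helper for illustration; it mirrors the index arithmetic
    of truncate_edf but does not touch any file.
    """
    file_stop = file_start + dt.timedelta(seconds=duration_s)
    # Missing bounds default to the original recording limits
    new_start = file_start if new_start is None else new_start
    new_stop = file_stop if new_stop is None else new_stop
    if not file_start <= new_start <= new_stop <= file_stop:
        raise ValueError("Requested window must lie inside the recording.")
    # Both offsets are measured from the original start of the recording
    start_idx = int((new_start - file_start).total_seconds() * sf)
    stop_idx = int((new_stop - file_start).total_seconds() * sf)
    return start_idx, stop_idx - start_idx


# Example: an 8-hour recording starting at 22:30, sampled at 100 Hz,
# cropped to 23:00-06:00.
rec_start = dt.datetime(2022, 1, 1, 22, 30)
start_idx, n = crop_sample_range(
    rec_start,
    duration_s=8 * 3600,
    sf=100,
    new_start=dt.datetime(2022, 1, 1, 23, 0),
    new_stop=dt.datetime(2022, 1, 2, 6, 0),
)
print(start_idx, n)  # 180000 2520000
```

The returned pair maps directly onto the start and n arguments passed to readSignal in the loop above, computed once per signal because each channel may have its own sampling frequency.
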
skjerns commented 1 year ago

That sounds like a great addition!

Do you want to make a PR?

raphaelvallat commented 1 year ago

Sure thing, I'll submit a PR in the next couple of weeks. Is there anything that you'd modify from the initial code that I sent? Are there general contributing guidelines for this library?

Thanks