Official GitHub repository for the HEADAT library: a `.hea`/`.dat` files processing tool and wrapper of the wfdb library.
PhysioNet is an organization whose main mission is to conduct and catalyze biomedical research & education, in part by offering free access to large collections of physiological and clinical data and related open-source software.
PhysioBank is PhysioNet's extensive archive of well-characterized digital recordings of physiologic signals, time series, and related data for use by the biomedical research community.
PhysioBank has adopted a unified and well-structured file format to store and organize records and signals. The entire format system is described on a well-documented, freely accessible website (see Resources).
HEADAT is a light-weight, fully-operational Python library for extracting, processing, converting and exporting ECG signals and record data (in `.hea` and `.dat` format) to specified in-memory formats (DataFrames, Series, ...) or on-disk exports/database files (see Supported export formats for more details).
HEADAT has one goal: to make ECG signal processing easier and more enjoyable.
As a new module, the community goal is to reach each one of these items in the coming months.
HEADAT's core functions rely on other modules to run.

Please make sure a recent version of Python is installed:

```shell
python -V
python3 -V
```
Upgrade pip, the package-management system, and the underlying modules:

```shell
pip install --upgrade pip
```

On Windows, it is recommended to run `python -m pip install --upgrade pip` instead.
Install HEADAT, either by cloning the repository:

```shell
git clone https://github.com/lcsrodriguez/headat-signals
```

or with pip (not operational yet):

```shell
pip install headat
```

If you choose the cloning method, please perform a preliminary step: install the dependencies manually by executing `pip install -r requirements.txt`.
Create a HEADAT view corresponding to a single record:

```python
v = HDView()
```

Add a record reference, either at instantiation:

```python
v = HDView("samples/aami3a")
```

or with the `.add_record()` method:

```python
v = HDView()
v.add_record("samples/aami3a")
```
Remark: the library supports both remote and local resources; you can specify a URL or a relative/absolute path to the file. In addition, you may include or omit the `.hea` extension, depending on your technical choice; HEADAT will automatically parse the file, gather the signals and perform the needed processing.

For instance, you can set up a HDView using either:

- `samples/aami3a`
- `samples/aami3a.hea`
- `https://physionet.org/files/aami-ec13/1.0.0/aami3a.hea`
- `https://physionet.org/files/aami-ec13/1.0.0/aami3a`
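Accepting all four spellings implies a small normalization step before parsing. A minimal sketch of such a resolver (the helper name `resolve_record` and its exact rules are assumptions for illustration, not HEADAT's actual implementation):

```python
from urllib.parse import urlparse

def resolve_record(spec: str) -> tuple[str, bool]:
    """Normalize a record spec: strip an optional .hea extension
    and report whether the resource is remote (URL) or local."""
    if spec.endswith(".hea"):
        spec = spec[: -len(".hea")]
    is_remote = urlparse(spec).scheme in ("http", "https")
    return spec, is_remote

# Both local spellings collapse to the same record name:
resolve_record("samples/aami3a.hea")  # → ('samples/aami3a', False)
```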
Then, you can extract and convert the signals' data to many supported formats (see list):

```python
v.t_csv()
v.t_xlsx()
v.t_json()
v.t_xml()
v.t_md()
v.t_tex()
v.t_parquet()
v.t_pickle()
v.t_wav()
v.t_edf()
v.t_feather()
```
The output will be stored in a timestamped file within the folder `out/view_<simulation_timestamp>`.
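Building such a timestamped output folder is a common pattern; a minimal sketch with the standard library (the helper name and the exact timestamp format HEADAT uses are assumptions):

```python
from datetime import datetime
from pathlib import Path

def make_output_dir(base: str = "out") -> Path:
    """Create a timestamped output folder such as out/view_20240101_120000."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(base) / f"view_{stamp}"
    path.mkdir(parents=True, exist_ok=True)
    return path
```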
Get the supported MIME types and extensions for export formats:

```python
get_export_types()
get_export_extensions()
```
Additionally, for monitoring purposes, you can check the number of HDView instances created by calling:

```python
HDView.VIEWS_INITIALIZED_COUNTER
```
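Such a counter typically follows the class-attribute idiom sketched below; this is a generic toy illustration, not HEADAT's actual code:

```python
class HDViewDemo:
    """Toy stand-in for HDView, showing a class-level instance counter."""
    VIEWS_INITIALIZED_COUNTER = 0

    def __init__(self):
        # Every instantiation bumps the shared class attribute.
        type(self).VIEWS_INITIALIZED_COUNTER += 1

HDViewDemo(); HDViewDemo()
print(HDViewDemo.VIEWS_INITIALIZED_COUNTER)  # → 2
```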
You may need to collect and print the underlying signal files of a HDView:

```python
v.get_record_files()
```
You can also gather the information labels and the raw signals:

```python
v.get_signals()
v.get_infos()
```
If you find any other relevant feature to be implemented, please open a new issue!
| Class | Type | Description | Ok |
|---|---|---|---|
| Raw array | `list` | "Pure" Python array (list of lists) | ✅ |
| Raw dict | `dict` | "Pure" Python dict (dict of lists) | ✅ |
| Numpy | `ndarray` | Numpy n-dimensional array (for fast computations, underlying C layers) | ✅ |
| Numpy | `record` | Numpy record array | ✅ |
| Pandas | `DataFrame` | Pandas DataFrame conversion (best solution for further data processing) | ✅ |
| HDFS | - | Hadoop Distributed File System (HDFS) (using PyArrow) | ❌ |
| RDD | - | Resilient Distributed Datasets (RDD) (using PySpark) | ✅ |
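The in-memory targets above are standard conversions from the raw list-of-lists form; a minimal sketch with numpy and pandas (the sample values and column names are made up for the example):

```python
import numpy as np
import pandas as pd

# Raw "pure" Python form: one inner list per sample (hypothetical values).
raw = [[0.12, 0.08], [0.15, 0.09], [0.11, 0.07]]

arr = np.asarray(raw)                              # Numpy ndarray
df = pd.DataFrame(raw, columns=["ECG1", "ECG2"])   # Pandas DataFrame
rec = df.to_records(index=False)                   # Numpy record array
as_dict = df.to_dict(orient="list")                # dict of lists
```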
| Name | Extension | Description | Ok |
|---|---|---|---|
| Text file | custom | Standard text file (`.out`, `.dat`, `.txt` or other custom extension) | ✅ |
| Excel | `.xlsx` | MS Excel/OpenOffice Calc file | ✅[^1] |
| CSV | `.csv` | Better for data science | ✅ |
| JSON | `.json` | JSON file | ✅ |
| XML | `.xml` | Useful for XML parsing | ✅ |
| Markdown | `.md` | Useful for quick reports in Markdown | ✅ |
| HTML | `.html` | Useful for web development | ✅ |
| LaTeX | `.tex` | Recommended for highly-detailed articles in LaTeX | ✅ |
| Parquet | `.pqt` | Apache Parquet format: highly recommended for HPC | ✅ |
| Pickle | `.pkl` | For data serialization and deserialization | ✅ |
| HDF5 | `.h5` | Hierarchical Data Format (HDF) | ✅ |
| SQLite | `.db`, `.sqlite` | Classic and light-weight file-based SQL RDBMS (can be relevant for querying organized records) | ❌ |
| MATLAB | `.mat` | For heavy computations in MATLAB programs (proprietary software) | ✅ |
| WAV | `.wav` | WAV files | ✅ |
| EDF | `.edf` | European Data Format (EDF) files | ✅ |
| Feather | `.fea`, `.feather` | Apache Arrow's Feather file format for fast binary columnar in-memory storage | ✅ |
| STATA | `.dta` | STATA statistical analysis software (proprietary software) | ✅ |
For fast processing steps, please consider the Pickle, Parquet and Feather formats, which are especially designed for HPC.
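As an illustration of why these binary formats are fast, a pickle round-trip of a DataFrame preserves dtypes exactly with no text parsing. This uses plain pandas, not HEADAT's `t_pickle()` internals, and the toy signal values are made up:

```python
import pandas as pd

df = pd.DataFrame({"ECG1": [0.12, 0.15, 0.11]})  # toy signal
df.to_pickle("signal.pkl")            # binary serialization to disk
restored = pd.read_pickle("signal.pkl")
assert restored.equals(df)            # exact round-trip, dtypes included
```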
Remarks

[^1]: MS Excel (`.xls`/`.xlsx`) files have a maximum of 1,048,576 rows per spreadsheet. If the studied signals are too long, unexpected behavior can occur! Please consider a `.csv` export with additional processing steps instead of an incomplete `.xlsx` file.
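A simple way to work around the Excel row limit when a signal is available as a DataFrame is to split the CSV export into chunks; the helper name, chunk size and file naming below are arbitrary choices for the sketch, not HEADAT behavior:

```python
import pandas as pd

EXCEL_ROW_LIMIT = 1_048_576

def export_long_signal(df: pd.DataFrame, stem: str,
                       chunk: int = EXCEL_ROW_LIMIT - 1) -> list[str]:
    """Write a long signal as one or more CSV files, each small
    enough to be reopened in Excel if needed."""
    paths = []
    for i, start in enumerate(range(0, len(df), chunk)):
        path = f"{stem}_{i}.csv"
        df.iloc[start:start + chunk].to_csv(path, index=False)
        paths.append(path)
    return paths
```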
See RESOURCES.
PhysioNet is a repository of freely-available medical research data, managed by the MIT Laboratory for Computational Physiology (MIT-LCP).
This wrapper is built on top of the wfdb package's features; please consider using wfdb for further development.
The LICENSE file contains the full license details.