Official GitHub repository for the HEADAT library: a `.hea`/`.dat` files processing tool and wrapper of the wfdb library.
PhysioNet is an organization whose main mission is to conduct and catalyze biomedical research & education, in part by offering free access to large collections of physiological and clinical data and related open-source software.
PhysioBank is PhysioNet's extensive archive of well-characterized digital recordings of physiologic signals, time series, and related data for use by the biomedical research community.
PhysioBank has adopted a unified and well-structured file format to store and organize records and signals. The entire format system is described on a well-documented, freely accessible website (see Resources).
HEADAT is a light-weight, fully-operational Python library for extracting, processing, converting and exporting ECG signals and record data (in `.hea` and `.dat` format) to specified in-memory formats (DataFrames, Series, ...) or on-disk exports/database files (see Supported export formats for more details).
HEADAT has one goal: to make ECG signal processing easier and more enjoyable.
As a new module, the community goal is to reach each one of these items in the coming months.
HEADAT's core functions rely on other modules to run.

Please make sure a recent version of Python is installed:

```shell
python -V
python3 -V
```
Upgrade pip, the package-management system, and the underlying modules:

```shell
pip install --upgrade pip
```

On Windows, it is recommended to run `python -m pip install --upgrade pip` instead.
Install HEADAT, either by cloning the repository:

```shell
git clone https://github.com/lcsrodriguez/headat-signals
```

or with pip (not operational yet):

```shell
pip install headat
```

If you choose the cloning method, please perform a preliminary step: install the dependencies manually by executing `pip install -r requirements.txt`.
Create a HEADAT view corresponding to a single record:

```python
v = HDView()
```

Add a record reference, either at instantiation:

```python
v = HDView("samples/aami3a")
```

or with the `.add_record()` method:

```python
v = HDView()
v.add_record("samples/aami3a")
```
Remark: the library supports both remote and local resources; you can specify a URL or a relative/absolute path to the file. In addition, you may include or omit the `.hea` extension, depending on your technical choice; HEADAT will automatically parse the file, gather the signals and perform the needed processing.

For instance, you can set up a HDView using either:

- `samples/aami3a`
- `samples/aami3a.hea`
- `https://physionet.org/files/aami-ec13/1.0.0/aami3a.hea`
- `https://physionet.org/files/aami-ec13/1.0.0/aami3a`
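Accepting all four spellings implies a small normalization step before parsing. A minimal sketch of such a resolver (the helper name `resolve_record` and its exact rules are assumptions for illustration, not HEADAT's actual implementation):

```python
from urllib.parse import urlparse

def resolve_record(spec: str) -> tuple[str, bool]:
    """Normalize a record spec: strip an optional .hea extension
    and report whether the resource is remote (URL) or local."""
    if spec.endswith(".hea"):
        spec = spec[: -len(".hea")]
    is_remote = urlparse(spec).scheme in ("http", "https")
    return spec, is_remote

# Both local spellings collapse to the same record name:
resolve_record("samples/aami3a.hea")  # → ('samples/aami3a', False)
```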
Then, you can extract and convert the signals' data to many supported formats (see list):

```python
v.t_csv()
v.t_xlsx()
v.t_json()
v.t_xml()
v.t_md()
v.t_tex()
v.t_parquet()
v.t_pickle()
v.t_wav()
v.t_edf()
v.t_feather()
```
The output will be stored in a timestamped file within the folder `out/view_<simulation_timestamp>`.
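Building such a timestamped output folder is a common pattern; a minimal sketch with the standard library (the helper name and the exact timestamp format HEADAT uses are assumptions):

```python
from datetime import datetime
from pathlib import Path

def make_output_dir(base: str = "out") -> Path:
    """Create a timestamped output folder such as out/view_20240101_120000."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(base) / f"view_{stamp}"
    path.mkdir(parents=True, exist_ok=True)
    return path
```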
Get the supported MIME types and extensions for export formats:

```python
get_export_types()
get_export_extensions()
```
Additionally, for monitoring purposes, you can check the number of HDView instances created by calling:

```python
HDView.VIEWS_INITIALIZED_COUNTER
```
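Such a counter typically follows the class-attribute idiom sketched below; this is a generic toy illustration, not HEADAT's actual code:

```python
class HDViewDemo:
    """Toy stand-in for HDView, showing a class-level instance counter."""
    VIEWS_INITIALIZED_COUNTER = 0

    def __init__(self):
        # Every instantiation bumps the shared class attribute.
        type(self).VIEWS_INITIALIZED_COUNTER += 1

HDViewDemo(); HDViewDemo()
print(HDViewDemo.VIEWS_INITIALIZED_COUNTER)  # → 2
```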
You may need to collect and print the underlying signal files of a HDView:

```python
v.get_record_files()
```
You can also gather the information labels and the raw signals:

```python
v.get_signals()
v.get_infos()
```
If you find any other relevant feature to be implemented, please open a new issue!
| Class | Type | Description | Ok |
|---|---|---|---|
| Raw array | `list` | "Pure" Python array (list of lists) | ✅ |
| Raw dict | `dict` | "Pure" Python dict (dict of lists) | ✅ |
| Numpy | `ndarray` | Numpy n-dimensional array (for fast computations, underlying C layers) | ✅ |
| Numpy | `record` | Numpy record array | ✅ |
| Pandas | `DataFrame` | Pandas DataFrame conversion (best solution for further data processing) | ✅ |
| HDFS | - | Hadoop Distributed File System (HDFS) (using PyArrow) | ❌ |
| RDD | - | Resilient Distributed Datasets (RDD) (using PySpark) | ✅ |
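The in-memory targets above are standard conversions from the raw list-of-lists form; a minimal sketch with numpy and pandas (the sample values and column names are made up for the example):

```python
import numpy as np
import pandas as pd

# Raw "pure" Python form: one inner list per sample (hypothetical values).
raw = [[0.12, 0.08], [0.15, 0.09], [0.11, 0.07]]

arr = np.asarray(raw)                              # Numpy ndarray
df = pd.DataFrame(raw, columns=["ECG1", "ECG2"])   # Pandas DataFrame
rec = df.to_records(index=False)                   # Numpy record array
as_dict = df.to_dict(orient="list")                # dict of lists
```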
| Name | Extension | Description | Ok |
|---|---|---|---|
| Text file | custom | Standard text file (`.out`, `.dat`, `.txt` or other custom extension) | ✅ |
| Excel | `.xlsx` | MS Excel/OpenOffice Calc file | ✅[^1] |
| CSV | `.csv` | Better for data science | ✅ |
| JSON | `.json` | JSON file | ✅ |
| XML | `.xml` | Useful for XML parsing | ✅ |
| Markdown | `.md` | Useful for quick reports in Markdown | ✅ |
| HTML | `.html` | Useful for web development | ✅ |
| LaTeX | `.tex` | Recommended for highly-detailed articles in LaTeX | ✅ |
| Parquet | `.pqt` | Apache Parquet format: highly recommended for HPC | ✅ |
| Pickle | `.pkl` | For data serialization and deserialization | ✅ |
| HDF5 | `.h5` | Hierarchical Data Format (HDF) | ✅ |
| SQLite | `.db`, `.sqlite` | Classic and light-weight file-based SQL RDBMS (can be relevant for querying organized records) | ❌ |
| MATLAB | `.mat` | For heavy computations in MATLAB programs (proprietary software) | ✅ |
| WAV | `.wav` | WAV files | ✅ |
| EDF | `.edf` | European Data Format (EDF) files | ✅ |
| Feather | `.fea`, `.feather` | Apache Arrow's Feather file format for fast binary columnar in-memory storage | ✅ |
| STATA | `.dta` | STATA statistical analysis software (proprietary software) | ✅ |
For fast processing steps, please consider the Pickle, Parquet and Feather formats, which are especially designed for HPC.
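As an illustration of why these binary formats are fast, a pickle round-trip of a DataFrame preserves dtypes exactly with no text parsing. This uses plain pandas, not HEADAT's `t_pickle()` internals, and the toy signal values are made up:

```python
import pandas as pd

df = pd.DataFrame({"ECG1": [0.12, 0.15, 0.11]})  # toy signal
df.to_pickle("signal.pkl")            # binary serialization to disk
restored = pd.read_pickle("signal.pkl")
assert restored.equals(df)            # exact round-trip, dtypes included
```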
Remarks

[^1]: MS Excel (`.xls`/`.xlsx`) files have a maximum of 1,048,576 rows per spreadsheet. If the studied signals are too long, unexpected behavior can occur! Please consider a `.csv` export with additional processing steps instead of an incomplete `.xlsx` file.
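A simple way to work around the Excel row limit when a signal is available as a DataFrame is to split the CSV export into chunks; the helper name, chunk size and file naming below are arbitrary choices for the sketch, not HEADAT behavior:

```python
import pandas as pd

EXCEL_ROW_LIMIT = 1_048_576

def export_long_signal(df: pd.DataFrame, stem: str,
                       chunk: int = EXCEL_ROW_LIMIT - 1) -> list[str]:
    """Write a long signal as one or more CSV files, each small
    enough to be reopened in Excel if needed."""
    paths = []
    for i, start in enumerate(range(0, len(df), chunk)):
        path = f"{stem}_{i}.csv"
        df.iloc[start:start + chunk].to_csv(path, index=False)
        paths.append(path)
    return paths
```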
See RESOURCES.
PhysioNet is a repository of freely-available medical research data, managed by the MIT Laboratory for Computational Physiology (MIT-LCP).
This wrapper is built on top of the wfdb package's features; please consider using wfdb for further development.
The LICENSE file contains the full license details.