hahnicity / ventMAP

Open Source Multi-Purpose Ventilator Analytics Library
GNU General Public License v3.0

Handling data from ventmode #2

Closed. Evidlo closed this issue 4 years ago.

Evidlo commented 4 years ago

I'm trying to extract some pressure/flow waveforms from the raw data published in the ventmode repo, but it seems there is an extra line containing some nonprinting bytes. Is there a built-in function for stripping this data so I can read the file with extract_raw?

Evidlo commented 4 years ago

I figured it out.

For posterity:

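import io
import numpy as np

from ventmap.raw_utils import extract_raw

# encoding='ascii', errors='ignore' drops the non-ASCII junk bytes instead of
# raising a decode error
generator = extract_raw(
    io.open('../data/ventmap.csv', encoding='ascii', errors='ignore'),
    False  # ignore_missing_bes=False: breaths without BS/BE markers are dropped
)
pressure = []
for breath in generator:
    # each breath is a dict; 'pressure' holds that breath's pressure samples
    pressure.append(breath['pressure'])

# join all breaths into one continuous pressure trace
pressure = np.hstack(pressure)
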
hahnicity commented 4 years ago

Yes, this extra line is very common in these files. We have taken steps to remove that data so the file can be processed properly, and we do handle this situation in the ventmode repository.

In general, please see the documentation for extract_raw, which should do all of this for you automatically. If it doesn't, please submit a bug and I will fix it ASAP.

Greg
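
To illustrate what opening the file with encoding='ascii', errors='ignore' does, here is a minimal standard-library sketch. The byte string is made up for illustration and is not real PB-840 output. It shows that the decoder silently drops bytes at or above 0x80, but NUL bytes are valid ASCII and survive, so they still have to be stripped separately (which, according to the traceback later in this thread, extract_raw does internally via clear_descriptor_null_bytes).

```python
import io

# Made-up file contents: a junk first line with NUL and non-ASCII bytes
raw = b'\x00\x00\xff\xfejunk\nvalid line\n'

# Same decoding behaviour as io.open(path, encoding='ascii', errors='ignore')
text = io.TextIOWrapper(io.BytesIO(raw), encoding='ascii', errors='ignore').read()

# Bytes >= 0x80 were dropped by the decoder, but NUL (0x00) is valid ASCII
# and is still present, so strip it explicitly
cleaned = text.replace('\x00', '')
print(repr(cleaned))
```

Note that the decoding step only prevents decode errors; the junk line itself is still there and is left for extract_raw to skip.
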

hahnicity commented 4 years ago

Just a note: your code concatenates the pressure from all breaths into one long array of observations that is not demarcated by breath.

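If you just want to plot things out on a per-breath basis, you could do:

import matplotlib.pyplot as plt

# generator must be a fresh extract_raw() generator; one that has already
# been looped over (as in the snippet above) yields nothing more
for breath in generator:
    plt.plot(breath['pressure'])  # one trace per breath
plt.show()
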
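
If you do want one long array, you can keep the breath boundaries alongside it. A minimal NumPy sketch, using two made-up breaths of different lengths in place of the arrays extract_raw returns:

```python
import numpy as np

# Made-up per-breath pressure arrays of different lengths
breaths = [np.array([5.0, 20.0, 6.0]), np.array([5.0, 18.0, 17.0, 6.0])]
lengths = [len(b) for b in breaths]

pressure = np.hstack(breaths)                             # one continuous trace
breath_id = np.repeat(np.arange(len(breaths)), lengths)   # breath index of every sample
starts = np.cumsum([0] + lengths[:-1])                    # start index of each breath

# The per-breath arrays can be recovered by splitting at the breath starts
per_breath = np.split(pressure, starts[1:])
```

The breath_id array is convenient for grouping or colouring samples by breath, while starts is what np.split needs to undo the concatenation.
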
Evidlo commented 4 years ago

For my particular application, I'm interested in obtaining the pressure waveform along with labels for PIP, PEEP and respiratory rate. The data in ventmode has the raw waveforms, but it doesn't seem to have the labels I'm looking for. (On an unrelated note, the CSV files in y_dir have a header with 21 items, but the data rows have only 19.)

Fortunately, the data in tests/samples/ appears to have the labels I'm looking for, but the sample data seems to be shaped a bit differently.

[Plot 1: waveform from 0282dbl_diff.csv_test with labels from 0282/0282dbl_diff_v3_5_8__breath_meta.csv_test]

[Plot 2: waveform from 0149_2016-02-17-08-38-13_1.csv_test with labels from 0149_2016-02-17-08-38-13_1_v5_1_0__breath_meta.csv_test]

Is there a particular dataset that you can recommend that you would consider "the best"?

Below is the code used to generate the plots.

import io
import numpy as np
import pandas as pd
from ventmap.raw_utils import extract_raw
import matplotlib.pyplot as plt

# %% load

# waveforms_file = '../data/0282dbl_diff.csv_test'
# labels_file = '../data/0282dbl_diff_v3_5_8__breath_meta.csv_test'

waveforms_file = '../data/0149_2016-02-17-08-38-13_1.csv_test'
labels_file = '../data/0149_2016-02-17-08-38-13_1_v5_1_0__breath_meta.csv_test'

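generator = extract_raw(
    io.open(waveforms_file, encoding='ascii', errors='ignore'),
    False  # ignore_missing_bes=False: drop breaths without BS/BE markers
)
breath_waveforms = []
for breath in generator:
    # breath data is output in dictionary format
    breath_waveforms.append(breath['pressure'])

# Breaths differ in length, so keep breath_waveforms as a list of arrays;
# calling np.array() on this ragged list raises an error on recent NumPy.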
breath_labels = pd.read_csv(labels_file)

# %% plot

# expand scalar pip/peep/rr into vector equal to breath length
breath_pips = []
breath_peeps = []
breath_rrs = []
for waveform, labels in zip(breath_waveforms, breath_labels.itertuples()):
    breath_pips.append(np.ones(len(waveform)) * labels.PIP)
    breath_peeps.append(np.ones(len(waveform)) * labels.PEEP)
    breath_rrs.append(np.ones(len(waveform)) * labels.inst_RR)

# concatenate breaths and plot
breath_waveform = np.hstack(breath_waveforms)
breath_pip = np.hstack(breath_pips)
breath_peep = np.hstack(breath_peeps)
breath_rr = np.hstack(breath_rrs)

start = 3000
end = 4000
plt.plot(breath_waveform[start:end])
plt.plot(breath_pip[start:end])
plt.plot(breath_peep[start:end])
plt.legend(['waveform', 'pip', 'peep'])

plt.show()
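
A more compact way to expand the per-breath scalars into per-sample vectors is np.repeat. The sketch below also checks that both files describe the same number of breaths, because zip() silently stops at the shorter input. The data is made up, and, like the code above, it assumes the rows of the breath meta file are in the same order as the breaths yielded by extract_raw.

```python
import numpy as np
import pandas as pd

# Made-up inputs: per-breath pressure arrays and one label row per breath
breath_waveforms = [np.array([5.0, 20.0, 6.0]), np.array([5.0, 18.0, 17.0, 6.0])]
breath_labels = pd.DataFrame({'PIP': [20.0, 18.0], 'PEEP': [5.0, 5.0]})

# zip() would silently truncate on a mismatch, so fail loudly instead
if len(breath_waveforms) != len(breath_labels):
    raise ValueError('waveform and label files have different breath counts')

lengths = [len(w) for w in breath_waveforms]
# Repeat each breath's scalar label once per sample in that breath
breath_pip = np.repeat(breath_labels['PIP'].to_numpy(), lengths)
breath_peep = np.repeat(breath_labels['PEEP'].to_numpy(), lengths)
```
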
Evidlo commented 4 years ago

Also, in the 0282 dataset, there are some extra spikes a few thousand samples in. Are these erroneous or e.g. spontaneous patient breaths?

[Figure: the extra spikes in the 0282 dataset]

hahnicity commented 4 years ago

> Also, in the 0282 dataset, there are some extra spikes a few thousand samples in. Are these erroneous or e.g. spontaneous patient breaths?

It looks like double triggering, where the patient wants to breathe more than the ventilator is allowing.

> For my particular application, I'm interested in obtaining the pressure waveform along with labeled data corresponding to PIP, PEEP and respiratory rate. The data in ventmode has the raw waveforms, but it doesn't seem to have the labels I'm looking for. (on an unrelated note, the csv files in y_dir have a header which has 21 items, but the data has only 19 items).

Please see the ventmap documentation for how to get this.

For extracting metadata (I-Time, TVe, TVi) from files:

from ventmap.breath_meta import get_file_breath_meta

vent_file = 'path/to/vent_data.csv'  # placeholder: path to your raw vent data file

# Data output is normally in list format. Ordering information can be found in
# ventmap.constants.META_HEADER.
breath_meta = get_file_breath_meta(vent_file)
# If you want a pandas DataFrame, set the optional argument to_data_frame=True
breath_meta = get_file_breath_meta(vent_file, to_data_frame=True)
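
Since the list output is ordered like ventmap.constants.META_HEADER, pairing each row with that header lets you look values up by name. A minimal sketch: the three-column header and the rows below are hypothetical stand-ins, not the real contents of META_HEADER, which you should import from ventmap itself.

```python
import pandas as pd

# Hypothetical stand-in for ventmap.constants.META_HEADER (the real list is
# longer; take the actual column names from ventmap)
META_HEADER = ['rel_bn', 'PIP', 'PEEP']

# Hypothetical list-format output: one list per breath, ordered like META_HEADER
breath_meta = [[1, 20.0, 5.0], [2, 18.0, 5.0]]

# Either build a DataFrame with named columns...
df = pd.DataFrame(breath_meta, columns=META_HEADER)
# ...or map a single row to a dict keyed by column name
pip_by_name = [dict(zip(META_HEADER, row))['PIP'] for row in breath_meta]
```
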

For extracting metadata from individual breaths:

from io import open
# production breath meta refers to clinician-validated algorithms
# experimental breath meta refers to non-validated algorithms
from ventmap.breath_meta import get_production_breath_meta, get_experimental_breath_meta
from ventmap.raw_utils import extract_raw, read_processed_file

vent_file = 'path/to/vent_data.csv'  # placeholder: path to your raw vent data file

generator = extract_raw(open(vent_file, encoding='ascii', errors='ignore'), False)
# OR, to read a raw file together with its processed data file:
# generator = read_processed_file(raw_file, processed_file)

for breath in generator:
    # Data output is normally in list format. Ordering information can be found in
    # ventmap.constants.META_HEADER.
    prod_breath_meta = get_production_breath_meta(breath)
    # Ordering information can be found in ventmap.constants.EXPERIMENTAL_META_HEADER.
    experimental_breath_meta = get_experimental_breath_meta(breath)

pre-oma commented 4 years ago

I am currently running the extract_raw code on a PB-840 dataset, but it keeps giving me the error below:

AttributeError: 'str' object has no attribute 'decode'.

How do I rectify this?

hahnicity commented 4 years ago

Hi @pre-oma,

Can you attach the stack trace and the version of ventmap you are using? If you don't know how to get the version, open a command line and type:

pip freeze | grep ventmap

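
The command above assumes a Unix-like shell; grep is not available in a default Windows command prompt. There, `pip show ventmap` prints the installed version, or, on Python 3.8 and later, the standard library's importlib.metadata can report it. A minimal sketch (the helper name installed_version is just for illustration):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version('ventmap'))
```
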
pre-oma commented 4 years ago

AttributeError                            Traceback (most recent call last)

      6 # breaths without BS/BE markers will be dropped. If you say True, then breaths
      7 # without BS/BE will be kept
----> 8 generator = extract_raw(open('1000full.csv'), False)
      9 for breath in generator:
     10     # breath data is output in dictionary format

~\Anaconda3\lib\site-packages\ventmap\raw_utils.py in extract_raw(descriptor, ignore_missing_bes, rel_bn_interval, vent_bn_interval, spec_rel_bns, spec_vent_bns)
    200     using a specific ventilator class like PB840.extract_raw
    201     """
--> 202     pb840 = PB840File(descriptor)
    203     return pb840.extract_raw(ignore_missing_bes, rel_bn_interval, vent_bn_interval, spec_rel_bns, spec_vent_bns)
    204

~\Anaconda3\lib\site-packages\ventmap\raw_utils.py in __init__(self, descriptor)
     43         self.rel_bn = 0
     44         try:
---> 45             self.descriptor = clear_descriptor_null_bytes(self.descriptor)
     46         except UnicodeDecodeError:
     47             raise BadDescriptorError('You seem to have opened a file with garbled bytes. you should open it using io.open(file, encoding="ascii", errors="ignore"')

~\Anaconda3\lib\site-packages\ventmap\clear_null_bytes.py in clear_descriptor_null_bytes(descriptor)
     12 def clear_descriptor_null_bytes(descriptor):
     13     try:
---> 14         descriptor_text = descriptor.read().replace('\x00', '').decode('utf-8', 'ignore')
     15     except NameError: # python 3
     16         descriptor_text = str(descriptor.read()).replace('\x00', '')

AttributeError: 'str' object has no attribute 'decode'

The version of ventmap is 1.4.2.
hahnicity commented 4 years ago

Thank you for bringing this to my attention. Apparently I broke Python 3 support in version 1.4.2. That is fixed now; you can upgrade with the following command:

pip install -U ventmap
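
The traceback shows the cause: in Python 3, read() on a file opened in text mode returns a str, which has no decode() method, so an AttributeError is raised and the `except NameError` fallback never runs. A version-agnostic way to strip null bytes is to check the type that read() returned. This is a hypothetical helper for illustration, not the actual fix that went into ventmap:

```python
import io


def clear_null_bytes(descriptor):
    """Hypothetical helper: strip NUL bytes from a text- or binary-mode file object."""
    data = descriptor.read()
    # Binary-mode files return bytes; decode them, dropping undecodable bytes
    if isinstance(data, bytes):
        data = data.decode('utf-8', 'ignore')
    # Text-mode files already return str, so no decode() call is needed
    return io.StringIO(data.replace('\x00', ''))


print(repr(clear_null_bytes(io.BytesIO(b'a\x00b\xff')).read()))
```
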