physiopy / phys2bids

Python3 library to format physiological files in BIDS. At the moment, it supports Acqknowledge and Labchart. BrainHack participants, check the issues with the BrainHack labels!
https://phys2bids.readthedocs.io
Apache License 2.0

Automatic recognition of channel contents #204

Open smoia opened 4 years ago

smoia commented 4 years ago

Detailed Description

The content of a BIDS physiological file is described in its companion .json file, where an important entry contains the column headers. At the moment, phys2bids fills that entry either using the names of the channels from the original file or using the list that the user provides with the option -chnames. Look at the channels from our tutorial file (image: tutorial_file). It's quite easy for a human (with a little acquaintance with physiological recordings) to make an educated guess about the content of each channel. We could write a function that makes that "educated guess" and suggests a column header for each channel to the user.

Context / Motivation

BIDS suggests specific column headers for the files, and it would be great if we could help standardise this reporting by adding such a function.

Possible Implementation

The first thing that comes to mind is a sort of pattern recognition function, or a comparison with a database of physiological recordings. Triggers are normally spiky (or blocky), pulses have quite a recognisable pattern, and respiration has another one. Chest expansion-based respiration recordings are normally smoother than others, while O2 and CO2 can have tidal-shaped responses and are normally the inverse of each other.

@RayStick @BrightMG @CesarCaballeroGaudes and @rmarkello might have even better ideas!

eurunuela commented 4 years ago

I think this should be pretty easy to do with a supervised machine learning algorithm (an SVM maybe?). I don't think much training data would be necessary to make it work.

I do have a concern though. How do you differentiate between CO2 and O2? Is this differentiation critical or is it okay if the algorithm incorrectly names these two?

vinferrer commented 4 years ago

That's one issue. The other is where do we find a big dataset to train that SVM?

eurunuela commented 4 years ago

We could try with the data we have.

vinferrer commented 4 years ago

I don't think we have enough samples, but it could be a starting point

smoia commented 4 years ago

Training an SVM could be an option. We have a lot of data in house that could serve as training dataset for some types of physiological data - but we'll have to wait to make it public first, probably.

At the beginning, we could make a simple suggestion that separates triggers, pulses, and general respiration-based channels. In fact, the latest BIDS stable release suggests only three column headers. However, it is very important that there is no mistake in classification (CO2 and O2 are not equivalent, so we shouldn't treat them as such!), especially if we keep expanding the physiopy suite (#186), and especially since half of the contributors of phys2bids work quite a lot with CO2 recordings!

Why don't we wait to see if during the BrainWeb there is someone more experienced in machine learning or pattern recognition that could help us with their expertise?

eurunuela commented 4 years ago

That's what I was thinking, to use the in house data to train the SVM.

Regarding the CO2 and O2, that's why I was pointing out that we should find a way of correctly differentiating them. To me the signals look pretty much complementary, meaning that an SVM algorithm may fail to correctly assign headers for this data.

Checking with people at the BrainWeb hackathon sounds good to me 😉

RayStick commented 4 years ago

A few quick thoughts:

  1. I have longer recordings (compared to the ones on OSF already) that could be provided at a later date, to train the SVM, if needed.

  2. Would this training only be implemented if there is no channel name information in the header files? Most of the software people use to record these physiological data does allow you to name channels, so in this case the training would not be needed? It would be good to have this option for the cases where there is no header info/channel names, of course.

  3. In principle, I think some pattern recognition approach would work: as Stefano explains, the different signals have noticeably different properties. As for the CO2 and O2, yes, they are the inverse of one another (in terms of shape); however, there can sometimes be a very slight recording offset due to how the gas analyzers work. Also, even though their pattern is very similar, their units will not be (whether they are measured in voltage, percent or mmHg), so if that could be taken into account alongside pattern recognition, it could be a way of distinguishing them. For example, min(CO2 channel) is always going to be smaller than min(O2 channel).

smoia commented 4 years ago

I think that the training (if any) will take place offline - the projection on new data could take place on request. It's true that most software lets you name the channels, but sometimes such channels are set for the software in a multi-user lab and they don't have the right name.

drombas commented 3 years ago

Hi, hoping this issue is still of interest!

I have done a quick frequency analysis. What I did to each signal:

  1. Subtract the mean (remove DC component)
  2. Compute the Fourier transform
  3. Take the squared modulus (to get the power)
  4. Divide it by its sum (to get a normalised power distribution)
  5. Calculate the frequency at the 95th percentile of the cumulative power (plotted in red)
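
The five steps above can be sketched with NumPy roughly as follows. This is only an illustration, not necessarily the code used for the analysis: the function name `spectral_edge_frequency` and its parameters are hypothetical, and it assumes a real-valued, evenly sampled 1D signal with a known sampling rate.

```python
import numpy as np


def spectral_edge_frequency(signal, sampling_rate, percentile=95.0):
    """Return the frequency (Hz) below which `percentile` % of the power lies.

    Hypothetical helper: the name and signature are assumptions made
    for this sketch, not part of phys2bids.
    """
    signal = np.asarray(signal, dtype=float)

    # 1. Subtract the mean to remove the DC component.
    centered = signal - signal.mean()

    # 2. Fourier transform (one-sided, because the signal is real).
    spectrum = np.fft.rfft(centered)
    freqs = np.fft.rfftfreq(centered.size, d=1.0 / sampling_rate)

    # 3. The squared modulus gives the power at each frequency.
    power = np.abs(spectrum) ** 2

    total = power.sum()
    if total == 0:
        # A constant channel has no power left after removing the mean.
        return 0.0

    # 4. Divide by the sum so that the power adds up to 1.
    density = power / total

    # 5. First frequency at which the cumulative power reaches the percentile.
    cumulative = np.cumsum(density)
    index = np.searchsorted(cumulative, percentile / 100.0)
    return float(freqs[min(index, freqs.size - 1)])
```

For example, `spectral_edge_frequency(channel, 1000.0)` would give the 95% frequency of a channel recorded at 1000 Hz (the sampling rate here is just an example value).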

Some ideas about the results:

A few questions that come to my mind:

[Figure: frequencyPlot]

eurunuela commented 3 years ago

Thank you @drombas! Those are great ideas we can build on.

We already know where the power of the cardiac and respiratory spectra should fall. Those would be the easiest to find by just looking at the spectra. Also, between O2 and CO2, the former has a higher amplitude.

So, I would calculate the PSD of all the channels, then find the one with cardiac frequency (there goes one channel) and another channel (or two) with the respiratory frequency. Between the respiratory ones, we set the one with the highest amplitude to O2, and the other one to CO2.
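
A minimal sketch of that procedure, under some loud assumptions: the band limits below are illustrative values I picked, not numbers from this thread, and the function names and the label strings are hypothetical. It takes the peak of the power spectrum of each channel, assigns cardiac or respiratory by band, and, when exactly two channels fall in the respiratory band, labels the one with the larger amplitude O2 and the other CO2.

```python
import numpy as np

# Assumed frequency bands in Hz (illustrative, to be tuned on real data).
RESPIRATORY_BAND = (0.1, 0.5)
CARDIAC_BAND = (0.8, 3.0)


def peak_frequency(signal, sampling_rate):
    """Frequency (Hz) with the highest power after removing the mean."""
    signal = np.asarray(signal, dtype=float)
    centered = signal - signal.mean()
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(centered.size, d=1.0 / sampling_rate)
    return float(freqs[np.argmax(power)])


def amplitude(signal):
    """Peak-to-peak amplitude of a channel."""
    signal = np.asarray(signal, dtype=float)
    return float(signal.max() - signal.min())


def guess_channel_labels(channels, sampling_rate):
    """Map each channel name to a guessed label.

    `channels` is a dict of name -> 1D array (hypothetical input format).
    """
    labels = {}
    respiratory = []
    for name, data in channels.items():
        peak = peak_frequency(data, sampling_rate)
        if CARDIAC_BAND[0] <= peak <= CARDIAC_BAND[1]:
            labels[name] = "cardiac"
        elif RESPIRATORY_BAND[0] <= peak <= RESPIRATORY_BAND[1]:
            labels[name] = "respiratory"
            respiratory.append(name)
        else:
            labels[name] = "unknown"

    # With exactly two respiratory-band channels, apply the amplitude rule:
    # the larger one is labelled O2, the smaller one CO2.
    if len(respiratory) == 2:
        smaller, larger = sorted(respiratory, key=lambda n: amplitude(channels[n]))
        labels[smaller] = "CO2"
        labels[larger] = "O2"
    return labels
```

Note that the amplitude rule only makes sense if both gas channels are in comparable units, which is not guaranteed in general.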

Finally, if the trigger only takes 2 different values, i.e. it's binary, the mean should be much closer to the baseline than to the maximum value. Also, if it's binary and we split the samples into those below the average (baseline) and those above it (maximum), we could check that the sum of all the baseline values is exactly the minimum value times the number of points below the average (and likewise for the maximum value and the points above the average).
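
Something along these lines could implement the binary check. It is a sketch only: the function name `looks_like_binary_trigger` and the `tolerance` parameter are hypothetical, and a real recording with noise on the trigger line would need a much looser tolerance than the default used here.

```python
import numpy as np


def looks_like_binary_trigger(signal, tolerance=1e-6):
    """Heuristic check that a channel takes only two values, mostly the lower one.

    Hypothetical helper; `tolerance` is relative to the channel's range.
    """
    signal = np.asarray(signal, dtype=float)
    low, high = signal.min(), signal.max()
    if high == low:
        # A flat channel carries no trigger information.
        return False

    mean = signal.mean()
    span = high - low

    # Split the samples around the mean: baseline below, pulses above.
    below = signal[signal < mean]
    above = signal[signal >= mean]

    # If the channel is binary, every baseline sample equals the minimum,
    # so their sum equals minimum * count (and likewise for the maximum).
    low_is_flat = abs(below.sum() - low * below.size) <= tolerance * span * below.size
    high_is_flat = abs(above.sum() - high * above.size) <= tolerance * span * above.size

    # A trigger spends most of its time at baseline,
    # so the mean sits closer to the minimum than to the maximum.
    mostly_at_baseline = (mean - low) < (high - mean)

    return bool(low_is_flat and high_is_flat and mostly_at_baseline)
```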

These may not be as fancy as doing an SVM but they could work and should be fairly simple to implement.

drombas commented 3 years ago

I can't find the option to self-assign this issue but just to let you know I'm actively working on it.

Thanks @eurunuela for the suggestions! As you said I will probably go for a time-domain detection of the trigger based on its binary nature and a spectral-domain classification between cardiac and respiratory signals.