neurogeriatricskiel / KielMAT

Python based toolbox for processing motion data
https://neurogeriatricskiel.github.io/KielMAT/

Add importers for dataset #10

Closed JuliusWelzel closed 1 year ago

JuliusWelzel commented 1 year ago

Hello @rmndrs89,

Could you pull the new dataclass structure and update your dataset function (maybe first for keepcontrol) here?

The data should then be imported into these new dataclasses. Next, I would like to see whether the events @masoudabedinifar is detecting can be incorporated into EventData using his translated algorithms.

rmndrs89 commented 1 year ago

Hi @JuliusWelzel,

could you put the dataclass in ngmt/utils/data_utils.py or something similar?

Best Robbin

JuliusWelzel commented 1 year ago

I updated the package structure in bf812bacab52780ac8f23c17200fbb9953429a50 with an example for HasoMed IMU data in dd190901b188f38cbff6925eceecfecae42442de, which can be found here.

rmndrs89 commented 1 year ago

Okay, I have pulled the main branch, and I am working on the Keep Control data.

But before I continue, I have a couple of questions:

  1. In the FileInfo dataclass, what is the FilePath if for example the metadata and the sensor data are split in separate files? Does the FilePath always point to the sensor data file?
  2. In the ChannelData dataclass, I have asked already, but I think it would be good to document the difference between tracked_point and placement. How do I interpret placement?
  3. In the ChannelData dataclass, can we extend the VALID_CHANNEL_TYPES with, for example, ACC and ANGVEL, or are the channel types that @JuliusWelzel has chosen based on some kind of requirement of BIDS?
  4. In the ChannelData dataclass, should we add a data attribute for the sensor ranges? For example, if the accelerometer can measure +/- 8 g, these low and high bounds could be used to normalize the acceleration data.
  5. In the RecordingData dataclass, the data attribute is supposed to be an array of shape (n_channels, n_samples), but should it not be (n_samples, n_channels) (i.e., each column contains data from a single sensor channel, each row contains data from a single time step)? The dataclass also has a data attribute sampling_frequency, but is this not already documented in the ChannelData (where it is specified for each individual channel)?

Maybe we can discuss them next meeting?

JuliusWelzel commented 1 year ago
  • In the FileInfo dataclass, what is the FilePath if for example the metadata and the sensor data are split in separate files? Does the FilePath always point to the sensor data file?

In my understanding, this would refer to the RawData file. If the metadata is in a separate place, then the user has to specify this individually anyway.

  • In the ChannelData dataclass, I have asked already, but I think it would be good to document the difference between tracked_point and placement. How do I interpret placement?

tracked_point is the abbreviation or naming of a location, placement refers to a more detailed description.

  • In the ChannelData dataclass, can we extend the VALID_CHANNEL_TYPES with, for example, ACC and ANGVEL, or are the channel types that @JuliusWelzel has chosen based on some kind of requirement of BIDS?

We can extend this; for the moment they are a 1:1 mapping from our BIDS BEP.

  • In the ChannelData dataclass, should we add a data attribute for the sensor ranges? For example, if the accelerometer can measure +/- 8 g, these low and high bounds could be used to normalize the acceleration data.

True, added in 0f64432487b5344a8874440e3041b2f4e194c729. Please check if feasible.
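As a rough illustration of how such range bounds could be used for normalization, here is a minimal sketch. The helper `normalize_by_range` and its `low`/`high` parameters are hypothetical names for this example only; they are not taken from the commit above, and the attribute names in ChannelData may differ.

```python
import numpy as np


def normalize_by_range(data, low, high):
    """Linearly map values from [low, high] to [-1, 1].

    Hypothetical helper for illustration; not part of the toolbox.
    """
    data = np.asarray(data, dtype=float)
    if high <= low:
        raise ValueError("`high` must be greater than `low`")
    return 2.0 * (data - low) / (high - low) - 1.0


# Example: an accelerometer with an assumed range of +/- 8 g
acc_g = np.array([-8.0, 0.0, 4.0, 8.0])
normalized = normalize_by_range(acc_g, low=-8.0, high=8.0)
```

With a symmetric range this is the same as dividing by the upper bound; the explicit `low`/`high` form also covers sensors with an asymmetric range.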

  • In the RecordingData dataclass, the data attribute is supposed to be an array of shape (n_channels, n_samples), but should it not be (n_samples, n_channels) (i.e., each column contains data from a single sensor channel, each row contains data from a single time step)? The dataclass also has a data attribute sampling_frequency, but is this not already documented in the ChannelData (where it is specified for each individual channel)?

I agree to move to a samples x channels layout. Let's finalise this in the next meeting!
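To make the two layouts concrete, here is a small NumPy sketch. The channel names and array sizes are made up for the example, and nothing here reflects the final decision on the toolbox's internal layout.

```python
import numpy as np

n_channels, n_samples = 3, 5
channel_names = ["acc_x", "acc_y", "acc_z"]  # illustrative names only

# Channels-first layout: one row per channel, shape (n_channels, n_samples)
data_channels_first = np.arange(n_channels * n_samples).reshape(n_channels, n_samples)

# Samples-first layout: one row per time step, shape (n_samples, n_channels)
data_samples_first = data_channels_first.T

# In the samples-first layout, a single channel is a column
acc_y = data_samples_first[:, channel_names.index("acc_y")]
```

The samples-first layout also matches how tabular tools such as a pandas DataFrame usually hold time series: one row per time step and one column per channel.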

rmndrs89 commented 1 year ago

Hi @JuliusWelzel ,

I have continued updating the _load_file() function for the Mobilise-D dataset (see: here) so that @masoudabedinifar can work with that. There is, however, an error thrown by the following check:

    def __post_init__(self):
        if len(self.times) != self.time_series.shape[1]:
            raise ValueError(
                "The length of `times` should match the number of columns in `time_series`"
            )

        if len(self.channel_names) != self.time_series.shape[0]:
            raise ValueError(
                "The number of `channel_names` should match the number of rows in `time_series`"
            )

What is the self.time_series supposed to be? I cannot find where it is defined.

Besides that, I think it works, although there are still some ambiguities that can be sorted out iteratively :) In due course, I will open some new issues for that.

Thanks!

JuliusWelzel commented 1 year ago

@rmndrs89 this should be data instead of time_series, as described here.

It also was in the wrong class. Should be fixed in https://github.com/neurogeriatricskiel/NGMT/commit/c2455e1e3efdfc1a70e5ccd1e8891a830fb22234.
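For reference, a minimal sketch of what such a shape check could look like once it refers to data. It assumes the samples x channels layout discussed above; the class name `RecordingSketch` and its fields are hypothetical and do not reproduce the code in the commit.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RecordingSketch:
    """Hypothetical stand-in for a recording dataclass (illustration only)."""

    data: np.ndarray      # assumed shape: (n_samples, n_channels)
    times: np.ndarray     # one timestamp per sample
    channel_names: list   # one name per channel

    def __post_init__(self):
        if self.data.ndim != 2:
            raise ValueError("`data` should be a 2D array (n_samples, n_channels)")
        if len(self.times) != self.data.shape[0]:
            raise ValueError(
                "The length of `times` should match the number of rows in `data`"
            )
        if len(self.channel_names) != self.data.shape[1]:
            raise ValueError(
                "The number of `channel_names` should match the number of columns in `data`"
            )


# Example: 100 samples at an assumed 100 Hz, three channels
rec = RecordingSketch(
    data=np.zeros((100, 3)),
    times=np.arange(100) / 100.0,
    channel_names=["acc_x", "acc_y", "acc_z"],
)
```

If the channels x samples layout were kept instead, the two shape indices in the checks would simply swap.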

rmndrs89 commented 1 year ago

Completed the importer for the keepcontrol dataset, which loads data into a MotionData object; see 28c6d94c6dc4203ba8a15795dff465994a26d31b