neurogeriatricskiel / KielMAT

Python-based toolbox for processing motion data
https://neurogeriatricskiel.github.io/KielMAT/

Proposed package structure #4

Closed. rmndrs89 closed this issue 1 year ago.

rmndrs89 commented 1 year ago

Package structure

Do we need to agree on a package structure? Here is a proposal:

ngmt/
├── datasets/
│   ├── __init__.py
│   ├── braviva.py
│   ├── keepcontrol.py
│   └── mobilised.py
├── utils/
│   ├── __init__.py
│   ├── data_utils.py  # dataclasses are introduced
│   └── preprocessing.py  # resampling, filtering functions, ...
├── modules/
│   ├── icd/  # initial contact detection
│   │   └── initial_contact_detection.py
│   ├── gsd/  # gait sequence detection
│   │   └── gait_sequence_detection.py
│   └── sle/  # step length estimation
│       └── step_length_estimation.py
..
└── index.html
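For orientation, consuming this layout might look like the following (import paths follow the tree above and are not final API):

```python
# Hypothetical imports against the proposed layout; nothing here is final.
from ngmt.datasets import keepcontrol
from ngmt.utils import preprocessing
from ngmt.modules.gsd import gait_sequence_detection
from ngmt.modules.icd import initial_contact_detection
```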

Please comment with any thoughts.

JuliusWelzel commented 1 year ago

Hello, instead of data_utils.py I would like to propose classes per type of recording:

> ├── utils/
> │   ├── __init__.py
> │   ├── data_io.py  # recording-specific dataclasses are introduced, which are modality-agnostic
> │   ├── imu.py  # IMU-specific dataclasses are introduced
> │   ├── optical.py  # optical motion capture-specific dataclasses are introduced
> │   └── preprocessing.py  # resampling, filtering functions, ...

This way we keep generic and device information separated, which will be useful for large queries on big datasets. At the moment, the information in the dataclass IMUDataset seems to be specified for IMU data only. With this split, device specifications can be put in e.g. imu.py, and dataset- or id-specific info can live in data_io.py. What are your thoughts? I will try to create a hierarchical diagram of this.
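For illustration, a minimal sketch of that split (all names are placeholders, not final API):

```python
from dataclasses import dataclass

# Sketch only: generic info separated from device-specific info.

@dataclass
class RecordingInfo:
    """Generic, modality-agnostic info (would live in data_io.py)."""
    subject_id: str
    task_name: str
    file_path: str

@dataclass
class IMUDevice:
    """IMU-specific info (would live in imu.py)."""
    manufacturer: str
    sampling_frequency: float
    channel_names: list[str]
```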

rmndrs89 commented 1 year ago

Hi @JuliusWelzel,

That makes sense, thanks for your thoughts! It may already work if we simply rename the dataclasses, no? A recording simply has units, fs, type, and data (which is roughly the direction your "metadata" structure is going). Multiple recordings can then be combined into a device, and multiple devices make up a "dataset".

For optical motion capture data, you would have a recording for each marker, and you could combine multiple markers into a "cluster of markers" or combine all markers into a "dataset".
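For concreteness, a minimal sketch of that composition (all names are placeholders):

```python
from dataclasses import dataclass, field
import numpy as np

# Sketch only: a recording holds units, fs, type and data; recordings
# compose into a device, devices into a dataset.

@dataclass
class Recording:
    units: str          # e.g. "m/s^2" or "mm"
    fs: float           # sampling frequency in Hz
    type: str           # e.g. accelerometer channel or marker position
    data: np.ndarray    # samples x components

@dataclass
class Device:
    recordings: list[Recording] = field(default_factory=list)

@dataclass
class Dataset:
    devices: list[Device] = field(default_factory=list)
```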

What do you think? Looking forward to your diagram!

JuliusWelzel commented 1 year ago

Hello, so this is my proposal:

```mermaid
classDiagram
    class MotionData {
        info: FileInfo
        channels: ChannelMetaData
        times: np.ndarray
        time_series: np.ndarray
        check_channel_info()
        get_initial_contacts()
    }

    class FileInfo {
        SubjectId: str
        TaskName: str
        SamplingFrequency: float
        FilePath: str
        import_data()
    }

    class ChannelMetaData {
        name: list[str]
        component: list[str]
        ch_type: list[str]
        tracked_point: list[str]
        units: list[str]
        get_channel_units(): str
    }

    class DatasetInfo {
        SubjectIds: list[str]
        TaskNames: list[str]
        group_data()
    }

    MotionData <-- FileInfo: identity on disk
    MotionData <-- ChannelMetaData: info per channel in Python
    DatasetInfo <-- MotionData: info per dataset
    FileInfo --> ChannelMetaData: info per channel on disk
```

I could go ahead and implement this in the data classes. I think it is nice to have a distinction between device-specific metadata and channel-specific metadata. For OMC, some predefined clusters of markers go into MotionData, and each channel's information has to be specified in ChannelMetaData.
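As an illustration, filling in ChannelMetaData for a two-marker OMC cluster could look like this (field types assumed to be lists of strings; all values made up):

```python
from dataclasses import dataclass

# Dataclass version of ChannelMetaData from the diagram above (sketch).

@dataclass
class ChannelMetaData:
    name: list[str]
    component: list[str]
    ch_type: list[str]
    tracked_point: list[str]
    units: list[str]

# One entry per channel (marker x component); values are illustrative.
channels = ChannelMetaData(
    name=["l_heel_x", "l_heel_y", "l_heel_z", "l_toe_x", "l_toe_y", "l_toe_z"],
    component=["x", "y", "z"] * 2,
    ch_type=["POS"] * 6,
    tracked_point=["l_heel"] * 3 + ["l_toe"] * 3,
    units=["mm"] * 6,
)
```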

JuliusWelzel commented 1 year ago

Hello, here is an updated proposal after today's discussion:

```mermaid
classDiagram

    class MotionData {
        channels: ChannelData
        data: list[RecordingData]
        times: np.ndarray
        info: FileInfo
        Manufacturer: Optional[list]
        check_channel_info()
    }

    class FileInfo {
        SubjectId: str
        TaskName: str
        ProjectName: str
        FilePath: Optional[str]
        import_data()
    }

    class ChannelData {
        name: list[str]
        component: list[str]
        ch_type: list[str]
        tracked_point: list[str]
        units: list[str]
        get_channel_units()
    }

    class RecordingData {
        type: str
        units: ChannelData
        sampling_rate: float
        times: np.ndarray
        data: np.ndarray
        events: Optional[list]
        get_duration(): datetime
        get_initial_contacts()
    }

    RecordingData --> MotionData: raw data with same sampling rate
    ChannelData --> MotionData: info per channel and recording
    FileInfo --> MotionData: identity on disk
    FileInfo --> ChannelData: info per channel 
    FileInfo --> RecordingData: raw time series data
```

I would discard device data, as information about a device, like the manufacturer, is not required to interpret the data. However, recording information like fs or channel type is. What do you say?

JuliusWelzel commented 1 year ago

@rmndrs89, @masoudabedinifar, @hansencl still waiting for feedback here :)

masoudabedinifar commented 1 year ago

Thank you @JuliusWelzel, it looks good and matches what we discussed in the last meeting.

JuliusWelzel commented 1 year ago

We discussed whether BIDS-like events should be included as their own dataclass.

JuliusWelzel commented 1 year ago

Here is an updated version of the proposed structure:

```mermaid
classDiagram

    class FileInfo {
        SubjectId: str
        TaskName: str
        ProjectName: str
        FilePath: Optional[str]
        import_data()
    }

    class ChannelData {
        name: list[str]
        component: list[str]
        ch_type: list[str]
        tracked_point: list[str]
        units: list[str]
        get_channel_units()
    }

    class EventData {
        onset: float
        duration: float
        sample: int
        trial_type: Optional[str]
        value: Optional[Union[int, float, str]]
    }

    class RecordingData {
        type: str
        channels: ChannelData
        sampling_rate: float
        times: np.ndarray
        data: np.ndarray
        events: Optional[list]
        get_initial_contacts()
    }

    class MotionData {
        data: list[RecordingData]
        world_time: np.ndarray
        info: list[FileInfo]
        Manufacturer: Optional[list]
        check_channel_info()
    }

    RecordingData --> MotionData: raw data with same sampling rate
    ChannelData --> RecordingData: info per channel
    EventData --> RecordingData: info about potential events
    FileInfo --> MotionData: identity on disk
    FileInfo --> ChannelData: info per channel 
    FileInfo --> RecordingData: raw time series data
```

This is the planned class structure for motion data. Data from any file format can ultimately be imported into the MotionData class. The MotionData object contains a list of FileInfo objects. Each FileInfo object contains information about the file, such as the subject ID, the task name, the project name, and the file path. The MotionData class also contains a list of RecordingData objects.

Each RecordingData object contains the raw data, the sampling rate, the time stamps, and the channel info (ChannelData) of a tracking system. It is up to the user how to group the source data into a tracking system. The RecordingData object can also contain information about events. The EventData object stores information about events, such as onset or duration.

The ChannelData object is used to store the channel name, the channel type, the channel units and the tracked point.

The world_time vector in the MotionData class refers to a global time, which can be used to synchronise data from multiple tracking systems stored in RecordingData objects. Any algorithm which runs on a RecordingData object, such as get_initial_contacts(), can add events with onsets to that RecordingData object. Events from multiple tracking systems can then be related via the world_time.
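Translated into Python, the planned structure could look roughly like this (a sketch: field names follow the diagram above, all type choices are assumptions):

```python
from dataclasses import dataclass
from typing import Optional, Union
import numpy as np

@dataclass
class FileInfo:
    SubjectId: str
    TaskName: str
    ProjectName: str
    FilePath: Optional[str] = None

@dataclass
class ChannelData:
    name: list[str]
    component: list[str]
    ch_type: list[str]
    tracked_point: list[str]
    units: list[str]

@dataclass
class EventData:  # BIDS-like events
    onset: float
    duration: float
    sample: int
    trial_type: Optional[str] = None
    value: Optional[Union[int, float, str]] = None

@dataclass
class RecordingData:  # one tracking system, one sampling rate
    type: str
    channels: ChannelData
    sampling_rate: float
    times: np.ndarray
    data: np.ndarray                       # samples x channels
    events: Optional[list[EventData]] = None

@dataclass
class MotionData:
    data: list[RecordingData]
    world_time: np.ndarray                 # global time for synchronisation
    info: list[FileInfo]
    manufacturer: Optional[list[str]] = None
```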

JuliusWelzel commented 1 year ago

Should we add to the text that the algorithms only run on dedicated channel types, defined in the ChannelData class per tracking system?
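In code, that restriction could look like this (a sketch against the dataclasses above; select_channels and the "ACCEL" channel type are made up for illustration):

```python
import numpy as np

# Pick only the columns of a RecordingData whose channel type matches;
# an algorithm would call this before running on a tracking system.
def select_channels(rec: "RecordingData", ch_type: str) -> np.ndarray:
    idx = [i for i, t in enumerate(rec.channels.ch_type) if t == ch_type]
    return rec.data[:, idx]

# e.g. acc = select_channels(recording, "ACCEL") before get_initial_contacts()
```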

JuliusWelzel commented 1 year ago

Completed in 5fa9afe9c054be20c01de2e32868c375a3296111.