Closed rmndrs89 closed 1 year ago
Hello, instead of data_utls.py I would like to propose classes per type of recording:
> ├── utils/
> │ ├── __init__.py
> │ ├── data_io.py # recording-specific dataclasses are introduced, which are modality-agnostic
> │ ├── imu.py # IMU specific dataclasses are introduced
> │ ├── optical.py # optical motion capture specific dataclasses are introduced
> │ └── preprocessing.py # resampling, filtering functions, ...
This way we keep generic and device information separated, which will be useful for large queries on big datasets. At the moment, the information in the dataclass IMUDataset seems to be specific to IMU data only. With this split, device specifications can be put in e.g. imu.py, and dataset- or ID-specific information can live in data_io.py. What are your thoughts? I will try to create a hierarchical diagram of this.
Hi @JuliusWelzel,
That makes sense, thanks for your thoughts! It may already work if we simply rename the dataclasses, no? The recording simply has units, fs, type, data (which is kind of in the direction that your "metadata" structure is going). Then multiple recordings can be combined into a device, and multiple devices make up a "dataset".
For optical motion capture data, you would have a recording for each marker, and you may combine multiple markers into a "cluster of markers" or combine all markers in a "dataset".
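To make the recording, device, dataset idea concrete, here is a rough sketch. The class names `Recording`, `Device` and `Dataset`, the helper `recordings_of_type()` and all example values are placeholders I made up for illustration, not existing code in the package, and the data is kept as plain lists to stay dependency-free.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Recording:
    """One signal from one sensor (or one marker) with its own metadata."""

    type: str   # e.g. "acc", "gyro", "marker_position" (illustrative values)
    units: str  # e.g. "g", "deg/s", "mm"
    fs: float   # sampling frequency in Hz
    data: list[list[float]] = field(default_factory=list)  # samples x components


@dataclass
class Device:
    """Several recordings from one physical unit, or a cluster of markers."""

    name: str
    recordings: list[Recording] = field(default_factory=list)


@dataclass
class Dataset:
    """All devices (or all markers) that belong to one measurement."""

    devices: list[Device] = field(default_factory=list)

    def recordings_of_type(self, rec_type: str) -> list[Recording]:
        # Flatten the hierarchy and keep only the requested recording type.
        return [r for d in self.devices for r in d.recordings if r.type == rec_type]


# An IMU with two recordings and an optical marker cluster with one.
imu = Device("left_foot", [Recording("acc", "g", 100.0), Recording("gyro", "deg/s", 100.0)])
cluster = Device("pelvis_cluster", [Recording("marker_position", "mm", 200.0)])
dataset = Dataset([imu, cluster])
```

With this layout a query such as `dataset.recordings_of_type("acc")` works the same way for IMU and optical data, which is the modality-agnostic behaviour discussed above.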
What do you think? Looking forward to your diagram!
Hello, so this is my proposal:
classDiagram
class MotionData {
info: FileInfo
channels: ChannelMetaData
times: np.ndarray
time_series: np.ndarray
check_channel_info()
get_inital_contacts()
}
class FileInfo {
SubjectId: str
TaskName: str
SamplingFrequency: float
FilePath: str
import_data()
}
class ChannelMetaData {
name: list[int]
component: list[str]
ch_type: list[str]
tracked_point: list[int]
units: list[int]
get_channel_units(): str
}
class DatasetInfo {
SubjectIds: list[str]
TaskNames: list[str]
group_data()
}
MotionData <-- FileInfo: indent on disk
MotionData <-- ChannelMetaData: info per channel in python
DatasetInfo <-- MotionData: info per dataset
FileInfo --> ChannelMetaData: info per channel on disk
I could go ahead and implement this in the data classes. I think it is nice to have a distinction between device-specific metadata and channel-specific metadata. For OMC, some predefined clusters of markers go into MotionData and you have to specify the information for each channel in ChannelMetaData.
Hello, here is an updated proposal after today's discussion:
classDiagram
class MotionData {
channels: ChannelData
data: list[RecordingData]
times: np.1darray
info: FileInfo
Manufacturer: Optional[list]
check_channel_info()
}
class FileInfo {
SubjectId: str
TaskName: str
ProjectName: str
FilePath: Optional[str]
import_data()
}
class ChannelData {
name: list[int]
component: list[str]
ch_type: list[str]
tracked_point: list[int]
units: list[int]
get_channel_units()
}
class RecordingData {
type: str
units: ChannelData
sampling_rate: float
times: np.ndarray
data: np.ndarray
events: Optional[list]
get_duration(): datetime
get_inital_contacts()
}
RecordingData --> MotionData: raw data with same sampling rate
ChannelData --> MotionData: info per channel and recording
FileInfo --> MotionData: indent on disk
FileInfo --> ChannelData: info per channel
FileInfo --> RecordingData: raw time series data
I would discard device data, as information about a device, like the manufacturer, is not required to interpret any data. However, recording information like fs or channel type is. What do you say?
@rmndrs89, @masoudabedinifar, @hansencl still waiting for feedback here :)
Thank you @JuliusWelzel, it seems good and matches what we discussed in the last meeting.
We discussed whether BIDS-like events should be included as their own dataclass.
Here is an updated version of the proposed structure:
classDiagram
class FileInfo {
SubjectId: str
TaskName: str
ProjectName: str
FilePath: Optional[str]
import_data()
}
class ChannelData {
name: list[int]
component: list[str]
ch_type: list[str]
tracked_point: list[int]
units: list[int]
get_channel_units()
}
class EventData {
onset: float
duration: float
sample: integer
trial_type: Optional[string]
value: Optional[number or string]
}
class RecordingData {
type: str
units: ChannelData
sampling_rate: float
times: np.1darray
data: np.ndarray
events: Optional[list]
get_inital_contacts()
}
class MotionData {
data: list[RecordingData]
world_time: np.1darray
info: list[FileInfo]
Manufacturer: Optional[list]
check_channel_info()
}
RecordingData --> MotionData: raw data with same sampling rate
ChannelData --> RecordingData: info per channel
EventData --> RecordingData: info about potential events
FileInfo --> MotionData: indent on disk
FileInfo --> ChannelData: info per channel
FileInfo --> RecordingData: raw time series data
This is the planned class structure for motion data. Data from any file format can ultimately be imported into the MotionData class. The MotionData object contains a FileInfo object. The FileInfo object contains information about the file, such as the subject ID, the task name, the project name and the file path. The MotionData class also contains a list of RecordingData objects.
Each RecordingData object contains the raw data, the sampling rate, the time stamps and the channel info (ChannelData) of a tracking system. It is up to the user how to group the source data into a tracking system.
The RecordingData object can also contain information about events. The EventData object stores information about events such as onset or duration.
The ChannelData object is used to store the channel name, the channel type, the channel units and the tracked point.
The world_time vector in the MotionData class refers to a global time, which can be used to synchronise data from multiple tracking systems stored in RecordingData. Any algorithm that is run on a RecordingData object, such as get_inital_contacts(), can add events with onsets to that RecordingData object. Events from multiple tracking systems can then be related via the world_time.
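For reference, this is roughly how the structure above could look as Python dataclasses. It is only a sketch under several assumptions of mine: plain lists stand in for the NumPy arrays, channel names, tracked points and units are typed as strings although the diagram lists them as list[int], `events` defaults to an empty list instead of None, onsets are taken to be in seconds from the start of the recording, and `add_event()` is a hypothetical helper that is not part of the diagram.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class FileInfo:
    SubjectId: str
    TaskName: str
    ProjectName: str
    FilePath: Optional[str] = None


@dataclass
class ChannelData:
    name: list[str]           # assumption: strings rather than ints
    component: list[str]
    ch_type: list[str]
    tracked_point: list[str]  # assumption: strings rather than ints
    units: list[str]          # assumption: strings rather than ints


@dataclass
class EventData:
    onset: float     # assumed: seconds from the start of the recording
    duration: float
    sample: int
    trial_type: Optional[str] = None
    value: Optional[Union[float, str]] = None


@dataclass
class RecordingData:
    type: str
    units: ChannelData
    sampling_rate: float
    times: list[float]
    data: list[list[float]]
    events: list[EventData] = field(default_factory=list)

    def add_event(self, onset: float, trial_type: str, duration: float = 0.0) -> None:
        # Hypothetical helper: derive the sample index from onset and sampling rate.
        sample = round(onset * self.sampling_rate)
        self.events.append(EventData(onset, duration, sample, trial_type))


@dataclass
class MotionData:
    data: list[RecordingData]
    info: list[FileInfo]
    world_time: list[float] = field(default_factory=list)
    Manufacturer: Optional[list] = None


# Minimal usage: one IMU recording with a single detected event.
channels = ChannelData(["acc_x"], ["x"], ["ACCEL"], ["left_foot"], ["g"])
rec = RecordingData("imu", channels, 100.0, [0.0, 0.01, 0.02], [[0.1], [0.2], [0.3]])
rec.add_event(onset=0.02, trial_type="initial_contact")
motion = MotionData([rec], [FileInfo("sub-01", "walk", "demo")])
```

An algorithm would then only need a RecordingData object as input and could write its results back through something like `add_event()`, leaving MotionData to tie the tracking systems together.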
Should we add in the text that the algorithms only run on dedicated channel types defined in the ChannelData class per tracking system?
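One possible way to enforce that would be a small guard that each algorithm calls before touching the data. The function name `select_channels()` and the channel type labels below are assumptions for illustration only; nothing like this exists in the proposal yet.

```python
from __future__ import annotations


def select_channels(ch_names: list[str], ch_types: list[str], required_type: str) -> list[int]:
    """Return the indices of all channels of the required type, or raise if none exist."""
    idx = [i for i, ch_type in enumerate(ch_types) if ch_type == required_type]
    if not idx:
        raise ValueError(f"no channel of type {required_type!r} among {ch_names}")
    return idx


# Illustrative channel lists, as they might be stored in ChannelData.
names = ["acc_x", "gyr_x", "acc_y"]
types = ["ACCEL", "GYRO", "ACCEL"]
accel_idx = select_channels(names, types, "ACCEL")
```

An algorithm that needs accelerometer data would then fail early with a clear message when a tracking system has no matching channel, instead of silently running on the wrong signals.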
Completed as in 5fa9afe9c054be20c01de2e32868c375a3296111
Package structure
Do we need to agree on some package structure:
Please comment with any thoughts.