robotology / human-dynamics-estimation

Software repository for estimating human dynamics
BSD 3-Clause "New" or "Revised" License
83 stars 28 forks source link

Investigate behaviour of `.mat` logger saving #406

Open claudia-lat opened 1 year ago

claudia-lat commented 1 year ago

Recently I found a strange behaviour in the .mat logger. If the acquisition is longer than a fixed interval (probably hardcoded in the code with an autosaving function), the logger split into several .mat files the acquisitions. Actually the problem doesn't lie on this aspect. The problem lies on the fact that all the fields inside the .mat struct are consistent only in the case in which the acquisition consists of one file, i.e, all the elements are the same number of samples and the same timestamp. When the acquisition is split into two or more .mat, elements inside each struct are no longer consistent, i.e., they have different number of samples and timestamps do no coincide.
A patch solution for me was to concatenate all the .mat files and I've verified that, after concatenation, both number of samples and timestamps coincide.

lrapetti commented 1 year ago

If the acquisition is longer than a fixed interval (probably hardcoded in the code with an autosaving function), the logger split into several .mat files the acquisitions.

Yes, this behaviour is defined by setting the save_periodically flag to true, which then sets the configuration of robometry (see the wearable code parsing the configuration here)

The problem lies on the fact that all the fields inside the .mat struct are consistent only in the case in which the acquisition consists of one file, i.e, all the elements are the same number of samples and the same timestamp. When the acquisition is split into two or more .mat, elements inside each struct are no longer consistent, i.e., they have different number of samples and timestamps do no coincide.

Concerning this behavior, before investigating if its something specific of the logger device implemented in wearables, I would try to understand if it is an expected behaviour from robometry. Maybe @Nicogene might help on this?

S-Dafarra commented 1 year ago

When the acquisition is split into two or more .mat, elements inside each struct are no longer consistent, i.e., they have different number of samples and timestamps do no coincide. A patch solution for me was to concatenate all the .mat files and I've verified that, after concatenation, both number of samples and timestamps coincide.

Maybe what is happening is that the autosave happens while the data is being added to the buffer in https://github.com/robotology/wearables/blob/f80dd5078d467535d01563dd9a41e2955e6b43b2/wrappers/IWearLogger/src/IWearLogger.cpp#L262-L792

If the bufferManager decides to save the file between two push_back, then some signals get saved in one file, some other in the next. Then, when merging the file together, all the signal end up having the same amount of data. I guess this is more of an issue of robometry, since it does not enforce all the signals (aka "channels") to have an equal number of elements when auto-saving the file

lrapetti commented 1 year ago

Thanks @S-Dafarra, this is definitively something that can happen if there is nothing that force the channels to have the same amount of data (and for a general usage of robometry, you can expect to save data with different frequency so there would be no reason for forcing this check).

Following this assumption, however, I would expect the difference to be a single data sample. However, in @claudia-lat dataset the difference seems to be of multiple samples. Am I correct @claudia-lat?

claudia-lat commented 1 year ago

the difference seems to be of multiple samples. Am I correct @claudia-lat?

Yes @lrapetti. I would go for a practical example. In an acquisition in which save_periodically flag split the whole .mat into 8 .mat files, for one element in the file, e.g., iFeelSuit_mag_Node_3, number of samples are: 10913, 10970, 10969,, 10949, 10969, 10958, 10971, 4793. Another element in the list, e.g., iFeelSuit_fbAcc_Node_9 will have different numbers of samples BUT the total sum will always be the equal.

S-Dafarra commented 1 year ago

iFeelSuit_fbAcc_Node_9 will have different numbers of samples

How different?

Nicogene commented 1 year ago

Having multiple .mat files is not an ideal solution, in the past, we started investigating the possibility of implementing a collector/daemon, but we didn't do much in that sense:

It is a while that I don't touch the robometry code but if I remember correctly there are no mutex protections in the save function and push_back and since the saveToFile function for large amount of data is quite expensive, it is possible of different number of sample per channels

claudia-lat commented 1 year ago

iFeelSuit_fbAcc_Node_9 will have different numbers of samples

How different?

@S-Dafarra very few samples (around maximum 10)

S-Dafarra commented 1 year ago

Can you specify the exact number of data points for a couple of channels in two consecutive files?

claudia-lat commented 1 year ago

Ok @S-Dafarra . For the sake of simplicity I'm now considering an acquisition example with 2 .mat consecutive files:

element first file second file
iFeelSuit_mag_Node_3 10845 71
iFeelSuit_fbAcc_Node_9 10846 70
iFeelSuit_gyro_Node_5 10855 61
FreeEmg_emg_l_biceps_fem 10860 56
iFeelSuit_orient_Node_3 10865 51
... .... ...

and please note that even between the same node values are not equal.

S-Dafarra commented 1 year ago

I checked the code of robometry. There is a mutex being locked for every channel both when pushing and when saving the file. When saving the file, each channel data is converted into a matioCpp::Struct before saving and the corresponding mutex gets locked. This process may take some time since it requires memory allocation. Hence, while converting a channel, it is still possible to push to the other channels. At the same time, since all channels are pushed in the same thread, if one mutex gets locked when saving to file, also all the other push are locked.

Let's suppose we have 5 channels and we push in the order 1, 2, 3, 4, 5. At the same time, we save in a different order, let's say 4, 3, 1, 5, 2. Let's also assume that as soon as one mutex gets unlocked, the other channels can be pushed before another mutex gets locked. Let's suppose we start saving when each channel has 10 elements. We would have the following

Cycle 1

1: push -> 11 2: push -> 11 3: push -> 11 4: save (10) 5: push -> 11

Cycle 2

1: push -> 12 2: push -> 12 3: save (11) 4: save -> 1 5: push -> 12

Cycle 3

1: save (12) 2: push -> 13 3: push -> 1 4: push -> 2 5: push -> 13

Cycle 4

1: push -> 1 2: push -> 14 3: push -> 2 4: push- > 3 5: save (13)

Cycle 5

1: push -> 2 2: save (14) 3: push -> 3 4: push -> 4 5: push -> 1

This might explain the difference between channels. I would expect a difference between channels at most equal to the number of channels. If less, it means that two channels get saved in the same push loop. If more, it could indicate that the thread that saves to file goes in starvation, i.e. the mutexes get repeatedly locked by the push loop and the save thread cannot lock it.

What is the maximum difference of data points compared to the number of channels?

Thinking about it, this could lead to data loss if we reach the limit of the circular buffer we use to save channels (old data get overwritten), specified in https://github.com/robotology/wearables/blob/f80dd5078d467535d01563dd9a41e2955e6b43b2/wrappers/IWearLogger/src/IWearLogger.cpp#L918-L925