tilman151 / rul-datasets

A collection of datasets for RUL estimation as Lightning Data Modules.
https://krokotsch.eu/rul-datasets

Test Datasets of C-MAPSS for each FD missing Samples #65

Closed eTuDpy closed 5 months ago

eTuDpy commented 5 months ago

Hi, could it be that the test dataset output is missing quite a lot of samples for each FD?

As far as I understand, each dataset consists of multiple trajectories, and each trajectory has a certain number of cycles. With a sliding window approach, you slide through every trajectory individually, collecting multiple cycles (per trajectory) within one window. For instance, FD1 has 100 test trajectories, so applying a sliding window (of, let's say, 40) should result in almost 2000 time series snippets / windows. However, your test_dataloader only outputs 100 time series snippets. In fact, the number of snippets is always equal to the number of trajectories within each FD, which skews the final test RMSE.
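To make the expected count concrete, here is a minimal sketch of how many stride-1 windows a per-trajectory sliding window would yield. The function name `count_windows` and the trajectory lengths are hypothetical and chosen only for illustration; they are not taken from C-MAPSS or from rul-datasets.

```python
def count_windows(lengths, window_size):
    """Number of full stride-1 sliding windows for each trajectory.

    A trajectory shorter than the window yields no full window.
    """
    return [max(length - window_size + 1, 0) for length in lengths]


# Hypothetical trajectory lengths in cycles (illustrative only).
lengths = [50, 45, 60]

per_trajectory = count_windows(lengths, window_size=40)
total = sum(per_trajectory)

print(per_trajectory)  # windows per trajectory
print(total)           # total windows across all trajectories
```

With full windowing, the sample count grows with trajectory length, whereas a test set with one sample per trajectory has exactly `len(lengths)` samples.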

tilman151 commented 5 months ago

As far as I know, the test procedure of C-MAPSS is defined as predicting the RUL value for the last time step of each test trajectory. This is why the test split is not windowed. Instead, only the last window of each trajectory is included. Therefore, you only get 100 samples in the test set, one for each trajectory.
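The "last window only" behavior can be sketched roughly as follows. This is an illustration with NumPy, not the library's actual implementation: the helper `last_window`, the random data, the feature count of 14, and the zero-padding of trajectories shorter than the window are all assumptions.

```python
import numpy as np


def last_window(features, window_size):
    """Return the final `window_size` time steps of one trajectory.

    Assumption: trajectories shorter than the window are zero-padded
    at the front so that every sample has the same shape.
    """
    window = features[-window_size:]
    pad = window_size - len(window)
    if pad > 0:
        window = np.pad(window, ((pad, 0), (0, 0)))
    return window


# Hypothetical test trajectories: (cycles, features) arrays of random data.
rng = np.random.default_rng(0)
trajectories = [rng.normal(size=(n, 14)) for n in (50, 31, 60)]

# One sample per trajectory, so the test set has len(trajectories) samples.
test_samples = np.stack([last_window(t, 40) for t in trajectories])
print(test_samples.shape)
```

The test RMSE is then computed over one prediction per trajectory, which matches the sample count observed in the test dataloader.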

This behavior is unfortunately not documented. I'll try to add this as a note as soon as possible.

eTuDpy commented 5 months ago

Thank you for the detailed response. I did not know about this specific usage. It seems a little unexpected to deliberately exclude existing benchmark data, but this certainly explains it.