time-series-machine-learning / tsml-repo

Discussion, problems and donations of data hosted at
http://www.timeseriesclassification.com
GNU General Public License v3.0

[DONATION] 3W dataset #9

Open ricardovvargas opened 5 years ago

ricardovvargas commented 5 years ago

Our paper has just been accepted by the Journal of Petroleum Science and Engineering. It describes the 3W dataset, which, to the best of its authors' knowledge, is the first realistic and public dataset with rare undesirable real events in oil wells. The accepted manuscript is available at https://doi.org/10.1016/j.petrol.2019.106223, and the 3W dataset is publicly available at https://github.com/ricardovvargas/3w_dataset and at http://dx.doi.org/10.17632/r7774rwc7v.1.

Since we believe that publishing the 3W dataset in the UEA & UCR Time Series Classification Repository would benefit the machine learning community, we would like to know whether this is possible. Because we considered 8 types of undesirable events with different dynamics, the multivariate time series (MTS) do not all have the same length.

If you have any questions, please let me know.

Ricardo Vargas

TonyBagnall commented 4 years ago

Hi there, sorry for the delay. We had a look, but it seems a lot of formatting is required. It's on the list, but if you could help by collating the data into a single file for each dimension, that would really help.
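A minimal sketch of the collation requested above, assuming each instance is a CSV file whose header row names the variables. The helpers `load_instance`, `collate_by_dimension` and `write_tables`, the output layout (one CSV per variable, one row per instance, label in the last column) and the variable names used in the example are all hypothetical, not the repository's official format. Because the instances differ in length, the rows within each output file differ in length too. Note that this flattening keeps only the label; the source and well id would need extra columns or files.

```python
import csv
from pathlib import Path


def load_instance(path, label):
    """Read one instance CSV (header row of variable names) into (label, rows).

    Hypothetical layout: each row is one time step, each column one variable.
    """
    with open(path, newline="") as f:
        return label, list(csv.DictReader(f))


def collate_by_dimension(instances, variables):
    """Group instances into one table per variable (dimension).

    instances: list of (label, rows) pairs, rows being a list of dicts that map
    variable name -> value, one dict per time step.
    Returns {variable: [[v_1, ..., v_n, label], ...]}, one row per instance.
    Rows may have different lengths because the instances do.
    """
    tables = {var: [] for var in variables}
    for label, rows in instances:
        for var in variables:
            series = [row[var] for row in rows]
            tables[var].append(series + [label])
    return tables


def write_tables(tables, out_dir):
    """Write one CSV file per dimension into out_dir (created if missing)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for var, table in tables.items():
        with open(out / f"{var}.csv", "w", newline="") as f:
            csv.writer(f).writerows(table)
```

For example, `collate_by_dimension(instances, ["P-PDG", "P-TPT"])` followed by `write_tables(tables, "out")` would produce `out/P-PDG.csv` and `out/P-TPT.csv` (variable names assumed for illustration). Keeping the values as strings avoids any float reformatting when the files are rewritten.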

ricardovvargas commented 4 years ago

Hi Tony. I believe this is not a good strategy. Note that, in addition to the variables, there are other dimensions: source, well id (for actual wells) and the label at the instance level. All of this information is preserved as described in the article, which allows each 3W user to test various hypotheses, such as: (i) how useful is it to use simulated instances during training? (ii) is it feasible to have a single model for all anomalies in all wells? (iii) does it make sense to use samples from simulated or hand-drawn instances in validation sets? In any case, the 3W dataset contains several dimensions, and I think it is more appropriate for you to decide which ones should be preserved in the UEA & UCR repository. If any explanation is unclear, do not hesitate to let me know. I can send you the paper and any further explanation.
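A minimal sketch of how the source and well id could be recovered per instance and used for hypothesis (iii), so that simulated and hand-drawn instances are used for training only and validation uses real instances from held-out wells. The file-naming convention (`WELL-<id>_...`, `SIMULATED_...`, `DRAWN_...`) is an assumption about the 3W repository and should be checked against it; `instance_metadata` and `split_real_for_validation` are hypothetical helpers.

```python
import re
from pathlib import PurePath

# Assumed naming convention (verify against the 3W repository):
#   WELL-00001_20170320120000.csv -> real instance from well 00001
#   SIMULATED_00010.csv           -> simulated instance
#   DRAWN_00003.csv               -> hand-drawn instance
_WELL = re.compile(r"^WELL-(\d+)_")


def instance_metadata(path):
    """Return (source, well_id) parsed from an instance file name.

    well_id is None for simulated and hand-drawn instances.
    """
    name = PurePath(path).name
    m = _WELL.match(name)
    if m:
        return "real", m.group(1)
    if name.startswith("SIMULATED_"):
        return "simulated", None
    if name.startswith("DRAWN_"):
        return "drawn", None
    raise ValueError(f"unrecognised instance name: {name}")


def split_real_for_validation(paths, validation_wells):
    """Validate only on real instances from validation_wells; train on the rest.

    Simulated and hand-drawn instances always end up in the training list.
    """
    train, validation = [], []
    for p in paths:
        source, well = instance_metadata(p)
        if source == "real" and well in validation_wells:
            validation.append(p)
        else:
            train.append(p)
    return train, validation
```

Swapping the condition (for example, moving simulated instances into the validation list) is enough to test the opposite variant of hypothesis (iii).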