nilmtk / nilmtk-contrib

Apache License 2.0

Are the datasets converted to HDF5 by the dataset_converter scripts also provided to the research community? #45

Closed: oneway3124 closed this issue 3 years ago

oneway3124 commented 4 years ago

Hi Sir,

Are the datasets that have been converted to HDF5 files by the dataset_converter scripts also made available to the research community?

Otherwise, only a few researchers will be able to use nilmtk and nilmtk-contrib for research, and the following phenomenon will persist: "few publications presenting algorithmic contributions within the field went on to contribute implementations back to the toolkit".

oneway3124 commented 4 years ago

I have converted three datasets using the dataset_converters Python scripts. The resulting preprocessed files are redd.hdf5, ukdale, and ampds. However, Dataport and some other datasets cannot be preprocessed easily on our side. I would greatly appreciate any comments, help, or suggestions about the datasets.
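For readers who already hold the raw files, the conversion step described above could look roughly like the sketch below. This is only an illustration, not the project's official workflow: `convert_if_missing` and all paths are hypothetical names introduced here, and the commented usage assumes NILMTK is installed and that `convert_redd(redd_path, output_filename)` can be imported from `nilmtk.dataset_converters`.

```python
from pathlib import Path
from typing import Callable


def convert_if_missing(raw_dir: str, out_file: str,
                       converter: Callable[[str, str], None]) -> bool:
    """Run `converter(raw_dir, out_file)` unless `out_file` already exists.

    Returns True if the converter was called, False if the existing file
    was reused. This helper is hypothetical and not part of NILMTK.
    """
    out_path = Path(out_file)
    if out_path.exists():
        return False
    if not Path(raw_dir).is_dir():
        raise FileNotFoundError(f"raw dataset directory not found: {raw_dir}")
    # Create the output folder so the converter can write its file there.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    converter(raw_dir, str(out_path))
    return True


# Assumed usage, with NILMTK installed and the raw REDD files downloaded
# separately (both paths are placeholders):
#
#   from nilmtk.dataset_converters import convert_redd
#   convert_if_missing("data/REDD/low_freq", "data/redd.h5", convert_redd)
```

Passing the converter in as an argument keeps the helper independent of any one dataset, so the same call shape could be tried with other converters, provided they accept an input path and an output filename.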

A snapshot of a successful train-and-test run on REDD is shown in the figure.

[Figure: REDD_CO+DAE]

aalkhulaifi605 commented 4 years ago

@oneway3124 Could you share the preprocessed REDD dataset with me? I am working on a research project and noticed that some papers use a REDD dataset that has been preprocessed in some way. I tried to reproduce their results with the REDD dataset in NILMTK format, but I always fail.

Also, there is a big gap in the mains data of house 1, while appliance data exists for that same period. Similar gaps can be found in the mains power of the other houses.
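One way to pin down such gaps before comparing results is to scan the timestamps of each channel directly. The following is a minimal sketch in plain Python, not NILMTK's own API: the function names, the 60-second threshold, and the sample timestamps are all assumptions made up for illustration (the data is synthetic, not taken from REDD).

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Gap = Tuple[datetime, datetime]


def find_gaps(timestamps: List[datetime], max_gap: timedelta) -> List[Gap]:
    """Return (start, end) pairs where consecutive samples are more than `max_gap` apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


def gaps_covered_by(gaps: List[Gap], other: List[datetime]) -> List[Gap]:
    """Keep only the gaps during which the other channel has at least one sample."""
    return [(a, b) for a, b in gaps if any(a < t < b for t in other)]


# Synthetic example: mains readings stop for about an hour,
# while the appliance channel keeps reporting every 3 seconds.
t0 = datetime(2011, 4, 18)  # arbitrary start time, for illustration only
mains = [t0 + timedelta(seconds=s) for s in (0, 1, 2, 3600, 3601)]
appliance = [t0 + timedelta(seconds=s) for s in range(0, 3602, 3)]

mains_gaps = find_gaps(mains, max_gap=timedelta(seconds=60))
suspicious = gaps_covered_by(mains_gaps, appliance)
for start, end in suspicious:
    print(f"mains missing from {start} to {end} while appliance data exists")
```

Periods flagged this way are candidates for exclusion from training or evaluation, since any mains-versus-appliance comparison inside them is undefined.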

oneway3124 commented 4 years ago

If you want the preprocessed REDD dataset, I can send it to you by email or via pan.baidu.com.

Thanks a lot! Best Regards, Wireless Sensor Networks, School of Computer Science, Sichuan University.

Wang Wei(王伟), |mobile: +86-159-0810-6107 | email: wang.david.wei@stu.scu.edu.cn, 190025935@qq.com

PMeira commented 3 years ago

I'll try to look into this, but unfortunately we don't own most of the datasets, so we can't just rehost them unless they have a clear license (lots of them don't). I opened https://github.com/nilmtk/nilmtk/issues/909 to track and discuss this in 2021.

But the dataport...

See https://github.com/nilmtk/nilmtk/issues/873 (and https://github.com/Pecan-Street/DataPort-Examples/issues/1, which has no replies so far). We probably won't support Dataport in the future, at least not fully. Apart from a small selection of CSV files that is freely available under some terms, it's a commercial service that changes without notice. It would be fairer to ask them to support NILMTK instead. NILMTK is maintained only by volunteers.

It's not hard to find discussions of how copyright issues are a real threat nowadays. I recommend that anyone read up on that before hosting someone else's files.

As a final note:

Or else, there is still few researcher who can use the nilmtk and nilmtk-contrib for research. And the phenomenon will continue to exist, "few publications presenting algorithmic contributions within the field went on to contribute implementations back to the toolkit".

I disagree that that's the main reason. Most people who can't get a single dataset working won't be able to contribute significant features unless they're on the topic for the long term. The issue is more a lack of a culture of collaboration, which is severely lacking in fields like this. Note that NILMTK has had, and still has, collaboration over the years, but from a tiny set of individuals compared to the whole set of users. But I digress...