intel / dffml

The easiest way to use Machine Learning. Mix and match underlying ML libraries and data set sources. Generate new datasets or modify existing ones with ease.
https://intel.github.io/dffml/main/
MIT License

source: datasets: Add common time-series datasets #1319

Open programmer290399 opened 2 years ago

programmer290399 commented 2 years ago

Pain Point

Currently, we do not have any commonly used time-series datasets available in DFFML.

Proposed Solution

Write a dataset source, like the existing iris dataset, that adds the following basic datasets:

Univariate Datasets
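The existing iris dataset source is the model to follow. As a rough, non-authoritative sketch of what a univariate time-series source needs to do, assuming the data ships as a CSV with one timestamp column and one value column, the code below checks a pinned SHA-384 hash of the downloaded bytes and yields one record per row in time order. `Record`, `verify_sha384`, and `univariate_records` are hypothetical names for illustration only. They are not DFFML's API, and a real implementation would plug into DFFML's own dataset-source machinery instead.

```python
import csv
import hashlib
import io
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator


@dataclass
class Record:
    """Hypothetical stand-in for a DFFML record: a key plus feature values."""
    key: str
    features: Dict[str, Any] = field(default_factory=dict)


def verify_sha384(data: bytes, expected_hex: str) -> None:
    """Raise if the downloaded bytes do not match the pinned SHA-384 hash."""
    actual = hashlib.sha384(data).hexdigest()
    if actual != expected_hex:
        raise ValueError(f"hash mismatch: expected {expected_hex}, got {actual}")


def univariate_records(
    data: bytes,
    expected_sha384: str,
    time_column: str,
    value_column: str,
) -> Iterator[Record]:
    """Parse a (timestamp, value) CSV into records, preserving row order.

    Each record is keyed by its row index so the original time order survives.
    """
    # Refuse to parse anything whose contents changed since the hash was pinned.
    verify_sha384(data, expected_sha384)
    reader = csv.DictReader(io.StringIO(data.decode("utf-8")))
    for index, row in enumerate(reader):
        yield Record(
            key=str(index),
            features={
                "time": row[time_column],  # kept as the raw timestamp string
                "value": float(row[value_column]),
            },
        )


if __name__ == "__main__":
    # Synthetic toy data for demonstration, not a real dataset.
    toy = b"t,value\n0,1.5\n1,2.0\n"
    pinned = hashlib.sha384(toy).hexdigest()
    for record in univariate_records(toy, pinned, "t", "value"):
        print(record.key, record.features)
```

Keying records by row index is one simple way to keep the series ordered; a real source might key by timestamp instead, depending on how DFFML models are expected to consume time-series data.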

johnandersen777 commented 2 years ago

Census data? https://registry.opendata.aws/

TirelessClock commented 2 years ago

Hey @pdxjohnny, @programmer290399, I'd like to work on this issue. Can we use any dataset, or are there specific requirements (aside from the ones you have written here)? Also, could you tell me what exactly we would be doing with these datasets?

programmer290399 commented 2 years ago

Hey @TirelessClock !!

This issue is part of a GSoC project for this year, so I am not sure whether it is up for grabs or how that would work. You may solve it partially, but before proceeding please clear this with @pdxjohnny.

As for the question of which datasets to use: we definitely want the ones listed above, and I have linked to their respective sources. Also take a look at how datasets work in DFFML (see the link to the iris dataset above).

But we are open to any other datasets that are commonly used for benchmarking and research. Before implementing one, please make sure you're on the same page with the community members so that you don't end up doing work we would not be able to merge into the main branch.

I hope this makes it clear. For any further clarification or questions, join our Gitter channel.

TirelessClock commented 2 years ago

Hello @programmer290399, thanks for the reply! Yes, I completely understand that it's part of GSoC and might not be up for grabs. I would still like to work on it and solve it at least partially, and I am currently working my way through the iris dataset. I could use all the help you can give!

mukund2201 commented 2 years ago

Ozone Level Detection Data Set https://archive.ics.uci.edu/ml/datasets/Ozone+Level+Detection