INF800 opened this issue 2 years ago.
Can I use the instructions in the link below for a time series dataset as well? https://github.com/huggingface/datasets/blob/master/ADD_NEW_DATASET.md
cc'ing @kashif and @NielsRogge for visibility!
@INF800 happy to add this dataset! I will try to open a PR by the end of the day if you can kindly point me to the dataset. Also, note that we already have a number of time series datasets checked in, e.g. electricity_load_diagrams or monash_tsf, and ideally this dataset would follow a similar format.
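For orientation, datasets such as monash_tsf are organised roughly as one record per series: a `start` timestamp, a `target` array of values, and an `item_id` (plus optional extra feature fields). The sketch below shows one way per-stock rows could be packed into that shape. It is stdlib-only and assumes each stock arrives as a list of `(date, value)` pairs; the helper `to_hub_record`, the ticker and the prices are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple, Union

Record = Dict[str, Union[str, List[float]]]


def to_hub_record(item_id: str, rows: List[Tuple[date, float]]) -> Record:
    """Hypothetical helper: pack one stock's (date, value) rows into a
    start/target/item_id record, similar in spirit to monash_tsf."""
    if not rows:
        raise ValueError("a series needs at least one observation")
    ordered = sorted(rows, key=lambda r: r[0])  # oldest observation first
    return {
        "start": ordered[0][0].isoformat(),         # timestamp of the first value
        "target": [float(v) for _, v in ordered],   # the values, in time order
        "item_id": item_id,                         # series identifier (here: a ticker)
    }


# Usage with made-up numbers:
rec = to_hub_record("ABC", [(date(2022, 4, 5), 101.5), (date(2022, 4, 4), 100.0)])
# rec == {"start": "2022-04-04", "target": [100.0, 101.5], "item_id": "ABC"}
```

A `start` + `target` layout only stores the first timestamp, so it implicitly assumes the values are regularly spaced at a known frequency. Gaps such as market holidays would need to be handled before packing.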
Thank you. This is what the raw data looks like, before cleaning, for an individual stock:
Scraping is automated using GitHub Actions, so a new file will be added to the links above every day.
I can rewrite the cleaning scripts so that the output fits the HF datasets standards. (P.S. I am very new to HF datasets.)
The dataset above can be converted into a univariate regression, multivariate regression, or sequence-to-sequence generation dataset, etc. So, is there some kind of transformation module that reads the dataset as a generic type (e.g. GenericTimeData) and converts it into other datasets tailored to a specific ML task? With such a module, I would only have to add the data once and apply the transformation whenever necessary.
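No such module is identified in this thread. As an illustration only, the "store once, derive task-specific views" idea could look like the sketch below. `GenericTimeData` reuses the name from the question but is a hypothetical class here, as are `to_univariate_windows`, `to_multivariate_windows` and `to_seq2seq`. The window, context and horizon lengths are arbitrary parameters.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GenericTimeData:
    """Hypothetical generic container: aligned timestamps plus one list of values per variable."""
    timestamps: List[str]
    columns: Dict[str, List[float]]

    def __post_init__(self) -> None:
        n = len(self.timestamps)
        for name, values in self.columns.items():
            if len(values) != n:
                raise ValueError(f"column {name!r} has {len(values)} values, expected {n}")


def to_univariate_windows(data: GenericTimeData, column: str,
                          window: int) -> List[Tuple[List[float], float]]:
    """Univariate regression: predict the next value of `column` from its previous `window` values."""
    s = data.columns[column]
    return [(s[i - window:i], s[i]) for i in range(window, len(s))]


def to_multivariate_windows(data: GenericTimeData, target: str,
                            window: int) -> List[Tuple[List[List[float]], float]]:
    """Multivariate regression: features are the previous `window` rows of all columns
    (columns in sorted-name order), the label is the next value of `target`."""
    names = sorted(data.columns)
    rows = [[data.columns[n][t] for n in names] for t in range(len(data.timestamps))]
    y = data.columns[target]
    return [(rows[i - window:i], y[i]) for i in range(window, len(y))]


def to_seq2seq(data: GenericTimeData, column: str, context: int,
               horizon: int) -> List[Tuple[List[float], List[float]]]:
    """Sequence-to-sequence: map `context` past values to the next `horizon` values."""
    s = data.columns[column]
    return [(s[i - context:i], s[i:i + horizon]) for i in range(context, len(s) - horizon + 1)]
```

Keeping the stored form generic and pushing the task-specific framing into small, pure functions means new tasks can be added without re-uploading the data.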
Additionally, some kind of versioning would be really helpful, because the dataset will keep being updated, which is especially common for time series datasets.
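Dataset repositories on the Hub are git-based, and `datasets.load_dataset` accepts a `revision` argument to pin a specific version. As a complementary idea only, each daily snapshot could also carry a human-readable tag derived from its contents. The sketch below is hypothetical (`snapshot_version` is not part of any library). It assumes rows of `(date, item_id, value)` and builds a tag of the form `<last date>-<short content hash>`.

```python
import hashlib
from datetime import date
from typing import Iterable, Tuple


def snapshot_version(rows: Iterable[Tuple[date, str, float]]) -> str:
    """Hypothetical helper: tag a snapshot as '<latest date>-<8 hex chars of sha256>'."""
    ordered = sorted(rows)  # canonical order, so the tag doesn't depend on file order
    if not ordered:
        raise ValueError("empty snapshot")
    digest = hashlib.sha256()
    for day, item_id, value in ordered:
        digest.update(f"{day.isoformat()},{item_id},{value!r}\n".encode("utf-8"))
    # Rows are sorted by date first, so the last row carries the latest date.
    return f"{ordered[-1][0].isoformat()}-{digest.hexdigest()[:8]}"
```

The date part tells users how fresh a snapshot is. The hash part changes whenever any row changes, including silent corrections to past values.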
thanks @INF800 I'll have a look. I believe it should be possible to incorporate this into the time-series format.
@INF800 yes, I am aware of the review repository and paper, which is more or less a collection of abstracts. I am working on a unified library of implementations of these papers, together with datasets, so that one can compare and contrast them and build upon the research, but I am not ready to share it publicly just yet.
In any case, regarding your dataset: from looking at the CSV files, it currently seems to be a mixture of textual and numerical data, sometimes within the same column. As you know, time series models need purely numeric data, so I would need your help disambiguating the dataset you have collected, and perhaps we could start with just the numerical data.
Do you think you can make a version with just numerical data?
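One possible way to produce a numeric-only version of a mixed CSV is to try to parse every cell as a number and set aside whatever fails, so that the text can be reviewed separately. The sketch below is stdlib-only and hypothetical (`parse_number` and `split_numeric` are illustrative names). It assumes commas inside numbers are thousands separators and that placeholders like `-` or `NA` mean "missing".

```python
import csv
import io
import math
from typing import Dict, List, Optional, Tuple

# Assumed placeholders for missing values; adjust to what the scraped files actually use.
MISSING = {"", "-", "--", "NA", "N/A", "nan"}


def parse_number(cell: str) -> Optional[float]:
    """Return the cell as a finite float, or None if it is missing or not numeric."""
    text = cell.strip().replace(",", "")  # assumes ',' is a thousands separator
    if text in MISSING:
        return None
    try:
        value = float(text)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def split_numeric(csv_text: str) -> Tuple[Dict[str, List[Optional[float]]], Dict[str, List[str]]]:
    """Split every column into numeric values (None where not numeric) and the
    leftover non-numeric, non-missing text found in that column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames or []
    numeric: Dict[str, List[Optional[float]]] = {n: [] for n in names}
    text: Dict[str, List[str]] = {n: [] for n in names}
    for row in reader:
        for name in names:
            cell = (row.get(name) or "").strip()  # short rows yield None -> treat as ""
            value = parse_number(cell)
            numeric[name].append(value)
            if value is None and cell not in MISSING:
                text[name].append(cell)
    return numeric, text
```

In this sketch a date column lands in the text bucket, which is expected. It would be parsed separately as the time index rather than treated as a feature.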
I will share the numeric data and the conversion script by the end of this week. I am currently on a business trip, and the files are on my desktop.
thanks @INF800 kashif.rasul@gmail.com should work
It should be in your inbox!
Adding a Time Series Dataset