ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

Add Kaggle tabular dataset and multiscale benchmarks for multimodal learning #2657

Open skanjila opened 1 year ago

skanjila commented 1 year ago

Is your feature request related to a problem? Please describe. I want to add datasets from two sources that are good to mine: the Kaggle Tabular Playground Challenge, a monthly tabular data challenge (https://www.kaggle.com/search?q=tabular+playground+series), and MultiBench: Multiscale Benchmarks for Multimodal Representation Learning.

Describe the use case We need to add more datasets to Ludwig for general testing.

Describe the solution you'd like The new datasets should behave the same way as the existing dataset upload and download APIs.

Describe alternatives you've considered There are lots of datasets to add, so the alternative would be adding similar datasets.

Additional context N/A

dalianaliu commented 1 year ago

Hi @skanjila, please go ahead and submit a PR. Let us know if you need any help.

I'm curious if you could share more about your use case?

skanjila commented 1 year ago

@dalianaliu I'm a committer on Ludwig and started the datasets initiative with @tgaddair and @w4nderlust. This idea was proposed by @dantreiman; I am just working on the implementation until the model-hub starts getting incorporated.