ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

Add new datasets to Ludwig Dataset Zoo #2638

Open dantreiman opened 2 years ago

dantreiman commented 2 years ago

Add more datasets to Ludwig Datasets (ludwig.datasets) to increase the variety and coverage of ML features and tasks in Ludwig.

New datasets should be added to ludwig/datasets/configs, tested, and optionally submitted with an example model config in ludwig/datasets/model_configs.
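As a rough illustration of that workflow, the sketch below describes a candidate dataset as a plain Python dict and sanity-checks it before it would be turned into a file under ludwig/datasets/configs. The field names (name, version, download_urls, dataset_filenames), the helper validate_dataset_config, and the example values are all assumptions for illustration, not the confirmed Ludwig schema; the existing configs in the repo are the reference for the actual format.

```python
# Hypothetical sketch: the field names are assumptions, not the confirmed Ludwig schema.
REQUIRED_FIELDS = ("name", "version", "download_urls", "dataset_filenames")


def validate_dataset_config(config: dict) -> list:
    """Return a list of problems found in a candidate dataset config."""
    problems = []
    # Every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not config.get(field):
            problems.append(f"missing or empty field: {field}")
    # Accept either a single URL string or a list of URLs.
    urls = config.get("download_urls") or []
    if isinstance(urls, str):
        urls = [urls]
    for url in urls:
        if not url.startswith(("http://", "https://")):
            problems.append(f"not an http(s) URL: {url}")
    return problems


# Example entry for one of the candidates; name and filename are assumed.
sarcasm_config = {
    "name": "sarcastic_headlines",
    "version": 1.0,
    "download_urls": [
        "https://github.com/rishabhmisra/News-Headlines-Dataset-For-Sarcasm-Detection"
    ],
    "dataset_filenames": ["Sarcasm_Headlines_Dataset.json"],
}

print(validate_dataset_config(sarcasm_config))
```

A check like this only catches missing fields and malformed URLs; it does not replace downloading the dataset and running it through a model config.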

Some candidates:

AV-MNIST: https://github.com/slyviacassell/_MFAS

Sarcastic Headlines: https://github.com/rishabhmisra/News-Headlines-Dataset-For-Sarcasm-Detection

MIMIC-III: https://paperswithcode.com/dataset/mimic-iii

CMU-MOSI: https://github.com/A2Zadeh/CMU-MultimodalSDK

cmenguy commented 1 year ago

@dantreiman @skanjila I was looking at the MIMIC-III dataset, but access to it seems pretty restrictive: one needs to be a credentialed user, as described at https://mimic.mit.edu/docs/gettingstarted/

Based on that, is there an AWS account associated with Ludwig that could be used to request credentialed access? If not, I am not sure how this dataset could be added unless we assume that a given user is already credentialed. Thoughts on how best to proceed?
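For the "assume the user is credentialed" option, one possible shape is to skip downloading entirely and only check a directory where the user has placed the files themselves. This is a minimal sketch under that assumption: locate_credentialed_files is a hypothetical helper, not part of ludwig.datasets, and the file names are illustrative.

```python
from pathlib import Path

# Illustrative file names; the real list depends on which MIMIC-III tables are needed.
EXPECTED_FILES = ("ADMISSIONS.csv.gz", "PATIENTS.csv.gz")


def locate_credentialed_files(data_dir, expected=EXPECTED_FILES):
    """Return paths to user-supplied files, or raise with instructions if any are missing."""
    root = Path(data_dir)
    missing = [name for name in expected if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(
            f"Missing {missing} in {root}. MIMIC-III requires credentialed access; "
            "see https://mimic.mit.edu/docs/gettingstarted/ and place the files here manually."
        )
    return [root / name for name in expected]
```

With this approach no Ludwig-owned account is needed, since nothing is fetched on the user's behalf; the trade-off is that the dataset cannot be exercised in automated tests without credentials.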