Dafter is a command-line downloader of public datasets. It takes care of downloading and formatting the datasets' files so that you can spend hours building models instead of looking for datasets and their URLs.
To install dafter, run:
pip install dafter
To download the MNIST dataset:
dafter get mnist
To delete MNIST from your machine:
dafter delete mnist
To search among downloadable datasets:
# Search all available datasets
dafter search
# Search all available datasets that have the tags "image" and "deep-learning"
# and whose name contains "mni"
dafter search mni --tags image deep-learning
To list all the datasets that have been downloaded and are stored on your machine:
# Lists all datasets in database
dafter list
# Lists all datasets in database that have the tag "twitter" and whose name
# contains "sentiment"
dafter list sentiment --tags twitter
To update dafter, run:
pip install --upgrade dafter
To uninstall dafter, run:
pip uninstall dafter
To add a new dataset, add a JSON file called name-of-the-dataset.json to the datasets-configs folder:
{
    "name": "name-of-the-dataset",
    "urls": [
        {
            "url": "https://site.com/file1.tar.gz",
            "bytes": 45221
        },
        {
            "url": "https://site.com/file2.tar.gz",
            "bytes": 1147803
        }
    ],
    "type": "tar.gz",
    "tags": ["tag1", "tag2", "tag3"],
    "description": "This is a description of the dataset",
    "source": "https://site.com/"
}
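
Before submitting a new config, it can help to sanity-check it locally. The following is a minimal sketch, not part of dafter itself: the helper names check_config and total_bytes are hypothetical, and it assumes that every key shown in the example above is required and that "bytes" is the size of each file. Neither assumption is confirmed by this README.

```python
import json

# Keys taken from the example config above.
# Treating all of them as required is an assumption.
REQUIRED_KEYS = {"name", "urls", "type", "tags", "description", "source"}

EXAMPLE = """
{
    "name": "name-of-the-dataset",
    "urls": [
        {"url": "https://site.com/file1.tar.gz", "bytes": 45221},
        {"url": "https://site.com/file2.tar.gz", "bytes": 1147803}
    ],
    "type": "tar.gz",
    "tags": ["tag1", "tag2", "tag3"],
    "description": "This is a description of the dataset",
    "source": "https://site.com/"
}
"""


def check_config(text, filename):
    """Return a list of problems found in a dataset config (empty if none)."""
    config = json.loads(text)
    problems = []
    missing = REQUIRED_KEYS - set(config)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    # The file is expected to be called <name>.json
    if "name" in config and filename != config["name"] + ".json":
        problems.append("file name does not match the 'name' field")
    for entry in config.get("urls", []):
        if not isinstance(entry.get("url"), str):
            problems.append("url entry without a 'url' string")
        if not isinstance(entry.get("bytes"), int):
            problems.append("url entry without an integer 'bytes' value")
    return problems


def total_bytes(text):
    """Sum the 'bytes' values of all url entries (assumed to be file sizes)."""
    config = json.loads(text)
    return sum(entry["bytes"] for entry in config["urls"])


print(check_config(EXAMPLE, "name-of-the-dataset.json"))  # prints []
print(total_bytes(EXAMPLE))  # prints 1193024
```

An empty list means the sketch found nothing wrong with the file. It does not check that the URLs are reachable or that the sizes are accurate.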