cyfra closed this issue 4 years ago.
Some of the datasets require manual downloading of files. This should be clearly marked in the dataset documentation.
Preferably it should be detected 'automatically' - by seeing which datasets depend on manual_dir.
I want to work on this issue. Can you please guide me on how to proceed?
@ShambhaviCodes Thanks for looking into this.
manual_dir is accessed through: https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/core/download/download_manager.py#L372
After download_and_prepare, the dataset builder should check whether this property has been accessed, which indicates that the dataset is using manual data: https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/core/dataset_builder.py#L285
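A minimal sketch of the access-tracking idea, assuming manual_dir is exposed as a Python property. All class names and the manual_dir_accessed flag below are hypothetical stand-ins, not the real TFDS DownloadManager or DatasetBuilder:

```python
class SketchDownloadManager:
    """Hypothetical stand-in for the real DownloadManager."""

    def __init__(self, manual_dir="manual"):
        self._manual_dir = manual_dir
        # Hypothetical flag: becomes True the first time manual_dir is read.
        self.manual_dir_accessed = False

    @property
    def manual_dir(self):
        # Record the access before handing back the path.
        self.manual_dir_accessed = True
        return self._manual_dir


class SketchBuilder:
    """Hypothetical builder base class; only the relevant bits are modelled."""

    def __init__(self):
        # Unknown until download_and_prepare has run.
        self.use_manual_data = None

    def _split_generators(self, dl_manager):
        raise NotImplementedError

    def download_and_prepare(self, dl_manager):
        self._split_generators(dl_manager)
        # After preparation, check whether manual_dir was touched.
        self.use_manual_data = dl_manager.manual_dir_accessed


class ManualBuilder(SketchBuilder):
    def _split_generators(self, dl_manager):
        # Stands in for a builder that reads user-downloaded files from manual_dir.
        return [dl_manager.manual_dir]


class AutoBuilder(SketchBuilder):
    def _split_generators(self, dl_manager):
        # Downloads everything itself and never reads manual_dir.
        return []
```

Using a property means existing builders need no changes. The trade-off is that detection only happens once download_and_prepare has actually run, and a builder that reads manual_dir only on some code paths is detected only when that path executes.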
Then this information should be saved inside dataset_info (in some new field, e.g. self.info.use_manual_data), so the DatasetInfo class should be updated: https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/core/dataset_info.py
Note that you also want to update the associated proto: https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/proto
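As far as I can tell, DatasetInfo wraps the proto linked above, which is why the proto needs the new field as well. The sketch below swaps the proto for a plain dataclass and JSON. It only illustrates that the flag should default to False and survive a save/load round trip. SketchDatasetInfo, to_json, and from_json are hypothetical names:

```python
import dataclasses
import json


@dataclasses.dataclass
class SketchDatasetInfo:
    """Hypothetical, trimmed-down stand-in for DatasetInfo."""

    name: str
    # The proposed new field; False unless the builder read manual_dir.
    use_manual_data: bool = False

    def to_json(self) -> str:
        # Plain JSON here; the real class serializes through its proto.
        return json.dumps(dataclasses.asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SketchDatasetInfo":
        return cls(**json.loads(text))
```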
Finally, the dataset template should be updated to use this new builder.info.use_manual_data field: https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/scripts/templates/dataset.mako.md
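The template itself is Mako. The plain-Python sketch below only shows the conditional the template would need: emit a notice when the flag is set and nothing otherwise. The function name and the wording of the notice are hypothetical:

```python
def render_manual_note(name: str, use_manual_data: bool) -> str:
    """Hypothetical helper mirroring the conditional block in the doc template."""
    if not use_manual_data:
        # Datasets that download everything themselves get no notice.
        return ""
    return (
        f"Note: the `{name}` dataset requires manual download of the source "
        "files into manual_dir before download_and_prepare can run."
    )
```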
Let me know if you encounter any issues.
I was trying to reproduce this issue. I managed to clone the repository and run the code to prepare the mnist dataset. Can you suggest a dataset that requires manual downloading of files?
You can find which datasets use manual_dir by searching the codebase: https://github.com/tensorflow/datasets/search?p=2&q=dl_manager.manual_dir&unscoped_q=dl_manager.manual_dir
For instance: abstract_reasoning, chexpert, xsum, ...
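The same search can be run offline against a local checkout. This is a rough sketch that assumes one dataset per module, with the file name matching the dataset name. That only approximates the real layout, since a module can define several builders. The function name is hypothetical:

```python
import pathlib
import re

# Matches the attribute access used by builders that need manual data.
PATTERN = re.compile(r"dl_manager\.manual_dir\b")


def find_manual_datasets(root):
    """Return sorted module names under root whose source reads dl_manager.manual_dir."""
    hits = set()
    for path in pathlib.Path(root).rglob("*.py"):
        # Skip test files, which may mention manual_dir without needing it.
        if path.name.endswith("_test.py"):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if PATTERN.search(text):
            hits.add(path.stem)
    return sorted(hits)
```

For example, find_manual_datasets("tensorflow_datasets") from the repository root lists candidate modules. A static search like this can miss indirect accesses, which is one reason to prefer the runtime detection described above.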
Fixed in #1227