Describe the bug
The extract_compressed_file and force_extract properties of DownloadConfig are always set to True in the dataset_module_factory function in load.py, regardless of what the caller passes. This is very annoying because previously extracted data is ignored and the archives are extracted again the next time the dataset is loaded.
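A simplified sketch of the override described above follows. It does not use the datasets library: SimpleDownloadConfig and module_factory_sketch are hypothetical stand-ins, and the two unconditional assignments are assumed to mirror what dataset_module_factory does to the DownloadConfig it receives.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleDownloadConfig:
    # Hypothetical stand-in holding only the two fields discussed in this issue.
    extract_compressed_file: bool = False
    force_extract: bool = False


def module_factory_sketch(
    download_config: Optional[SimpleDownloadConfig] = None,
) -> SimpleDownloadConfig:
    # Work on a copy so the caller's object itself is left unchanged.
    config = copy.copy(download_config) if download_config else SimpleDownloadConfig()
    # Unconditional overrides: whatever the caller passed is discarded.
    config.extract_compressed_file = True
    config.force_extract = True
    return config


user_config = SimpleDownloadConfig(extract_compressed_file=True, force_extract=False)
effective = module_factory_sketch(user_config)
print(user_config.force_extract, effective.force_extract)  # False True
```

Because the assignments are unconditional, no force_extract value chosen by the caller reaches the extraction step.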
See this image below:
Steps to reproduce the bug
Have a local dataset that contains archive files (zip, tar.gz, etc.).
Write a dataset loading script that downloads and extracts these files.
Run load_dataset with a DownloadConfig that explicitly sets force_extract to False.
The extraction process starts regardless of whether the archives were already extracted.
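The steps above can be simulated with the standard library alone. This is a hedged sketch, not the datasets code path: extract_archive and load_sketch are hypothetical helpers, the extractor is assumed to reuse a non-empty output directory unless force_extract is True, and load_sketch assumes the loader forces extraction the way this report describes.

```python
import os
import tempfile
import zipfile


def extract_archive(archive_path: str, output_dir: str, force_extract: bool) -> bool:
    """Hypothetical extractor: returns True if it extracted, False if it reused output_dir."""
    if not force_extract and os.path.isdir(output_dir) and os.listdir(output_dir):
        return False  # a previous extraction exists, so skip the work
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(output_dir)
    return True


def load_sketch(archive_path: str, output_dir: str, force_extract: bool) -> bool:
    # Mimics the reported override: the caller's force_extract is ignored.
    return extract_archive(archive_path, output_dir, force_extract=True)


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "data.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("train.csv", "a,b\n1,2\n")
    out = os.path.join(tmp, "extracted")
    # Two "loads", both asking for force_extract=False.
    runs = [load_sketch(archive, out, force_extract=False) for _ in range(2)]
    print(runs)  # [True, True]: the second load extracts the archive again
```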
Expected behavior
The extraction process should not run when the archives were previously extracted and force_extract is set to False.
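The expected behavior comes down to a simple guard. The sketch below is an assumption about how such a check could look, not the library's implementation: should_extract is a hypothetical name, and it treats an existing file or a non-empty directory at output_path as a previous extraction.

```python
import os
import tempfile


def should_extract(output_path: str, force_extract: bool) -> bool:
    """Hypothetical guard: extract only when forced or when no previous output exists."""
    if force_extract:
        return True
    if os.path.isfile(output_path):
        return False
    # An existing, non-empty directory counts as a previous extraction.
    return not (os.path.isdir(output_path) and os.listdir(output_path))


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "extracted")
    print(should_extract(out, force_extract=False))  # True: nothing extracted yet
    os.makedirs(out)
    with open(os.path.join(out, "data.txt"), "w") as f:
        f.write("x")
    print(should_extract(out, force_extract=False))  # False: reuse the previous extraction
    print(should_extract(out, force_extract=True))   # True: the caller asked to re-extract
```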
Environment info
datasets==2.20.0, Python 3.9