Open IRDonch opened 2 years ago
Thanks for reporting. Indeed, https://github.com/tensorflow/datasets/blob/47baec10e957bffbd52c960cfce1ad31c60e04dd/tensorflow_datasets/core/download/downloader.py#L138 should be updated to use the correct unit. Don't hesitate to send a PR.
Short description
Running tfds build <dataset> when some of the files have already been downloaded and some are missing results in meaningless sizes being displayed in the progress bar.
Environment information
Operating System: Ubuntu 18.04
Python version: 3.7.5
tensorflow-datasets/tfds-nightly version: tensorflow-datasets 4.5.2
tensorflow/tf-nightly version: tensorflow 2.7.0
Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yes
Reproduction instructions
First, run tfds build voc/2012 and wait for it to finish.
Then, remove $TFDS_DATA_DIR/downloads/pjredd.com_media_files_VOCtra_11-May-20124U92MnDPGT0LX3SxafRBV6Swxu-nCPTdD_eO5pF2O8s.tar and $TFDS_DATA_DIR/voc/2012/4.0.0.
Then run tfds build voc/2012 again. The progress bar will show sizes that don't make sense: they imply that the missing file is less than 1% of the total size of the dataset's files, even though it's almost 2 GB in size.
Link to logs
N/A
Expected behavior
The progress bar should display numbers that are correctly proportioned.
Additional context
This most likely happens because the code that reports progress is inconsistent about the units it uses. Here and here it uses bytes, while here it uses megabytes.
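To illustrate the suspected cause, here is a minimal sketch of how mixing units can skew a progress total. It is not the tfds downloader code: the function names and file sizes are hypothetical, and treating a megabyte as 1024 * 1024 bytes is an assumption.

```python
MIB = 1024 * 1024  # bytes per mebibyte (assumed meaning of "megabyte" here)


def mixed_unit_total(cached_bytes, missing_bytes):
    """Inconsistent accounting: cached files counted in bytes, missing files in MiB."""
    return sum(cached_bytes) + sum(size // MIB for size in missing_bytes)


def consistent_total_mib(cached_bytes, missing_bytes):
    """Consistent accounting: every size is converted to MiB before summing."""
    return sum(size // MIB for size in cached_bytes + missing_bytes)


# Hypothetical sizes in bytes, not the real VOC 2012 archive sizes.
cached = [1_800 * MIB]   # file that is already on disk
missing = [1_900 * MIB]  # file that has to be downloaded again

# Amount that the progress bar has to advance by, in MiB.
missing_mib = sum(size // MIB for size in missing)

buggy_share = missing_mib / mixed_unit_total(cached, missing)
fixed_share = missing_mib / consistent_total_mib(cached, missing)

print(f"mixed units:      {buggy_share:.6%}")
print(f"consistent units: {fixed_share:.2%}")
```

With mixed units, the missing file appears to be a tiny fraction of the total even though it is roughly half of the data. Converting every size to the same unit before adding it to the total restores the expected proportion.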