DownloadManager with register checksum is much slower

Short description

When using the DownloadManager to download many small files (1M+ images), the download is relatively fast if register_checksums is disabled. With register_checksums enabled, however, it is painfully slow: the difference is multiple orders of magnitude. I'm doing this with a non-Beam dataset. I'm unsure whether this has something to do with the parallelization of the downloads. The documentation says that if the dl_manager receives a data structure to download, it will parallelize the downloads. Does parallelization not work when register_checksums is enabled? If so, it would be nice to at least update the documentation to clarify this.

Environment information

Operating System: Ubuntu 20.04
Python version: 3.8.5
tensorflow-datasets version: 4.1.0
tensorflow version: 2.3.1
Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yes

Reproduction instructions

I'm using dl_manager to download files from S3. To reproduce the issue, compare the speed of downloading many small files from S3, once with register_checksums enabled and once with it disabled. In my case the dataset is upwards of 70 GB, but I don't believe it needs to be that large: a couple of GB will probably be enough.
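A minimal sketch of such a comparison, assuming the dataset is a registered TFDS builder. The helper names (`prepare_with_tfds`, `compare_timings`) and the dataset name in the usage comment are hypothetical and not part of my actual code. Each run uses a fresh data directory so that one run cannot reuse files cached by the other.

```python
import tempfile
import time
from typing import Callable, Dict


def prepare_with_tfds(dataset_name: str, register_checksums: bool) -> None:
    """Download and prepare `dataset_name` into a fresh, empty data_dir."""
    # Imported lazily so the timing helper below can be used without TFDS.
    import tensorflow_datasets as tfds

    # A new directory per run avoids reusing downloads cached by another run.
    data_dir = tempfile.mkdtemp(prefix="tfds_checksum_repro_")
    config = tfds.download.DownloadConfig(register_checksums=register_checksums)
    builder = tfds.builder(dataset_name, data_dir=data_dir)
    builder.download_and_prepare(download_config=config)


def compare_timings(prepare: Callable[[bool], None]) -> Dict[bool, float]:
    """Time `prepare(flag)` with register_checksums off, then on."""
    timings = {}
    for flag in (False, True):
        start = time.perf_counter()
        prepare(flag)
        timings[flag] = time.perf_counter() - start
    return timings


# Hypothetical usage; "my_s3_images" stands in for the real dataset, whose
# module must be imported first so that TFDS can find the builder:
#
#   result = compare_timings(lambda flag: prepare_with_tfds("my_s3_images", flag))
#   print(result[False], result[True])
```

Timing the whole download_and_prepare call also includes example generation, so a subset of the data that prepares quickly makes the difference in download time easier to see.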

Expected behavior

I expected the download speed not to change so drastically because of checksum registration.

tensorflow/datasets issue #2901