Closed · hshreeshail closed this issue 2 years ago
If `include_video` is False, it should not download the videos, which are probably the largest files. The next largest are the pose files, probably a few tens of GB, but I don't know the exact size.
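For illustration, here is a minimal sketch of the behavior described above: skip the video files (the largest artifacts) when `include_video` is False. The function and field names are hypothetical, not the actual loader's API.

```python
# Hypothetical sketch: select which files of a record to download.
# Pose files are always fetched; videos only when explicitly requested.
def files_to_download(record, include_video=False):
    files = list(record["pose_files"])  # always fetch the pose files
    if include_video:
        files += record["video_files"]  # videos only on request
    return files

record = {"pose_files": ["a.pose"], "video_files": ["a.mp4"]}
print(files_to_download(record))                      # ['a.pose']
print(files_to_download(record, include_video=True))  # ['a.pose', 'a.mp4']
```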
In the desired configuration, after downloading and sharding, the final size of the dataset on disk is 146 GB.
It could be improved by using `float16` instead of `float32`, but I do not see it as an issue currently.
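To quantify the suggestion above: casting from `float32` to `float16` exactly halves the storage footprint. The array shape below is hypothetical, chosen only to illustrate the effect on a pose-like tensor.

```python
import numpy as np

# Hypothetical pose tensor: 1000 frames x 137 keypoints x 3 values each
poses = np.random.rand(1000, 137, 3).astype(np.float32)

# Casting to float16 halves the in-memory (and on-disk) size
poses_f16 = poses.astype(np.float16)

print(poses.nbytes)      # 1644000 bytes (4 bytes per value)
print(poses_f16.nbytes)  # 822000 bytes (2 bytes per value)
```

The trade-off is reduced precision (float16 has ~3 decimal digits), which is usually acceptable for normalized keypoint coordinates.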
Got it. I did not find any ablation studies in your paper comparing results with smaller amounts of training data. Given that we are training a single-layer LSTM model with at most ~50k parameters, having 146 GB of training data seems a bit excessive.
How large is the DGS Corpus when downloaded using the `create_tfrecord_dgs_corpus.py` script? When running the script, I am getting the following progress bar: if the numbers here are to be believed, it seems like it will take a very long time (6+ hours) to download the dataset. Note that internet speed is not a bottleneck here, since I am on a 150 Mbps connection and am getting an 80 Mbps download speed.