Open WarrenSchultz opened 2 days ago
The DLRM Docker container needs the Criteo dataset to be preprocessed outside of it. We need to add this option to the documentation page, but if you have the preprocessed data we can tell you how to use it.
@anandhu-eng we can sync on how to add this option in the documentation page.
Huh, ok. I thought I saw it pulling down the full dataset, but I may have been mistaken. I'm working on a lot in parallel at the moment. :) What's the correct command to do so at this point through CM?
Currently we only support plugging in the preprocessed data, as the Criteo download stopped working without manual intervention. I believe we can share the preprocessed data with you. Preprocessing is heavy: it needs 6.4 TB of disk space, 600 GB+ of memory, and around 3 days of running. The preprocessed data is less than 300 GB. We can share it by the end of this week; we need to test it for expected accuracy first.
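For anyone who still wants to attempt the preprocessing themselves, a quick host check against the figures quoted above might look like the sketch below. It is not part of CM or the MLPerf scripts. The function names (`shortfalls`, `probe_host`) are hypothetical, the thresholds are simply the 6.4 TB and 600 GB numbers from this thread, and the memory probe assumes a Linux host.

```python
import os
import shutil

TB = 10**12
GB = 10**9

# Figures quoted in this thread; treat them as rough lower bounds, not exact limits.
REQUIRED_DISK_BYTES = 6400 * GB
REQUIRED_MEM_BYTES = 600 * GB


def shortfalls(free_disk_bytes, total_mem_bytes,
               need_disk=REQUIRED_DISK_BYTES, need_mem=REQUIRED_MEM_BYTES):
    """Return a list of problems; an empty list means the host meets both minimums."""
    problems = []
    if free_disk_bytes < need_disk:
        problems.append(
            f"disk: {free_disk_bytes / TB:.2f} TB free, need {need_disk / TB:.2f} TB"
        )
    if total_mem_bytes < need_mem:
        problems.append(
            f"memory: {total_mem_bytes / GB:.0f} GB total, need {need_mem / GB:.0f} GB"
        )
    return problems


def probe_host(path="."):
    """Read free disk space at `path` and total physical memory (Linux only)."""
    free_disk = shutil.disk_usage(path).free
    total_mem = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return free_disk, total_mem


if __name__ == "__main__":
    for line in shortfalls(*probe_host()) or ["host meets the quoted minimums"]:
        print(line)
```

Pass the directory where the raw dataset would live to `probe_host` so the disk check looks at the right filesystem.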
Great, thank you.
I tried both running the command via a Docker container and running it within the ResNet50 container.
End of the log follows