coppock closed this issue 4 months ago
I am observing the same issue.
The issue originated after merging: https://github.com/mlcommons/training/pull/712
The dataset was previously downloaded directly from the MLC S3 bucket using wget; the PR changed the download method to rclone + Cloudflare. rclone is not installed in the Docker image, so I added it in: https://github.com/mlcommons/training/pull/752
Even if we install rclone separately and then run the script laion400m-filtered-download-images.sh, we get an error saying that the source directory doesn't exist. Specifically, the following command produces this error: `rclone copy mlc-training:mlcommons-training-wg-public/stable_diffusion/datasets/laion-400m/moments-webdataset-filtered/ ${OUTPUT_DIR} --include="*.tar" -P`
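One way to narrow this down is to check the rclone setup before running the copy. The following is only a sketch: it assumes the error could come from the `mlc-training` remote not being configured inside the container, which is a guess and not a confirmed cause. `remote_configured` is a hypothetical helper, while the remote name and path are copied from the command above.

```shell
#!/bin/sh
# Sketch: sanity-check rclone before running the dataset copy.
# REMOTE and SRC_PATH are taken from the failing command;
# remote_configured is a hypothetical helper introduced for illustration.

REMOTE="mlc-training"
SRC_PATH="mlcommons-training-wg-public/stable_diffusion/datasets/laion-400m/moments-webdataset-filtered/"

# Succeeds if the named remote appears in `rclone listremotes`,
# which prints one "name:" entry per line.
remote_configured() {
    rclone listremotes | grep -qx "${1}:"
}

if ! command -v rclone >/dev/null 2>&1; then
    echo "rclone is not installed"
elif ! remote_configured "${REMOTE}"; then
    echo "remote '${REMOTE}' is not configured in rclone"
else
    # List the directories under the source path to confirm it is reachable.
    rclone lsd "${REMOTE}:${SRC_PATH}"
fi
```

If the remote is listed and `rclone lsd` still reports a missing directory, the problem is more likely on the bucket or path side than in the local setup.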
I just saw https://github.com/mlcommons/training/issues/751. I'll look into it and resolve the issue ASAP.
After building the Docker image provided in stable_diffusion, the first data download command fails as follows: