mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Stable Diffusion Dataset #751

Closed amasin2111 closed 3 months ago

amasin2111 commented 5 months ago

When we tried to download the dataset using the script laion400m-filtered-download-images.sh, we got an error saying that the source directory doesn't exist. Specifically, the following command fails: rclone copy mlc-training:mlcommons-training-wg-public/stable_diffusion/datasets/laion-400m/moments-webdataset-filtered/ ${OUTPUT_DIR} --include="*.tar" -P

amasin2111 commented 4 months ago

Hi, were you able to get around it?

morphine00 commented 4 months ago

@ahmadki both @nathanw-mlc and I tested the rclone commands and the scripts, and the data exists in the bucket (see attached).

What I did notice is that even the original scripts assume that the destination directory /datasets/etcetc can be created, but unless the user is root, they won't have permission to do so. Maybe this is the reason why it fails?
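
The permission concern above can be checked before starting a long download. The following is a minimal sketch, not part of the repository scripts: the helper name can_create_dir and the fallback path under $HOME are assumptions made for illustration. It only reports whether the destination could be created and does not create anything.

```shell
# Hypothetical helper (not from the repository): succeed if the directory
# already exists and is writable, or if its nearest existing ancestor is a
# writable directory, so that `mkdir -p` would be expected to work.
can_create_dir() {
    dir="$1"
    # Walk up the path until we reach something that exists.
    while [ ! -e "$dir" ]; do
        parent=$(dirname "$dir")
        [ "$parent" = "$dir" ] && break
        dir="$parent"
    done
    [ -d "$dir" ] && [ -w "$dir" ]
}

# Assumed fallback for illustration; the real script defines its own OUTPUT_DIR.
OUTPUT_DIR="${OUTPUT_DIR:-$HOME/datasets/laion-400m}"

if can_create_dir "$OUTPUT_DIR"; then
    echo "OK: $OUTPUT_DIR can be created by $(id -un)"
else
    echo "Cannot create $OUTPUT_DIR as $(id -un); pick a writable path or fix permissions" >&2
fi
```

If the check fails, pointing OUTPUT_DIR at a directory the current user owns avoids needing root.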

[Attached screenshots: rclone-1, rclone-2]

amasin2111 commented 4 months ago

Hi, I used the same commands but am still seeing the same issue.

[Attached screenshots: 1, 2]

nathanw-mlc commented 4 months ago

Can those having issues please share the output of rclone version?

amasin2111 commented 4 months ago

Here is the version:

[Attached screenshot: version]

nathanw-mlc commented 4 months ago

I just noticed that the update to the Dockerfile uses apt-get install to install Rclone. This method installs an old version of Rclone (rclone v1.53.3-DEV) that doesn't process the rclone config create command correctly, so Rclone attempts to connect to an AWS S3 bucket with the provided credentials rather than to a Cloudflare R2 bucket. Users need to be running v1.6x.x. To make that happen, the Dockerfile should install Rclone with the install command we provide in all Rclone instructions: sudo -v ; curl https://rclone.org/install.sh | sudo bash
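
A quick way to catch the outdated apt package before running the download is to compare the installed version against a minimum. This is a rough sketch rather than anything from the repository: the helper names are made up, the threshold 1.60.0 is one reading of "v1.6x.x", and the comparison relies on sort -V being available (it is in GNU coreutils).

```shell
# Succeed if dotted version $1 is greater than or equal to $2.
# Uses version sort: the smaller of the two ends up on the first line.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read `rclone version` output on stdin and print the numeric part of the
# first line, e.g. "rclone v1.53.3-DEV" becomes "1.53.3".
parse_rclone_version() {
    sed -n '1s/^rclone v\([0-9][0-9.]*\).*/\1/p'
}

MIN_VERSION="1.60.0"   # assumption: "v1.6x.x" read as 1.60 or newer

if command -v rclone >/dev/null 2>&1; then
    installed=$(rclone version | parse_rclone_version)
    if version_ge "$installed" "$MIN_VERSION"; then
        echo "rclone $installed is new enough"
    else
        echo "rclone $installed is too old; need >= $MIN_VERSION" >&2
    fi
else
    echo "rclone is not installed" >&2
fi
```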

amasin2111 commented 4 months ago

It worked for me. Other users might have to clean up their config files before retrying with the new rclone version.
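
One possible way to do that cleanup, sketched under the assumption that the stale entry is the mlc-training remote written by the old rclone: inspect the config first, then delete only that remote. The wrapper name reset_remote is hypothetical; rclone config file, rclone listremotes, and rclone config delete are standard rclone subcommands.

```shell
REMOTE="mlc-training"   # remote name as it appears in the failing rclone copy command

# Hypothetical wrapper: remove one named remote from the rclone config so it
# can be recreated with the newer rclone.
reset_remote() {
    rclone config delete "$1"
}

if command -v rclone >/dev/null 2>&1; then
    rclone config file      # prints where the active config file is stored
    rclone listremotes      # lists the remotes currently defined in it
else
    echo "rclone is not installed" >&2
fi
```

After checking the output, running reset_remote "$REMOTE" and then repeating the rclone config create step from the benchmark instructions should leave a fresh remote definition.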

hiwotadese commented 3 months ago

@ahmadki can we fix the Dockerfile with @nathanw-mlc's suggestion: sudo -v ; curl https://rclone.org/install.sh | sudo bash?

ahmadki commented 3 months ago

I genuinely dislike piping scripts from the internet into bash. Not only does it pose a security risk, but we also need to freeze rclone to a specific version.

https://github.com/mlcommons/training/pull/757 should work better.
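
The thread does not show what the linked pull request changes, so the following is only a sketch of what a pinned install might look like, not a description of that PR. The version 1.65.2 is a placeholder, the function names are hypothetical, and the URL layout is assumed to follow the pattern used on downloads.rclone.org for Linux amd64 zip archives.

```shell
# Placeholder pin for illustration; the thread does not name a specific version.
RCLONE_VERSION="${RCLONE_VERSION:-1.65.2}"

# Build the download URL for a given release (assumed linux-amd64 zip layout).
rclone_zip_url() {
    printf 'https://downloads.rclone.org/v%s/rclone-v%s-linux-amd64.zip\n' "$1" "$1"
}

# Download one fixed release and install the binary. Nothing fetched from the
# network is executed as a script; the function is defined here but not called.
install_pinned_rclone() {
    url=$(rclone_zip_url "$1")
    tmp=$(mktemp -d)
    curl -fsSL "$url" -o "$tmp/rclone.zip" &&
        unzip -q "$tmp/rclone.zip" -d "$tmp" &&
        install -m 755 "$tmp/rclone-v$1-linux-amd64/rclone" /usr/local/bin/rclone
    status=$?
    rm -rf "$tmp"
    return $status
}

echo "Pinned rclone download: $(rclone_zip_url "$RCLONE_VERSION")"
```

In a Dockerfile this would typically run as root during the build, via install_pinned_rclone "$RCLONE_VERSION"; verifying the archive against a published checksum would tighten it further.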

ShriyaPalsamudram commented 3 months ago

Closing because #757 is merged.