huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Can not run LCM distill pipeline, due to dataset access #5770

Closed lonestar1990 closed 9 months ago

lonestar1990 commented 9 months ago

Describe the bug

Cannot run the LCM distillation example script because the dataset on AWS S3 cannot be accessed.

Reproduction

runwayml/stable-diffusion-v1-5
PROGRAM="train_lcm_distill_lora_sd_wds.py \
  --pretrained_teacher_model=$MODEL_DIR \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision=fp16 \
  --resolution=512 \
  --lora_rank=64 \
  --learning_rate=1e-6 --loss_type="huber" --adam_weight_decay=0.0 \
  --max_train_steps=1000 \
  --max_train_samples=4000000 \
  --dataloader_num_workers=8 \
  --train_shards_path_or_url='pipe:aws s3 cp s3://muse-datasets/laion-aesthetic6plus-min512-data/{00000..01210}.tar -' \
  --validation_steps=200 \
  --checkpointing_steps=200 --checkpoints_total_limit=10 \
  --train_batch_size=12 \
  --gradient_checkpointing --enable_xformers_memory_efficient_attention \
  --gradient_accumulation_steps=1 \
  --use_8bit_adam \
  --resume_from_checkpoint=latest \
  --report_to=wandb \
  --seed=453645634 \
  --push_to_hub \

Logs

miniconda3/lib/python3.11/site-packages/webdataset/handlers.py:33: UserWarning: OSError("(('aws s3 cp s3://muse-datasets/laion-aesthetic6plus-min512-data/00627.tar -',), {'shell': True, 'bufsize': 8192}): exit 1 (read) {}", <webdataset.gopen.Pipe object at 0x7fa585ffd6d0>, 'pipe:aws s3 cp s3://muse-datasets/laion-aesthetic6plus-min512-data/00627.tar -')
  warnings.warn(repr(exn))

System Info

Ubuntu

Who can help?

@patil-suraj I saw your commit to the script and wonder if you have any idea about the access issue. Is there a quick workaround that would unblock me from running the example? Thank you!

kilimchoi commented 9 months ago

This is caused by an "aws: not found" error, which means the shell could not find the aws command that the pipe: URL tries to run. Try preparing your own tar files and using their URL instead.
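The workaround above could be sketched roughly as follows, using only the Python standard library. It checks whether an aws executable is on PATH and writes image/caption pairs into a WebDataset-style tar shard, where files sharing a basename form one sample. The helper names (aws_cli_available, write_shard), the jpg/txt extensions, and the shards/ path are assumptions made for illustration, not taken from the training script; check which sample keys the script actually reads before relying on this.

```python
import io
import shutil
import tarfile
from pathlib import Path


def aws_cli_available() -> bool:
    """Return True if an `aws` executable can be found on PATH."""
    return shutil.which("aws") is not None


def write_shard(samples, shard_path):
    """Write (image_bytes, caption) pairs into one WebDataset-style tar shard.

    Each sample becomes two tar members with the same basename, for example
    000000.jpg and 000000.txt. The extensions are an assumption; adjust them
    to whatever keys the training script expects.
    Returns the number of samples written.
    """
    shard_path = Path(shard_path)
    shard_path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with tarfile.open(shard_path, "w") as tar:
        for index, (image_bytes, caption) in enumerate(samples):
            key = f"{index:06d}"
            members = (
                ("jpg", image_bytes),
                ("txt", caption.encode("utf-8")),
            )
            for suffix, payload in members:
                info = tarfile.TarInfo(name=f"{key}.{suffix}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
            count += 1
    return count


if __name__ == "__main__":
    if not aws_cli_available():
        print("aws CLI not found on PATH; a 'pipe:aws s3 cp ...' URL cannot run")
    # Placeholder bytes stand in for real encoded images in this sketch.
    demo = [(b"placeholder-image-bytes", "a photo of a cat")]
    written = write_shard(demo, "shards/00000.tar")
    print(f"wrote {written} sample(s) to shards/00000.tar")
```

With shards written this way, --train_shards_path_or_url could point at the local files (for example 'shards/{00000..00009}.tar', a hypothetical path) instead of the pipe:aws URL, assuming the script accepts local shard paths as webdataset generally does.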

pcuenca commented 9 months ago

#5908 shows an example with a public dataset.