pesser / stable-diffusion

Won't run out of the box #1

Open · joetm opened this issue 2 years ago

joetm commented 2 years ago

Missing logs/f8-kl-clip-encoder-256x256-run1/configs/2022-06-01T22-11-40-project.yaml

benedlore commented 2 years ago

What exactly is the difference between this and the latent-diffusion repo here (https://github.com/CompVis/latent-diffusion), which has the same README?

JonnoFTW commented 2 years ago

I also had this issue. After downgrading torchmetrics to 0.6.0 (see https://github.com/NVIDIA/DeepLearningExamples/issues/1113) and applying the patch from #4, I get an ImportError:

ImportError: cannot import name 'CLIPTokenizer' from 'transformers' (unknown location)

Edit:

Upgrading transformers to 4.20.1 fixed the import error, but then there's an issue with OpenSSL. I copied pycrypto.so and libssl.so.3 over from another conda env I had, but that's only a band-aid fix.
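In other words, roughly the following (exact pins may vary for your environment):

# pin torchmetrics for the older pytorch-lightning the repo's environment uses,
# and upgrade transformers so CLIPTokenizer can be imported
pip install torchmetrics==0.6.0 transformers==4.20.1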

fjenett commented 2 years ago

You can easily work around this by pointing the params at the large text2img model instead:

python scripts/txt2img.py \
    --prompt "a virus monster is playing guitar, oil on canvas" \
    --config configs/latent-diffusion/txt2img-1p4B-eval.yaml \
    --ckpt models/ldm/text2img-large/model.ckpt \
    --ddim_eta 0.0 --n_samples 4 --n_iter 4 --scale 5.0  --ddim_steps 50
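(This assumes the LDM text2img-large checkpoint is already in place; if not, the download step from the CompVis latent-diffusion README is roughly the following. Check that the URL still resolves before relying on it.)

# fetch the pre-trained LDM text-to-image weights referenced by --ckpt above
mkdir -p models/ldm/text2img-large/
wget -O models/ldm/text2img-large/model.ckpt https://ommer-lab.com/files/latent-diffusion/nitro/txt2img-f8-large/model.ckpt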

neonsecret commented 2 years ago

> You can easily work around this by pointing the params at the large text2img model instead:
>
> python scripts/txt2img.py \
>     --prompt "a virus monster is playing guitar, oil on canvas" \
>     --config configs/latent-diffusion/txt2img-1p4B-eval.yaml \
>     --ckpt models/ldm/text2img-large/model.ckpt \
>     --ddim_eta 0.0 --n_samples 4 --n_iter 4 --scale 5.0  --ddim_steps 50

Same issue out of the box; this solution worked for me.

faraday commented 2 years ago

> You can easily work around this by pointing the params at the large text2img model instead:
>
> python scripts/txt2img.py \
>     --prompt "a virus monster is playing guitar, oil on canvas" \
>     --config configs/latent-diffusion/txt2img-1p4B-eval.yaml \
>     --ckpt models/ldm/text2img-large/model.ckpt \
>     --ddim_eta 0.0 --n_samples 4 --n_iter 4 --scale 5.0  --ddim_steps 50

This doesn't return the same results, though. You'd immediately notice the quality has decreased; I don't think it's the same model. The txt2img-1p4B config points at the original latent-diffusion text-to-image model, not the Stable Diffusion weights this repo's README refers to.