mlfoundations / open-diffusion

Simple large-scale training of stable diffusion with multi-node support.
126 stars 9 forks source link

Offline use #2

Closed mehdidc closed 1 year ago

mehdidc commented 1 year ago

It would be nice to be able to avoid the need of an internet connection as it is usually blocked in supercomputers. Two things I observed that needs internet connection:

which happens at each validation/generation step, although it did not have any consequence, the job did not hang or anything, but I did not find a way to get rid of the message.

vkramanuj commented 1 year ago

Thanks for pointing this out. I believe this is due to this line https://github.com/mlfoundations/open-diffusion/blob/a06305ae548e1dab7a30cc11862632b255a06e6d/train.py#L642. In it, we reconstruct the scheduler for the save pipeline using the huggingface link, which causes the code to look on HF for its config. You might be able to fix it if you point it to a local copy (e.g. config.model.pretrained or config.model.scheduler.pretrained, whichever exists). Let me know if this fixes the issue!

mehdidc commented 1 year ago

Ah I see, thanks, will give it a try

mehdidc commented 1 year ago

@vkramanuj I was thinking, can't we just pass the noise_scheduler (https://github.com/mlfoundations/open-diffusion/blob/main/train.py#L237) that is already instantiated to the save function and use it ? worked fine for me, but I want to be sure that it won't cause an issue. If it is fine, I can open a PR.

vkramanuj commented 1 year ago

@mehdidc One concern here is that people generally use different schedulers at train and inference time. Certain schedulers (e.g. PNDM https://arxiv.org/pdf/2202.09778.pdf and Euler Discrete https://arxiv.org/abs/2206.00364) promise better generations with fewer timesteps. Unclear if this is important for basic validation though, but this is how people generate in practice.

mehdidc commented 1 year ago

Oh I see, makes sense then, thanks for clarifying. Ok so the scheduler for inference could be potentially parametrized in the config as well eventually, safety checker as well.

mehdidc commented 1 year ago

So I think, to avoid the confusion in the config file, I would say we should separate the two, e.g. training_scheduler and inference_scheduler

vkramanuj commented 1 year ago

That sounds good to me. Seems like this solves the issue, I will submit a PR to that effect (although feel free to submit a PR if you've already done this in a local copy).

mehdidc commented 1 year ago

Cool, I do have a local branch, let me quickly finish it