huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0
26.26k stars · 5.41k forks

[Examples] Add a training script for HyperDreamBooth #4095

Open sayakpaul opened 1 year ago

sayakpaul commented 1 year ago

Nathaniel Ruiz et al. dropped https://hyperdreambooth.github.io/.

It combines DreamBooth, LoRA, and hypernetworks for a 100kb residual, generated in 20 seconds, that rivals original DreamBooth in model integrity, editability, and subject fidelity.
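For context, the ~100KB residual is a low-rank (LoRA-style) update: instead of storing a fine-tuned copy of each weight matrix W, you store two thin factors whose product is added to W. A minimal NumPy sketch of the idea (the dimensions here are illustrative, not the paper's):

```python
import numpy as np

# Illustrative dimensions; real SD attention layers vary.
d_out, d_in, rank = 320, 320, 1

W = np.random.randn(d_out, d_in).astype(np.float32)  # frozen base weight
B = np.random.randn(d_out, rank).astype(np.float32)  # trained "down" factor
A = np.random.randn(rank, d_in).astype(np.float32)   # trained "up" factor

delta = B @ A          # rank-1 residual added to W at inference
W_adapted = W + delta

# The residual stores d_out*rank + rank*d_in values instead of d_out*d_in.
full_params = W.size           # parameters in the full matrix
lora_params = B.size + A.size  # parameters in the residual
print(full_params, lora_params)  # → 102400 640
```

Storing only `B` and `A` across the adapted layers is what keeps the personalized residual in the ~100KB range.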

Seems to be working exceptionally well on faces.

Is anyone interested in picking this one up? Maybe try it with SDXL?

killah-t-cell commented 1 year ago

Is it a good first issue? Any advice on how to get started?

sayakpaul commented 1 year ago

Thanks for showing interest!

I think you could refer to our examples directory to get a sense of how we develop the scripts. Let us know.

killah-t-cell commented 1 year ago

This is cool! I don't want to promise anything and cookie lick, so I'll let you know if I am interested if I open a PR.

AmericanPresidentJimmyCarter commented 1 year ago

@KohakuBlueleaf is close to a working implementation.

AmericanPresidentJimmyCarter commented 1 year ago

@KohakuBlueleaf 's implementation is working and is in the branch https://github.com/KohakuBlueleaf/LyCORIS/tree/hyperdream

The only thing not currently working is gradient checkpointing. The code for the training of the hypernetwork is in https://github.com/KohakuBlueleaf/LyCORIS/blob/hyperdream/train_hyperdream.py

AmericanPresidentJimmyCarter commented 1 year ago

He has fixed grad checkpointing: https://github.com/KohakuBlueleaf/LyCORIS/commit/9c3aec57a55904c90e072bdf7d6981cb6d556182

So whenever someone is ready they can port over this code.

madstuntman11 commented 1 year ago

@AmericanPresidentJimmyCarter @KohakuBlueleaf could you please post an sh script with the parameters you tried? Thanks!

mosessoh commented 1 year ago

@KohakuBlueleaf Thanks for working on this!

Agreed with @skinenbayev that a sh script would be useful. Reading through the code - am I right to say that train_hyperdream.py trains the hypernetwork, hyperdream_gen_weight.py loads hypernetwork-generated weights into SD, but there's no script that does the fast fine-tuning of SD yet?

KohakuBlueleaf commented 1 year ago

> @KohakuBlueleaf Thanks for working on this!
>
> Agreed with @skinenbayev that a sh script would be useful. Reading through the code - am I right to say that train_hyperdream.py trains the hypernetwork, hyperdream_gen_weight.py loads hypernetwork-generated weights into SD, but there's no script that does the fast fine-tuning of SD yet?

I use kohya-ss/sd-script as the trainer. Its train_network.py can load a saved LoRA model to do continuous training, so I never implemented that part. train_hyperdream.py is also modified from train_network.py.

KohakuBlueleaf commented 1 year ago

And for the sh script, I use the template in Akegarasu/lora-scripts. I can share an example later.

sayakpaul commented 1 year ago

Let us know if anyone is interested in contributing this to diffusers examples :-)

Happy to help.

viralparekh commented 1 year ago

@sayakpaul, I am interested if you can help me along the way. (I have used dreambooth, dreambooth_inpaint code a lot, can't wait to try hyperdreambooth.)

KohakuBlueleaf commented 1 year ago

> Let us know if anyone is interested in contributing this to diffusers examples :-)
>
> Happy to help.

If you need, I can try to implement a pure diffusers version, but I am not very familiar with the diffusers code style.

sayakpaul commented 1 year ago

@KohakuBlueleaf sure, to give you a better idea, I'd suggest picking up an example script from here as a reference and basing your implementation from there. Happy to answer any questions that might arise from the initial pass-through.

KohakuBlueleaf commented 1 year ago

@sayakpaul I think the only question is whether I should rewrite the patch part and the weight generator part in diffusers' style. Currently I implement the weight generator and a patch function (to replace the original forward function).

KohakuBlueleaf commented 1 year ago

And I guess you guys want the complete version of HyperDreamBooth, which requires pretraining LoRAs on all the images (e.g. if you have 15k identities, you first need to train 15k LoRAs on them).

I don't know which kind of implementation you guys want. My impl just ignores the pretraining part (I'm still investigating how to train that faster).

Basically we will have 3 stages:

1. pretrain LoRAs on all the img-text pairs
2. train the hypernetwork on the img-text-LoRA pairs
3. finetune the generated LoRA from the target ref img
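The three stages above can be sketched as a skeleton. Everything below is a hypothetical placeholder (dummy stand-in logic instead of real training), just to show how the stages feed into each other:

```python
# Skeleton of the three-stage pipeline; all names are hypothetical
# placeholders and the "training" here is dummy arithmetic.

def pretrain_lilora(image, text):
    # Stage 1: one lightweight LoRA per img-text pair.
    return {"lora": hash((image, text)) % 1000}

def train_hypernetwork(triples):
    # Stage 2: learn to map a reference image to LoRA weights,
    # supervised by the pre-optimized LoRAs from stage 1.
    return lambda image: {"lora": hash(image) % 1000}

def fast_finetune(predicted_lora, ref_image, steps=25):
    # Stage 3: a few optimization steps starting from the predicted
    # weights rather than from scratch.
    predicted_lora["steps"] = steps
    return predicted_lora

dataset = [("img0", "a photo of sks person"), ("img1", "a photo of sks person")]
loras = [pretrain_lilora(img, txt) for img, txt in dataset]        # stage 1
hypernet = train_hypernetwork(list(zip(dataset, loras)))           # stage 2
personalized = fast_finetune(hypernet("img0"), ref_image="img0")   # stage 3
```

The point of stage 1 is that stage 2 has direct (image, LoRA) supervision, which is why the stage-3 finetune can be so short.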

madstuntman11 commented 1 year ago

@KohakuBlueleaf thank you for your time and efforts! From my perspective it would be ideal to have an end-to-end solution (including the pretraining part). I'll be back with more details; I need to dive deeper into the paper.

sayakpaul commented 1 year ago

Hmm, maybe we could start by adding this to the community folder (https://github.com/huggingface/diffusers/tree/main/examples/community)?

@patrickvonplaten WDYT?

mosessoh commented 1 year ago

@KohakuBlueleaf I think we can do this step by step. For the pre-trained LoRAs and the training of the hypernetwork, I'm happy to help with the compute if you can give me a script to run. We can use CelebA-HQ like the paper did. Do you know why they used 15K rather than the full 30K dataset?

mosessoh commented 1 year ago

On why I think we should do the pretraining of the hypernetwork: it's quite core to their result. It sounds like they can do faster fine-tuning because the LoRAs are already initialised closer to their ideal state.

KohakuBlueleaf commented 1 year ago

> On why I think we should do the pretraining of the hypernetwork - its quite core to their result. Sounds like they can do faster fine tuning because loras are already initialised closer to their ideal state.

If you really want to pretrain the hypernetwork, I'd suggest doing it on a more generalized dataset, but that will be way harder than CelebA-HQ.

And the reason for 15k is that the paper said "15k is enough".

mosessoh commented 1 year ago

Happy to hear your suggestions on which other datasets you think are good. Perhaps we can start with CelebA-HQ first to see if everything is working correctly?

KohakuBlueleaf commented 1 year ago

> happy to hear your suggestion on which other datasets you think are good. Perhaps we can start with CelebA-HQ first to see if everything is working correctly?

I think that's good. So basically we will need 4 scripts:

1. batch lightweight LoRA training script
2. hypernetwork training script
3. lightweight LoRA finetuning script (you can directly use the LoRA training script if it has a resume mechanism)
4. hypernetwork weight generation script

We will need these new modules:

- LightWeight LoRA (weights can be manually modified from outside; refer to my locon.py in the hyperdream branch of LyCORIS)
- WeightGenerator (which takes outside context to generate the weights; actually I just use a transformer decoder)
- HyperDreamBooth (combines a ref img encoder, which can be taken from timm, the WeightGenerator, and LiLoRA modules patched onto the target unet or te)

If you guys have a style guide or contribution guide, I will need to know where to get it. Thx!
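A rough sketch of what the first module might look like: a linear layer whose low-rank factors can be overwritten from outside (e.g. by a weight generator). This is plain NumPy with hypothetical names, not the actual LyCORIS implementation:

```python
import numpy as np

class LiLoRALinear:
    """Linear layer with a low-rank residual whose factors can be
    injected from outside (e.g. by a hypernetwork)."""

    def __init__(self, d_in, d_out, rank=1):
        self.W = np.zeros((d_out, d_in), dtype=np.float32)  # frozen base weight
        self.A = np.zeros((rank, d_in), dtype=np.float32)
        self.B = np.zeros((d_out, rank), dtype=np.float32)

    def set_lora(self, A, B):
        # Called by the weight generator to patch this layer.
        self.A, self.B = A, B

    def forward(self, x):
        # Base projection plus the low-rank residual B @ A.
        return x @ self.W.T + x @ self.A.T @ self.B.T

rng = np.random.default_rng(0)
layer = LiLoRALinear(d_in=8, d_out=8, rank=1)
layer.set_lora(rng.standard_normal((1, 8)).astype(np.float32),
               rng.standard_normal((8, 1)).astype(np.float32))
out = layer.forward(np.ones((2, 8), dtype=np.float32))
print(out.shape)  # → (2, 8)
```

The key design point is `set_lora`: because the factors are plain attributes rather than parameters baked into the layer, a hypernetwork can write predicted weights into an already-patched unet without rebuilding it.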

KohakuBlueleaf commented 1 year ago

And for myself, I may prefer to put my things into a dedicated repo, make sure I only use the diffusers backend, try to keep them clear and clean, and then let you guys figure out how to merge these architectures into the diffusers repo.

Let me know what you guys think about it.

sayakpaul commented 1 year ago

To have flexibility around this, I think it would make sense to start adding to the community folder of examples (as mentioned in https://github.com/huggingface/diffusers/issues/4095#issuecomment-1640001935).

A dedicated repo with a diffusers backend also works for us. That way, it remains self-contained too, which might be the easiest route.

Let us know how we can help.

KohakuBlueleaf commented 1 year ago

@sayakpaul if the community examples folder can have a subfolder for algorithms with custom modules that haven't been implemented in diffusers yet, I can put everything into one folder under the community folder.

sayakpaul commented 1 year ago

Yes, that also works. As I mentioned, if you feel more comfortable having all of that in a standalone repo with a diffusers backend, that also sounds like a good idea.

KohakuBlueleaf commented 1 year ago

> Yes, that also works. Like I mentioned if you feel more comfortable having all of that in a standalone repo with a diffusers backend, that also sounds like a good idea.

I think I will do the standalone repo first. We can add the modules and scripts after I'm done.

sayakpaul commented 1 year ago

Happy to collaborate with you on this for any diffusers-related stuff :-)

patrickvonplaten commented 1 year ago

I wouldn't mind at all having a new "hyperdreambooth" folder under the official examples: https://github.com/huggingface/diffusers/tree/main/examples

But for fast iteration a standalone repo could also make sense!

sayakpaul commented 1 year ago

@KohakuBlueleaf while we're at it, I was wondering if it would make sense to:

  • Have diffusers as a dependency (always the latest version or source installation) rather than a fork. This way, it's easy for us (as a community) to ensure Diffusers is providing the features needed to build things.
  • Push checkpoints automatically to the Hugging Face Hub. This way we ensure better discovery of the checkpoints and it also helps with easy distribution. For example, people would easily load those checkpoints with HF Hub APIs.

Let me know if this makes sense?

KohakuBlueleaf commented 1 year ago

> @KohakuBlueleaf while we're at it, I was wondering if it would make sense to:
>
> • Have diffusers as a dependency (always the latest version or source installation) rather than a fork. This way, it's easy for us (as a community) to ensure Diffusers is providing the features needed to build things.
> • Push checkpoints automatically to the Hugging Face Hub. This way we ensure better discovery of the checkpoints and it also helps with easy distribution. For example, people would easily load those checkpoints with HF Hub APIs.
>
> Let me know if this makes sense?

Makes a lot of sense; I will ensure these two points in later work.

And here is the repository where I will put everything: https://github.com/KohakuBlueleaf/HyperKohaku

sayakpaul commented 1 year ago

Thanks much! We could also work on modules that easily let people load the trained checkpoints into diffusers.

gouravmohanty7070 commented 1 year ago

Hey @sayakpaul if the issue is still open. I would love to work on it!

KohakuBlueleaf commented 1 year ago

> Hey @sayakpaul if the issue is still open. I would love to work on it!

How about we collaborate? I really need to figure out how to pretrain tons of LiLoRAs efficiently within one script (every img-text pair in the dreambooth dataset needs its own LiLoRA).
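One way to approach "many LiLoRAs in one script" is to keep the per-identity factors in a single batched tensor and apply them with einsum, so one batch mixes many identities but each sample only sees its own residual. A NumPy sketch of the batched delta (shapes are hypothetical):

```python
import numpy as np

n_ids, d_out, d_in, rank = 4, 16, 16, 1
rng = np.random.default_rng(0)

# One (B, A) factor pair per identity, stored as batched tensors.
B = rng.standard_normal((n_ids, d_out, rank)).astype(np.float32)
A = rng.standard_normal((n_ids, rank, d_in)).astype(np.float32)

x = rng.standard_normal((n_ids, d_in)).astype(np.float32)  # one sample per identity

# y[n] = B[n] @ (A[n] @ x[n]) -- each sample uses only its own LoRA.
y = np.einsum('nor,nri,ni->no', B, A, x)
print(y.shape)  # → (4, 16)
```

With this layout, one optimizer step can update all identities' factors at once, instead of launching a separate training run per img-text pair.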

sayakpaul commented 1 year ago

@KohakuBlueleaf we're happy to provide GPU support if needed. Let us know.

markojak commented 1 year ago

Is anyone working on this still?

KohakuBlueleaf commented 1 year ago

> Is anyone working on this still?

I have made a simple example for the pre-optimized weights and the HyperNetwork.

It can train, but I cannot get a reasonable result yet, so I'm investigating.

dhivyeshrk commented 1 year ago

Is anyone working on this issue? Otherwise, I would be happy to contribute.

KohakuBlueleaf commented 1 year ago

> Is anyone working on this issue? Otherwise, I would be happy to contribute.

Can you check my HyperKohaku repo to see if there's anything to add/refine/fix? It should be able to train and run, but something may be wrong and it is hard to produce usable results.