threestudio-project / threestudio

A unified framework for 3D content generation.
Apache License 2.0
6.32k stars, 480 forks

Is it possible to use pretrained model fine tuned with LORA? #159

Open krNeko9t opened 1 year ago

krNeko9t commented 1 year ago

Same as the title: what if I want to use a model fine-tuned with LoRA to generate the reference image? Does the method in the paper support something like that?

thuliu-yt16 commented 1 year ago

Not quite sure about what you mean. Do you mean to use the lora-tuned model to generate images in prolificdreamer?

krNeko9t commented 1 year ago

> Not quite sure about what you mean. Do you mean to use the lora-tuned model to generate images in prolificdreamer?

Yes. Most of the methods rely on a frozen T2I model, so I wonder whether we can use a version fine-tuned with LoRA instead. Thanks for the reply.

To be specific, I mean the model used in prolificdreamer that is referenced by the config path system.guidance.pretrained_model_name_or_path.

thuliu-yt16 commented 1 year ago

Let me clarify. There are actually two different things related to what I just said. I may be talking about one of them while you are referring to the other.

  1. Just run the prolificdreamer pipeline without any change. The pipeline produces a LoRA-tuned model that is trained on the specific prompt and conditioned on camera pose, and we can sample images from this model using T2I schedulers such as DPM-Solver. This is supported in threestudio: just add system.visualize_samples=True.

  2. In prolificdreamer, we replace the model that is supposed to be trained with LoRA during optimization with a model that has already been fine-tuned with LoRA. In this case, I guess that if the model is completely frozen, it should not work, because the model is supposed to estimate the distribution of the current UNDER-OPTIMIZED rendered images rather than NEARLY PERFECT rendered images. If the model is still trained during the pipeline, for example by loading some weights from ControlNet and continuing to train with VSD, I guess it could work to some extent, but I am not sure what the input should look like. In this case, we may need a function to load LoRA weights, which is very easy to implement.
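As a rough illustration of what a LoRA-loading function for item 2 would do at the weight level, here is a minimal sketch. It assumes the standard LoRA update W' = W + (alpha / r) * B A, where B is the "up" matrix and A is the "down" matrix. The helper names `matmul` and `merge_lora` are hypothetical and are not part of threestudio or diffusers, and real checkpoints store tensors rather than nested lists.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists (rows of floats)."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def merge_lora(weight, lora_down, lora_up, alpha, rank):
    """Return weight + (alpha / rank) * lora_up @ lora_down.

    weight:    out_features x in_features (frozen base weight W)
    lora_down: rank x in_features         (matrix A, hypothetical layout)
    lora_up:   out_features x rank        (matrix B, hypothetical layout)
    """
    scale = alpha / rank
    delta = matmul(lora_up, lora_down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 layer with a rank-1 LoRA, purely for illustration.
base_weight = [[1.0, 0.0], [0.0, 1.0]]
lora_down = [[0.5, -0.5]]   # 1 x 2
lora_up = [[1.0], [2.0]]    # 2 x 1

merged = merge_lora(base_weight, lora_down, lora_up, alpha=2.0, rank=1)
print(merged)
```

Under this assumption, the merged weights would simply take the place of the original base weights; whether they then stay frozen or keep training is exactly the question raised in item 2.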

krNeko9t commented 1 year ago

In the paper's illustration there are two models, T2I and LoRA, so I guess they correspond to the members pipe and pipe_lora of the StableDiffusionVSDGuidance class? (Forgive me for not reading it carefully.)

So, going by your description, the T2I model has already been fine-tuned with LoRA? But what was it fine-tuned for? Are there three sets of weights: the T2I model, the pretrained LoRA, and the LoRA that is the optimization target?

Or is pipe + pipe_lora the T2I system mentioned in the paper? But doesn't pipe_lora refer to a full Hugging Face model?

I'm a little confused and new to this project. If I need to spend more time reading the paper or the project, please tell me. Thanks for your help!

DSaurus commented 1 year ago

Hi, @krNeko9t.

In StableDiffusionVSDGuidance, "pipe" represents the frozen T2I base model, while "pipe_lora" represents the frozen T2I model with an additional unfrozen LORA1 (for 3D generation). So if you want to use a pre-trained model fine-tuned with LoRA, your base model would become T2I + LORA2 (your LoRA model). It's important to note that LORA1 (for 3D generation) and LORA2 (your LoRA model) are completely distinct. However, threestudio currently only supports a single T2I model, without any additional modules, as the base model. Therefore, you'll need to implement some code to enable support for your T2I + LORA2. One possible solution, as suggested by @thuliu-yt16, is as follows:

krNeko9t commented 1 year ago

Can I ask another question: why are two SD models used in the prolificdreamer system? The original paper doesn't seem to mention this. What is the benefit of this strategy?

thuliu-yt16 commented 1 year ago

The two SD models, 2-1-base and 2-1, are in eps-prediction and v-prediction mode, respectively. The authors said that using a v-prediction model for the LoRA works better. You can definitely try both. I think there is no big difference.
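For reference, the two prediction modes are related by a simple change of variables. The sketch below assumes a variance-preserving schedule with alpha^2 + sigma^2 = 1 and uses scalars instead of latent tensors for readability. The function names are illustrative and are not taken from threestudio.

```python
import math


def add_noise(x0, eps, alpha, sigma):
    """Forward diffusion: x_t = alpha * x0 + sigma * eps."""
    return alpha * x0 + sigma * eps


def velocity(x0, eps, alpha, sigma):
    """v-prediction target: v = alpha * eps - sigma * x0."""
    return alpha * eps - sigma * x0


def eps_from_v(x_t, v, alpha, sigma):
    """Recover the noise from a v-prediction (needs alpha^2 + sigma^2 = 1)."""
    return alpha * v + sigma * x_t


def x0_from_v(x_t, v, alpha, sigma):
    """Recover the clean sample from a v-prediction (same assumption)."""
    return alpha * x_t - sigma * v


# Pick alpha = cos(phi), sigma = sin(phi) so that alpha^2 + sigma^2 = 1.
phi = 0.3
alpha, sigma = math.cos(phi), math.sin(phi)

x0, eps = 0.8, -1.25
x_t = add_noise(x0, eps, alpha, sigma)
v = velocity(x0, eps, alpha, sigma)

eps_rec = eps_from_v(x_t, v, alpha, sigma)
x0_rec = x0_from_v(x_t, v, alpha, sigma)
print(eps_rec, x0_rec)
```

Since either output can be converted into the other under this assumption, the choice between the two modes mainly affects what the network is trained to regress.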