As long as the model isn't using hypernetworks, you can import it into InvokeAI using the !import command on the CLI. See https://github.com/invoke-ai/InvokeAI/blob/main/docs/features/CLI.md#model-selection-and-importation
I'll give it a try myself later today in the event that more is needed than simply bringing in a new checkpoint file.
It is not using hypernetworks, but it needs some changes to the samplers and generation code, as the model has extra channels which need to be initialized to the init image and mask.
Just porting in the model and config results in an error about not being able to find the LatentInpaintDiffusion class; when I brought that in along with some other changes, generation just silently fails.
For code changes from Runway ML's repo see: https://github.com/runwayml/stable-diffusion/compare/69ae4b3...main
There are some other implementations of this brewing in other forks that might be useful reference too.
@db3000 On a related note, I'm confused about https://huggingface.co/runwayml/stable-diffusion-v1-5. I'm assuming it is their own version (?) labeled 1.5 to be catchier? I also see they have the stable-diffusion-inpainting ckpt.
Ok, looks like a little bit of research and coding is in order. Thanks for taking it as far as you did. If you could summarize what you understand about the necessary changes that would be very helpful.
Lincoln
Apparently this is the bona fide Stable Diffusion 1.5 model.
https://huggingface.co/runwayml/stable-diffusion-v1-5/discussions/1 It's still being investigated, but at least it seems to be the true v1.5
We confirm there has been no breach of IP as flagged and we thank Stability AI for the compute donation to retrain the original model.
I downloaded and installed the 1.5 model that Hugging Face posted and it works quite well. However, it looks like there is some dispute going on between Stability AI and RunwayML, and it isn't at all clear that RunwayML's 1.5 is the same as StabilityAI's 1.5.
It apparently is the real deal? https://discord.com/channels/1020123559063990373/1020123559831539744/1032755319803215954
Seems like https://huggingface.co/runwayml/stable-diffusion-v1-5/discussions/1 is closed now and Stability AI has withdrawn their takedown request.
Note (for general context) that the model being discussed there is separate from the inpainting model that this ticket is for. That can be found here: https://huggingface.co/runwayml/stable-diffusion-inpainting
Preliminary results, but the v1.5 inpainting model and v1.5 seem to produce the same results for txt2img inference (?), so maybe having both models around is redundant (?) if the inpainting model can do the best of both worlds.
Update: Also, using inpainting gives me the same result for both models. And both are 4.27GB. Are they the same model? Supposedly, they are not meant to be the same, right?
stable-diffusion-v1-5 Resumed from stable-diffusion-v1-2 - 595,000 steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10 % dropping of the text-conditioning to improve classifier-free guidance sampling.
stable-diffusion-inpainting Resumed from stable-diffusion-v1-5 - then 440,000 steps of inpainting training at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked-image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and in 25% mask everything.
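In code terms, those five extra channels amount to something like this (a rough sketch only; encode_to_latent is a stand-in for the first-stage/VAE encoder, and the mask convention and shapes are assumptions, not code from either repo):

```python
import torch
import torch.nn.functional as F

def build_inpainting_channels(image, mask, encode_to_latent):
    # image: (B, 3, 512, 512) in [-1, 1]; mask: (B, 1, 512, 512), 1 = region to repaint (assumed convention)
    masked_image = image * (1.0 - mask)                    # blank out the region to repaint
    masked_latent = encode_to_latent(masked_image)         # (B, 4, 64, 64) from the first-stage encoder
    mask_small = F.interpolate(mask, size=masked_latent.shape[-2:])  # (B, 1, 64, 64)
    return torch.cat([mask_small, masked_latent], dim=1)   # the 5 extra channels
```

At sampling time these get concatenated with the 4 noisy latent channels, giving the 9-channel input the checkpoint expects.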
Maybe we need to run that specific code @db3000 mentioned to see the different behavior between both models.
Right I think you need to initialize the model properly to take advantage of the inpainting.
How did you run against the inpainting model though? It crashes for me when I try.
They seem to be different files with different checksums and different dimension weights inside (I don't know the exact significance of model.diffusion_model.input_blocks.0.0.weight, but that is what invoke.py complains about having the wrong size if you load the inpainting model with the normal config):
$ sum sd-v1-5-inpainting.ckpt
15650 4165467 sd-v1-5-inpainting.ckpt
$ python3.9 -c "import torch; print(torch.load('sd-v1-5-inpainting.ckpt')['state_dict']['model.diffusion_model.input_blocks.0.0.weight'].size())"
torch.Size([320, 9, 3, 3])
$ sum v1-5-pruned-emaonly.ckpt
40530 4165411 v1-5-pruned-emaonly.ckpt
$ python3.9 -c "import torch; print(torch.load('v1-5-pruned-emaonly.ckpt')['state_dict']['model.diffusion_model.input_blocks.0.0.weight'].size())"
torch.Size([320, 4, 3, 3])
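If it helps, that check can be wrapped into a tiny helper to tell the two kinds of checkpoint apart (just a convenience wrapper around the same torch.load call above):

```python
import torch

def is_inpainting_checkpoint(path: str) -> bool:
    # A regular SD 1.x checkpoint has 4 input channels on the first UNet conv; the inpainting one has 9.
    sd = torch.load(path, map_location='cpu')['state_dict']
    weight = sd['model.diffusion_model.input_blocks.0.0.weight']
    return weight.shape[1] == 9

print(is_inpainting_checkpoint('sd-v1-5-inpainting.ckpt'))   # True
print(is_inpainting_checkpoint('v1-5-pruned-emaonly.ckpt'))  # False
```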
This is the error I get if I try to load sd-v1-5-inpainting.ckpt:
** model stable-diffusion-inpainting could not be loaded: Error(s) in loading state_dict for LatentDiffusion:
size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 9, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).
You're absolutely right. I was so tired I didn't even notice the message at the end :)
>> Caching model stable-diffusion-1.5 in system RAM
>> Loading stable-diffusion-1.5-inpaint from models/ldm/stable-diffusion-v1/sd-v1-5-inpainting.ckpt
| LatentDiffusion: Running in eps-prediction mode
| DiffusionWrapper has 859.52 M params.
| Making attention of type 'vanilla' with 512 in_channels
| Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
| Making attention of type 'vanilla' with 512 in_channels
** model stable-diffusion-1.5-inpaint could not be loaded: Error(s) in loading state_dict for LatentDiffusion: size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 9, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).
No wonder it was the same result. It was defaulting back to 1.5.
I'll try to see how to run the model with the torch.Size([320, 9, 3, 3]) weights.
Hah, no worries. I just wanted to make sure I wasn't doing something wrong.
So if you use the RunwayML yaml config file with the model then it clears that error, but complains about a missing LatentInpaintDiffusion class. And if you paste in that class from the RunwayML codebase into ddpm.py, generation just silently fails; it seems like a bunch of other changes are needed.
Yes, I am exactly at that point (silently failing). I tried porting changes from https://github.com/runwayml/stable-diffusion/commits/main (e.g. some of the updated ddim.py code lives in our ddim.py and sampler.py, so I updated the relevant pieces of both files) and then had to add personalization_config to configs/stable-diffusion/v1-inpainting-inference.yaml.
The model seemingly loads:
invoke> !switch stable-diffusion-1.5-inpaint
>> Caching model stable-diffusion-1.4 in system RAM
>> Loading stable-diffusion-1.5-inpaint from models/ldm/stable-diffusion-v1/sd-v1-5-inpainting.ckpt
| LatentInpaintDiffusion: Running in eps-prediction mode
| DiffusionWrapper has 859.54 M params.
Keeping EMAs of 688.
| Making attention of type 'vanilla' with 512 in_channels
| Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
| Making attention of type 'vanilla' with 512 in_channels
| Using more accurate float32 precision
>> Model loaded in 6.63s
>> Setting Sampler to k_lms
But it fails when used:
invoke> "3d pixar dog, disney dog, 3d render dog, unreal engine dog, 3d movie dog character" -s 50 -S 3479165689 -W 512 -H 512 -C 7.5 -I toby.png -A k_lms -f 0.75
>> loaded input image of size 576x576 from toby.png
>> Initial image has transparent areas. Will inpaint in these regions.
>> This input is larger than your defaults. If you run out of memory, please use a smaller image.
>> Using recommended DDIM sampler for inpainting.
>> target t_enc is 37 steps
invoke>
I also tried the streamlit option (using the RunwayML repo):
streamlit run scripts/inpaint_st.py -- configs/stable-diffusion/v1-inpainting-inference.yaml models/ldm/stable-diffusion-v1/sd-v1-5-inpainting.ckpt
but that has errors of its own
KeyError: '27 is not registered'
Okay, so in our repository it's failing in ldm/invoke/generator/inpaint.py, at least for me.
In RunwayML's version, inpaint_st.py calls
def inpaint(sampler, image, mask, prompt, seed, scale, ddim_steps, num_samples=1, w=512, h=512):
so maybe we can port this function to inpaint.py.
I think the RunwayML version also doesn't work with all the samplers; there are some other implementations brewing in other forks that claim to, though.
Okay, I think I know where it fails (?)
Our inpainting has code in ldm/invoke/generator/inpaint.py but also in ldm/invoke/generator/base.py. It is in this last file where it tries to run
with scope(self.model.device.type), self.model.ema_scope():
and fails.
I assume it might have to do with the EMA scope, since when loading the 1.5-inpainting model I see Keeping EMAs of 688.
If I change it to with scope(self.model.device.type): it gets further than before, but then fails with
xc = torch.cat([x] + c_concat, dim=1)
TypeError: can only concatenate list (not "NoneType") to list
which comes from ldm/models/diffusion/ddpm.py, because the conditioning key is hybrid, set in v1-inpainting-inference.yaml (conditioning_key: hybrid # important), in contrast to our regular inference, which uses self.conditioning_key == 'crossattn'.
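For context, the hybrid path effectively does something like this (just a sketch to illustrate the shape of the call, not a copy of ddpm.py; tensor shapes are illustrative):

```python
import torch

def hybrid_unet_call(diffusion_model, x_noisy, t, cond):
    # cond must look like {"c_concat": [(B, 5, 64, 64)], "c_crossattn": [(B, 77, 768)]}
    xc = torch.cat([x_noisy] + cond["c_concat"], dim=1)   # 4 latent + 5 inpainting channels = 9
    cc = torch.cat(cond["c_crossattn"], dim=1)            # prompt embedding for cross-attention
    return diffusion_model(xc, t, context=cc)
```

Our existing generators pass a bare conditioning tensor, so c_concat never gets built and ends up None, which is exactly the TypeError above.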
Back after a break, I may have been able to run it. Source and mask:
Result:
There are a few question marks, though, so I'll leave it for @lstein to do properly (I don't even know if I did it right).
What I did (apart from what I mentioned in https://github.com/invoke-ai/InvokeAI/issues/1184#issuecomment-1286884623) was to run python3 scripts/inpaint_st.py, removing the streamlit parts and hardcoding the values, e.g.
sampler = initialize_model('configs/stable-diffusion/v1-inpainting-inference.yaml', 'models/ldm/stable-diffusion-v1/sd-v1-5-inpainting.ckpt')
image = Image.open("test4.png")
seed = 1
num_samples = 1
scale = 7.5
ddim_steps = 50
etc., and in the end, saving the result:
result[0].save('outputs/new/test.png', 'PNG')
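Put together, the de-streamlited driver looks roughly like this (a sketch; it assumes initialize_model and inpaint are the helpers from RunwayML's scripts/inpaint_st.py, and the paths, prompt, and mask convention are placeholders):

```python
from PIL import Image
from inpaint_st import initialize_model, inpaint   # assumes the script is importable

sampler = initialize_model(
    'configs/stable-diffusion/v1-inpainting-inference.yaml',
    'models/ldm/stable-diffusion-v1/sd-v1-5-inpainting.ckpt',
)
image = Image.open('test4.png').convert('RGB')
mask = Image.open('test4_mask.png').convert('L')    # placeholder mask file

result = inpaint(
    sampler, image, mask,
    prompt='dog face',
    seed=1, scale=7.5, ddim_steps=50,
    num_samples=1, w=512, h=512,
)
result[0].save('outputs/new/test.png', 'PNG')
```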
PS: Not sure how the mask works. Do I use the same image with transparent parts for the mask? I may try later, to see if I can set a proper mask.
A bit better, changing only the face. (prompt 'dog face')
Also, with prompt 'shiba inu':
@db3000 This is the code, if you want to give it a look / try. inpaint_st.py.zip My mask is the only weird thing, where everything is transparent except the part to inpaint (but I think we could invert it in the code to be more intuitive) test8.png.zip
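If we do want the more intuitive convention, inverting the mask in code is trivial with PIL (a sketch; it assumes the mask lives in the image's alpha channel):

```python
from PIL import Image, ImageOps

img = Image.open('test8.png').convert('RGBA')
alpha = img.split()[-1]          # alpha channel: 0 = transparent, 255 = opaque
mask = ImageOps.invert(alpha)    # now 255 marks the formerly transparent region to inpaint
mask.save('test8_mask.png')
```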
A few more results:
Yep, it's more intuitive like this. inpaint_st.py.zip Initial
Mask (note it's a screenshot of it)
Prompt: taylor swift. Result:
I have a Mac so I can't really compare with cuda results other people put out there, so not sure if I am running this correctly. I'll wait for some other people to run this : ) But at least it looks decent!
Thanks! I will try this out later
Based on the description, it seems that the 1.5 inpainting model would behave the same as the standard 1.5 (if not using the inpainting feature).
This is very cool indeed. Based on this it will not be hard at all to port it over to InvokeAI, and the code will get much simpler because all the masking work is already done in the model. I also notice that there is cross-attention built into this model.
I'm a little confused about where you are running the inpaint_st.py script, however. On line 6 there is this:
from main import instantiate_from_config
However, if this is referring to the main.py file, then instantiate_from_config isn't defined there. It is defined in ldm.util. So how does this work?
I run it with python scripts/inpaint_st.py, so inpaint_st.py is inside the scripts directory.
In main.py we have from ldm.util import instantiate_from_config, so it imports:
def instantiate_from_config(config, **kwargs):
    if not 'target' in config:
        if config == '__is_first_stage__':
            return None
        elif config == '__is_unconditional__':
            return None
        raise KeyError('Expected key `target` to instantiate.')
    return get_obj_from_str(config['target'])(
        **config.get('params', dict()), **kwargs
    )
Before I get into the inpainting functionality, I'm trying to modify our code base so that the inpainting model can be used for vanilla txt2img and img2img. I've gotten it to the point where the inference loop in ldm.invoke.generator.base actually runs (by removing self.model.ema_scope()). After this, I added a check for c_concat being None in ddpm.py and fell back to using crossattn, which got past this step.
However, running a txt2image prompt now gives:
File "/u/lstein/projects/SD/InvokeAI/ldm/models/diffusion/ddim.py", line 45, in p_sample
e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
File "/u/lstein/projects/SD/InvokeAI/ldm/models/diffusion/ddpm.py", line 1441, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/u/lstein/.conda/envs/invokeai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/u/lstein/projects/SD/InvokeAI/ldm/models/diffusion/ddpm.py", line 2174, in forward
out = self.diffusion_model(x, t, context=cc)
File "/u/lstein/.conda/envs/invokeai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/u/lstein/projects/SD/InvokeAI/ldm/modules/diffusionmodules/openaimodel.py", line 806, in forward
h = module(h, emb, context)
File "/u/lstein/.conda/envs/invokeai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/u/lstein/projects/SD/InvokeAI/ldm/modules/diffusionmodules/openaimodel.py", line 90, in forward
x = layer(x)
File "/u/lstein/.conda/envs/invokeai/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/u/lstein/.conda/envs/invokeai/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 447, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/u/lstein/.conda/envs/invokeai/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 9, 3, 3], expected input[2, 4, 64, 64] to have 9 channels, but got 4 channels instead
I think this is because the model requires the additional channels delivered by c_concat, and indeed inpaint_st.py has a loop in which it adds several channels to c_concat. I have to determine whether you can dummy up these channels when there is no initial image or mask.
Do we know for sure that the inpainting model can be used for txt2image on its own? It will be messy to keep two models around and swap them back and forth depending on what the user is asking it to do.
@Any-Winter-4079 could you share with me the changes you made to sampler.py and ddim.py? inpaint_st.py works perfectly for me, with the exception that the masked region is simply replaced with a nice reconstructed background rather than with the prompt concept, as if the conditional_conditioning were not being provided.
Folks, thinking more about how to integrate the custom RunwayML inpainting model into InvokeAI. I'm going to assume that it's technically infeasible to get the inpainting model to do pure txt2img and won't pursue this line of inquiry unless I hear otherwise. So here's what we can do instead.
models.yaml will have new inpainting_weights and inpainting_config keys, as in:
stable-diffusion-1.5:
    description: Stable Diffusion inference model version 1.5
    config: configs/stable-diffusion/v1-inference.yaml
    weights: models/ldm/stable-diffusion-v1/v1-5-pruned-emaonly.ckpt
    vae: models/ldm/stable-diffusion-v1/vae-ft-mse-840000-ema-pruned.ckpt
    inpainting_weights: sd-v1-5-inpainting.ckpt
    inpainting_config: configs/stable-diffusion/v1-inpainting-inference.yaml
    width: 512
    height: 512
- When the user provides a mask to img2img, the code will look for these two keys in the current model's config, and if they are present will load the inpainting model (or switch from the CPU-cached version as appropriate) and run the new inpainting code.
- When any other operation is requested, the inpainting model will be swapped out so that it isn't used.
The main problem here is that there will be a delay for loading and a slight delay for switching. Also, both models are pretty big and will eat CPU RAM.
Improvements on this idea most welcome.
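To make the lookup concrete, something like this (a rough sketch; only the models.yaml keys come from the proposal above, and load_model_from() is a hypothetical placeholder for whatever the model cache exposes):

```python
from omegaconf import OmegaConf

models = OmegaConf.load('configs/models.yaml')
entry = models['stable-diffusion-1.5']

def pick_config_and_weights(entry, mask_provided: bool):
    # Use the inpainting config/weights only when a mask is supplied and the keys exist.
    if mask_provided and 'inpainting_weights' in entry:
        return entry['inpainting_config'], entry['inpainting_weights']
    return entry['config'], entry['weights']

config, weights = pick_config_and_weights(entry, mask_provided=True)
# model = load_model_from(config, weights)   # hypothetical model-cache call
```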
Do we know for sure that the inpainting model can be used for txt2image on its own? It will be messy to keep two models around and swap them back and forth depending on what the user is asking it to do.
I haven't tried inpainting for regular txt2img (nor read anything about it on Reddit). I think if it worked, results would be different given the additional training steps of that model, but maybe it could work. I got to the same errors as you (self.model.ema_scope(), etc.) and eventually left it when it said RuntimeError: Given groups=1, weight of size [320, 9, 3, 3], expected input[2, 4, 64, 64] to have 9 channels, but got 4 channels instead. At that point I switched my strategy to trying to run inpaint_st.py. But maybe adding 5 more channels would produce coherent images. It's a matter of trying.
@Any-Winter-4079 could you share with me the changes you made to sampler.py and ddim.py? inpaint_st.py works perfectly for me, with the exception that the masked region is simply replaced with a nice reconstructed background rather than with the prompt concept, as if the conditional_conditioning were not being provided.
Changes to sampler.py, right before the comment # check to see if make_schedule() has run, and if not, run it:
if conditioning is not None:
    if isinstance(conditioning, dict):
        ctmp = conditioning[list(conditioning.keys())[0]]
        while isinstance(ctmp, list):
            ctmp = ctmp[0]
        cbs = ctmp.shape[0]
        if cbs != batch_size:
            print(f"Warning: Got {cbs} conditionings but batch-size is {batch_size}")
    else:
        if conditioning.shape[0] != batch_size:
            print(f"Warning: Got {conditioning.shape[0]} conditionings but batch-size is {batch_size}")
Changes to ddim.py, after t_in = torch.cat([t] * 2):
if isinstance(c, dict):
    assert isinstance(unconditional_conditioning, dict)
    c_in = dict()
    for k in c:
        if isinstance(c[k], list):
            c_in[k] = [
                torch.cat([unconditional_conditioning[k][i], c[k][i]])
                for i in range(len(c[k]))
            ]
        else:
            c_in[k] = torch.cat([unconditional_conditioning[k], c[k]])
else:
    c_in = torch.cat([unconditional_conditioning, c])
Changes to ddpm.py: add from omegaconf import ListConfig at the top, then inside class LatentDiffusion add:
@torch.no_grad()
def get_unconditional_conditioning(self, batch_size, null_label=None):
    if null_label is not None:
        xc = null_label
        if isinstance(xc, ListConfig):
            xc = list(xc)
        if isinstance(xc, dict) or isinstance(xc, list):
            c = self.get_learned_conditioning(xc)
        else:
            if hasattr(xc, "to"):
                xc = xc.to(self.device)
            c = self.get_learned_conditioning(xc)
    else:
        # todo: get null label from cond_stage_model
        raise NotImplementedError()
    c = repeat(c, "1 ... -> b ...", b=batch_size).to(self.device)
    return c
And then add another class:
class LatentInpaintDiffusion(LatentDiffusion):
    def __init__(
        self,
        concat_keys=("mask", "masked_image"),
        masked_image_key="masked_image",
        finetune_keys=None,
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.masked_image_key = masked_image_key
        assert self.masked_image_key in concat_keys
        self.concat_keys = concat_keys

    @torch.no_grad()
    def get_input(
        self, batch, k, cond_key=None, bs=None, return_first_stage_outputs=False
    ):
        # note: restricted to non-trainable encoders currently
        assert (
            not self.cond_stage_trainable
        ), "trainable cond stages not yet supported for inpainting"
        z, c, x, xrec, xc = super().get_input(
            batch,
            self.first_stage_key,
            return_first_stage_outputs=True,
            force_c_encode=True,
            return_original_cond=True,
            bs=bs,
        )
        assert exists(self.concat_keys)
        c_cat = list()
        for ck in self.concat_keys:
            cc = (
                rearrange(batch[ck], "b h w c -> b c h w")
                .to(memory_format=torch.contiguous_format)
                .float()
            )
            if bs is not None:
                cc = cc[:bs]
                cc = cc.to(self.device)
            bchw = z.shape
            if ck != self.masked_image_key:
                cc = torch.nn.functional.interpolate(cc, size=bchw[-2:])
            else:
                cc = self.get_first_stage_encoding(self.encode_first_stage(cc))
            c_cat.append(cc)
        c_cat = torch.cat(c_cat, dim=1)
        all_conds = {"c_concat": [c_cat], "c_crossattn": [c]}
        if return_first_stage_outputs:
            return z, all_conds, x, xrec, xc
        return z, all_conds
@lstein This might also be easier for you. https://github.com/runwayml/stable-diffusion/commits/main It is the list of commits in the runwayml repo.
I missed the sampler.py change because it didn't seem to be doing anything except error checking. The cbs variable is never used again in the code. I still don't see what it is doing.
This might also be easier for you. https://github.com/runwayml/stable-diffusion/commits/main
In fact I had cloned this repository and did a git diff to get the changes.
Nope, that didn't fix it. I'm passing the prompt "macaw" to an image of myself with a masked parrot on the shoulder. It very nicely removes the parrot, but doesn't change it to a macaw:
I'm using the inpaint_st.py script you'd zipped. Any idea of what I might be doing wrong?
Do we know for sure that the inpainting model can be used for txt2image on its own? It will be messy to keep two models around and swap them back and forth depending on what the user is asking it to do.
I haven't tried inpainting for regular txt2img (nor read anything about it on Reddit). I think if it worked, results would be different given the additional training steps of that model, but maybe it could work. I got to the same errors as you (self.model.ema_scope(), etc.) and eventually left it when it said RuntimeError: Given groups=1, weight of size [320, 9, 3, 3], expected input[2, 4, 64, 64] to have 9 channels, but got 4 channels instead. At that point I switched my strategy to trying to run inpaint_st.py. But maybe adding 5 more channels would produce coherent images. It's a matter of trying.
I doubt it will work and don't want to spend time pursuing it. I'm going to use the strategy I outlined in the later comment... I just need to debug whatever's wrong in the inpaint_st.py script and library and then I can port.
My understanding is txt2img works with this model by passing in a fully masked dummy image for the extra channels, for example https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/6cbb04f7a5e675cf1f6dfc247aa9c9e8df7dc5ce/modules/processing.py#L559
Probably worth trying something like that out first, as it would be much simpler to pass in a dummy image/mask in txt2img mode if the model needs it than to introduce a new split inpainting-model concept for all models in the code.
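For what it's worth, the idea looks roughly like this (a hedged sketch, not the webui code; encode_first_stage / get_first_stage_encoding are the usual LDM first-stage helpers, and the shapes/conventions are assumptions):

```python
import torch

def dummy_inpainting_conditioning(model, batch_size, height=512, width=512):
    # "Repaint everything": a mask of ones plus an all-zero masked image.
    masked_image = torch.zeros(batch_size, 3, height, width, device=model.device)
    masked_latent = model.get_first_stage_encoding(
        model.encode_first_stage(masked_image))                # (B, 4, H/8, W/8)
    mask = torch.ones(batch_size, 1, *masked_latent.shape[-2:], device=model.device)
    return torch.cat([mask, masked_latent], dim=1)             # the 5 extra c_concat channels
```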
Interesting. I wouldn't have thought that would work. I'll give it a try now.
Are you setting the prompt in inpaint_st.py? E.g.
prompt = 'taylor swift'
Yes!
When you added personalization_config to v1-inpainting-inference.yaml, did you just cut and paste the same section from v1-inference.yaml?
Indeed. I also get your image.
Yep.
Macaw bird does work, though.
Sorry. I don't understand. "macaw" doesn't work, but "macaw bird" does?
I don't know if it's a thing with single-word prompts (a bug) or we're missing some step from RunwayML's implementation. But yeah, try with prompt: macaw bird.
Bloody hell. Neither "macaw" nor "macaw bird" work, but "Macaw bird" does. Case sensitive?
Actually, the lowercase version works for me. Maybe it's seed specific.
I also tried with eagle and again nothing. But an eagle sitting on shoulder:
I find inpainting frustrating, as it takes a lot of tries to get something that matches the existing image nicely. It seems like there are inpainting-specific models that are much better at doing this, for example RunwayML's model. Also, it's nice that the RunwayML model is trained with extra non-inpainting steps (akin to SD 1.5?), so it would be good to support txt2img using it as well.
Is it possible to add support for this model here? I tried porting in the RunwayML changes directly but it seems like more is needed.