openvinotoolkit / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0
250 stars, 39 forks

[Bug]: CLIP can only handle sequences up to 77 tokens #25

Open xuexue49 opened 10 months ago

xuexue49 commented 10 months ago

Is there an existing issue for this?

What happened?

The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens:

Steps to reproduce the problem

  1. Go to txt2img
  2. Enter a prompt with more than 77 tokens
  3. The tokens beyond 77 are truncated

What should have happened?

It should handle long prompts the way https://github.com/AUTOMATIC1111/stable-diffusion-webui does.
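For context, the linked AUTOMATIC1111 webui avoids truncation by splitting long prompts into 77-token windows (75 content tokens plus BOS/EOS), encoding each window separately, and concatenating the embeddings. The sketch below is a minimal, hypothetical illustration of just the chunking step; the token ids and constants mirror CLIP's conventions but the function name is ours, not from either codebase.

```python
# Minimal sketch of 77-token chunking (the AUTOMATIC1111-style workaround).
# BOS/EOS/PAD ids below are the standard CLIP special-token ids; everything
# else is illustrative, not code from either webui.

BOS, EOS, PAD = 49406, 49407, 49407  # CLIP begin/end-of-text ids
CHUNK = 75  # 77 minus the BOS and EOS slots

def chunk_token_ids(ids):
    """Split token ids into 77-token windows (BOS + 75 ids + EOS),
    padding the final window with EOS so every chunk has equal length."""
    chunks = []
    for i in range(0, max(len(ids), 1), CHUNK):
        body = ids[i:i + CHUNK]
        body = body + [PAD] * (CHUNK - len(body))
        chunks.append([BOS] + body + [EOS])
    return chunks

# A 100-token prompt becomes two 77-token chunks instead of being truncated;
# each chunk would then be encoded by CLIP and the embeddings concatenated.
chunks = chunk_token_ids(list(range(100)))
```

Each chunk can then be fed through the text encoder independently, which is why prompts of any length work there.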

Version or Commit where the problem happens

master

What Python version are you running on ?

None

What platforms do you use to access the UI ?

No response

What device are you running WebUI on?

No response

Cross attention optimization

Automatic

What browsers do you use to access the UI ?

No response

Command Line Arguments

No

List of extensions

No

Console logs

The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: ['. snow, ', ', snow,', ', snow ,']

Additional information

No response

ananosleep commented 8 months ago

Same problem, but my console log is different. When only the positive prompt has more than 77 tokens and the negative prompt is empty, the log is:

Error completing request
Arguments: ('task(1ylp9wiocwo39tr)', '1girl,(a cute little loli,curvy),long blonde hair,(one side up:1.24),blue eyes,(large breasts:1.14),(curvy:1.06),(white off-shoulder lace trim babydoll),(white open navel lucency clothes),[(navel:1.3):0.2],white side-tie panties,white bridal gauntlets,(strapless:1.13),(white see-through thighhighs),no shoes,bare shoulders,frills', '', [], 20, 'Euler a', 2, 1, 7, 640, 640, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], <gradio.routes.Request object at 0x000001FA93FADC60>, 1, False, '', 0.8, -1, False, -1, 0, 0, 0, 'None', 'None', 'GPU.0', True, 'Euler a', True, False, 'None', 0.8, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False) {}
Traceback (most recent call last):
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\modules\call_queue.py", line 36, in f
    res = func(*args, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\modules\txt2img.py", line 52, in txt2img
    processed = modules.scripts.scripts_txt2img.run(p, *args)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\modules\scripts.py", line 601, in run
    processed = script.run(p, *script_args)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\scripts\openvino_accelerate.py", line 1132, in run
    processed = process_images_openvino(p, model_config, vae_ckpt, p.sampler_name, enable_caching, openvino_device, mode, is_xl_ckpt, refiner_ckpt, refiner_frac)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\scripts\openvino_accelerate.py", line 882, in process_images_openvino
    output = shared.sd_diffusers_model(
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py", line 613, in __call__
    self.check_inputs(
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py", line 497, in check_inputs
    raise ValueError(
ValueError: `prompt_embeds` and `negative_prompt_embeds` must have the same shape when passed directly, but got: `prompt_embeds` torch.Size([1, 154, 768]) != `negative_prompt_embeds` torch.Size([1, 77, 768]).

However, when I extended the negative prompt past 77 tokens as well, it threw a different error:

0%| | 0/20 [00:00<?, ?it/s]
'SymInt' object has no attribute 'type'
[2023-10-25 15:32:23,698] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat_13
[2023-10-25 15:32:23,700] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat_12
[2023-10-25 15:32:23,701] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat_11
[2023-10-25 15:32:23,701] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat_10
[2023-10-25 15:32:23,701] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat_9
[2023-10-25 15:32:23,701] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat_8
[2023-10-25 15:32:23,702] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat_7
[2023-10-25 15:32:23,702] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat_6
[2023-10-25 15:32:23,702] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat_5
[2023-10-25 15:32:23,703] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat_4
[2023-10-25 15:32:23,703] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat_3
[2023-10-25 15:32:23,703] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat_2
[2023-10-25 15:32:23,703] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat_1
[2023-10-25 15:32:23,703] [0/1] torch._inductor.fx_passes.split_cat: [WARNING] example value absent for node: cat
0%| | 0/20 [00:13<?, ?it/s]
Error completing request
Arguments: ('task(j25dtsphpvzk1x8)', '1girl,(a cute little loli,curvy),long blonde hair,(one side up:1.24),blue eyes,(large breasts:1.14),(curvy:1.06),(white off-shoulder lace trim babydoll),(white open navel lucency clothes),[(navel:1.3):0.2],white side-tie panties,white bridal gauntlets,(strapless:1.13),(white see-through thighhighs),no shoes,bare shoulders,frills', '(cross-legged:1.2), (nsfw:1.5),(((pubic))), ((((pubic_hair))))sketch, duplicate, ugly, huge eyes, text, logo, monochrome, worst face, (bad and mutated hands:1.3), (worst quality:2.0), (low quality:2.0), (blurry:2.0), horror, geometry, (bad hands), (missing fingers), multiple limbs, bad anatomy, (interlocked fingers:1.2), Ugly Fingers, (extra digit and hands and fingers and legs and arms:1.4), crown braid, ((2girl)), (deformed fingers:1.2), (long fingers:1.2),succubus wings,horn,succubus horn,succubus hairstyle', [], 20, 'Euler a', 2, 1, 7, 640, 640, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], <gradio.routes.Request object at 0x000001FA918CC130>, 1, False, '', 0.8, -1, False, -1, 0, 0, 0, 'None', 'None', 'GPU.0', True, 'Euler a', True, False, 'None', 0.8, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False) {}
Traceback (most recent call last):
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\modules\call_queue.py", line 36, in f
    res = func(*args, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\modules\txt2img.py", line 52, in txt2img
    processed = modules.scripts.scripts_txt2img.run(p, *args)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\modules\scripts.py", line 601, in run
    processed = script.run(p, *script_args)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\scripts\openvino_accelerate.py", line 1132, in run
    processed = process_images_openvino(p, model_config, vae_ckpt, p.sampler_name, enable_caching, openvino_device, mode, is_xl_ckpt, refiner_ckpt, refiner_frac)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\scripts\openvino_accelerate.py", line 882, in process_images_openvino
    output = shared.sd_diffusers_model(
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py", line 680, in __call__
    noise_pred = self.unet(
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\eval_frame.py", line 490, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\convert_frame.py", line 641, in _convert_frame
    result = inner_convert(frame, cache_size, hooks, frame_state)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\convert_frame.py", line 133, in _fn
    return fn(*args, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\convert_frame.py", line 389, in _convert_frame_assert
    return _compile(
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\convert_frame.py", line 569, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\convert_frame.py", line 491, in compile_inner
    out_code = transform_code_object(code, transform)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\bytecode_transformation.py", line 1028, in transform_code_object
    transformations(instructions, code_options)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\convert_frame.py", line 458, in transform
    tracer.run()
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\symbolic_convert.py", line 2074, in run
    super().run()
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\symbolic_convert.py", line 724, in run
    and self.step()
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\symbolic_convert.py", line 688, in step
    getattr(self, inst.opname)(inst)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\symbolic_convert.py", line 2162, in RETURN_VALUE
    self.output.compile_subgraph(
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\output_graph.py", line 857, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "D:\temp\python\lib\contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\output_graph.py", line 957, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\output_graph.py", line 1024, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\output_graph.py", line 1009, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\repro\after_dynamo.py", line 117, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\__init__.py", line 1607, in __call__
    return self.compiler_fn(model_, inputs_, **self.kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\backends\common.py", line 95, in wrapper
    return fn(model, inputs, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\scripts\openvino_accelerate.py", line 194, in openvino_fx
    return compile_fx(subgraph, example_inputs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_inductor\compile_fx.py", line 1150, in compile_fx
    return aot_autograd(
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\backends\common.py", line 55, in compiler_fn
    cg = aot_module_simplified(gm, example_inputs, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_functorch\aot_autograd.py", line 3891, in aot_module_simplified
    compiled_fn = create_aot_dispatcher_function(
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_dynamo\utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_functorch\aot_autograd.py", line 3379, in create_aot_dispatcher_function
    fw_metadata = run_functionalized_fw_and_collect_metadata(
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_functorch\aot_autograd.py", line 757, in inner
    flat_f_outs = f(*flat_f_args)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\_functorch\aot_autograd.py", line 3496, in functional_call
    out = Interpreter(mod).run(*args[params_len:], **kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\fx\interpreter.py", line 138, in run
    self.env[node] = self.run_node(node)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\fx\interpreter.py", line 195, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\torch\fx\interpreter.py", line 267, in call_function
    return target(*args, **kwargs)
torch._dynamo.exc.BackendCompilerFailed: backend='openvino_fx' raised:
TypeError: 'SymInt' object is not subscriptable

While executing %getitem : [num_users=1] = call_function[target=operator.getitem](args = (%l_timestep_, None), kwargs = {})
Original traceback:
  File "D:\AI\sd-webui\openVINO\stable-diffusion-webui\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 827, in forward
    timesteps = timesteps[None].to(sample.device)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

Finally, when both prompts are under 77 tokens, it generates images properly.

DDreame commented 8 months ago

@ananosleep Hello. I am also an OpenVINO stable-diffusion user. Another workaround is to make the negative prompt longer than 77 tokens as well: the prompt and negative prompt should have the same length. Alternatively, you can pad the shorter prompt embedding to the length of the longer one. This may trigger a recompile of the IR model.

Tip: "same length" means both prompts fall in the same length range, e.g. both 0-77 tokens or both 78-154 tokens.
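The padding idea above can be sketched in a few lines. This is a hypothetical illustration (not code from the webui or diffusers): the pipeline rejected `prompt_embeds` of shape [1, 154, 768] next to `negative_prompt_embeds` of shape [1, 77, 768], so one plausible scheme is to pad the shorter array along the sequence axis by repeating its final "end of text" embedding. NumPy stands in for the actual torch tensors.

```python
import numpy as np

def pad_to_match(pos, neg):
    """Pad the shorter of two [batch, seq, dim] embedding arrays along the
    sequence axis by repeating its last embedding vector, so both arrays
    end up with the same sequence length."""
    target = max(pos.shape[1], neg.shape[1])

    def pad(e):
        if e.shape[1] == target:
            return e
        # Repeat the final embedding to fill the missing positions.
        tail = np.repeat(e[:, -1:, :], target - e.shape[1], axis=1)
        return np.concatenate([e, tail], axis=1)

    return pad(pos), pad(neg)

# Shapes taken from the error in the log above.
pos = np.zeros((1, 154, 768))
neg = np.ones((1, 77, 768))
pos2, neg2 = pad_to_match(pos, neg)  # both now [1, 154, 768]
```

Whether repeating the last embedding (versus, say, encoding a padded token sequence) gives acceptable image quality is an open question; this only shows how the shape check can be satisfied.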

ananosleep commented 8 months ago

> @ananosleep Hello. I am also an OpenVINO stable-diffusion user. Another workaround is to make the negative prompt longer than 77 tokens as well: the prompt and negative prompt should have the same length. Alternatively, you can pad the shorter prompt embedding to the length of the longer one. This may trigger a recompile of the IR model.
>
> Tip: "same length" means both prompts fall in the same length range, e.g. both 0-77 tokens or both 78-154 tokens.

I tried that, but a different error occurred (the second one I posted above); I don't know what it means.

DDreame commented 8 months ago

> @ananosleep Hello. I am also an OpenVINO stable-diffusion user. Another workaround is to make the negative prompt longer than 77 tokens as well: the prompt and negative prompt should have the same length. Alternatively, you can pad the shorter prompt embedding to the length of the longer one. This may trigger a recompile of the IR model. Tip: "same length" means both prompts fall in the same length range, e.g. both 0-77 tokens or both 78-154 tokens.

> I tried that, but a different error occurred (the second one I posted above); I don't know what it means.

Oops! Sorry for the late reply. I guess this error happens because a cached UNet model with the same name exists but its graph is different. Have you cleaned your cache dir? You can try these steps:

  1. Clean or back up your cache dir (in the webui root dir).
  2. Start the WebUI.
  3. Keep both prompts in the same length range.
  4. Run.
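Step 1 above could be scripted like this. Note the directory name `cache` in the webui root is an assumption based on the comment; check what your install actually creates before deleting anything, which is why this sketch moves the directory aside rather than removing it.

```python
import shutil
import time
from pathlib import Path

def backup_cache(root="."):
    """Move the compiled-model cache dir (assumed to be <root>/cache) to a
    timestamped backup so stale compiled graphs can't be reused. Returns the
    backup path, or None if no cache dir was found."""
    cache = Path(root) / "cache"
    if cache.is_dir():
        backup = cache.with_name(f"cache.bak-{int(time.time())}")
        shutil.move(str(cache), str(backup))  # move aside instead of deleting
        return backup
    return None
```

Run it from the webui root before restarting; if the error was caused by a stale cached graph, the model will simply be recompiled on the next generation.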

I can't promise these steps will work, because I use the OpenVINO script code inside my own project with some changes. But I have successfully run prompts longer than 77 tokens on OpenVINO stable diffusion.

If you clean the cache dir and the error persists, feel free to keep replying; I'm happy to talk. Finally, apologies for my English.