Hi, does the new demo notebook work? The gradio server has not been updated; we will fix it soon.
Another potential issue is the diffusers version. We just updated the environment file; the major change is that diffusers is upgraded to diffusers==0.26.3.
Thanks for answering, but the same error still appears in the new demo notebook when I run this line:
res0, res, output_caption = pipe(inst, mm_data, alpha=1.0, h=[0.1*5, 0.5, 0.5], norm=20.0, refinement=0.2, llm_only=True, num_inference_steps=50, use_cache=use_cache, debug=False, diffusion_mode='ipa', subject_strength=0.0, cfg=5)
Hi, we are unable to reproduce the error locally. Can you share the versions of your torch and transformers packages? The issue seems to be related to the KV cache in the LLM inference step.
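For context, here is a minimal, generic sketch of how the KV cache (past_key_values) normally flows through a Hugging Face causal LM during incremental decoding; the model used here is only an illustration and is unrelated to this repository:

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Hello", return_tensors="pt")
out = model(**inputs, use_cache=True)
past = out.past_key_values            # cached keys/values for all tokens seen so far
next_token = out.logits[:, -1:].argmax(-1)

# Next step: pass only the new token plus the cache; the model derives
# past_key_values_length from this cache instead of re-encoding the prefix.
out = model(input_ids=next_token, past_key_values=past, use_cache=True)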
This happens in this part of the code. If we add a print of past_key_values_length at L77 and run the following test script:
import torch
from PIL import Image
from instructany2pix import InstructAny2PixPipeline
import IPython

def image_grid(imgs, rows, cols):
    # Helper from the demo notebook; not actually used in this repro.
    assert len(imgs) == rows * cols
    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols * w, rows * h))
    grid_w, grid_h = grid.size
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

pipe = InstructAny2PixPipeline(llm_folder='llm-retrained')
cas = "assets/demo/rains_of_castamere.wav"
naruto = "assets/demo/naruto.wav"
x = {
    "inst": "Please tag the music <video>",
    "ans": "an image of an antique shop with a clock ticking",
    "mm_data": [{"type": "audio"}],
}
torch.manual_seed(1)
inst = x['inst']
mm_data = x['mm_data']
mm_data[0]['fname'] = naruto
use_cache = 0
res0, res, output_caption = pipe(inst, mm_data, alpha=1.0, h=[0.1*5, 0.5, 0.5], norm=20.0, refinement=0.2, llm_only=True, num_inference_steps=50, use_cache=use_cache, debug=False, diffusion_mode='ipa', subject_strength=0.0, cfg=5)
print(output_caption)
We see
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
guitar, electric, drums, fast, electric guitar, rock, techno, beat</s>
Can you check the behavior on your side? This branch should never be taken. It may also be worth trying to set past_key_values=None before this block.
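For reference, this is the kind of experiment being suggested; the function signature below is abbreviated and hypothetical, and the real code lives in any2pix_llama.py:

# Hypothetical sketch, not the repository's actual forward: a debug print plus a
# forced cache reset placed just before the block that computes past_key_values_length.
def forward(self, input_ids=None, attention_mask=None,
            past_key_values=None, inputs_embeds=None, use_cache=None, **kwargs):
    past_key_values_length = 0
    if past_key_values is not None:
        # For LLaMA-style caches this is the number of tokens already cached.
        past_key_values_length = past_key_values[0][0].shape[2]
    print(past_key_values, past_key_values_length)  # debug print suggested above

    # Proposed workaround: drop any incoming cache so the branch that assumes a
    # populated cache is never taken.
    past_key_values = None
    ...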
I added past_key_values=None. After that, another error appears:
AttributeError: 'InstructAny2PixLMModel' object has no attribute '_prepare_decoder_attention_mask'
Can I get some advice on how to solve this problem? Sorry for the inconvenience.
Hi, can you try commenting out the entire forward method (which will make it fall back to the transformers default implementation) and running our test snippet above?
Sorry I'm late. I'm using torch==1.13.1 & transformers==4.42.3, and you mean running the test script you posted above, right?
And I don't understand which forward method you mean... Do you mean InstructAny2PixLMModel's forward method?
I added print(past_key_values, past_key_values_length) at L77, and the output was None 0. It runs one loop and then AttributeError: 'InstructAny2PixLMModel' object has no attribute '_prepare_decoder_attention_mask' appears.
Hi, have you tried commenting out the entire forward method (which will make it fall back to the transformers default implementation), as mentioned in my previous response? I.e., remove this section: https://github.com/jacklishufan/InstructAny2Pix/blob/be848bc8cf3b78215a0a8b0d4332fe1d682383c0/instructany2pix/llm/model/language_model/any2pix_llama.py#L44-L174
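For anyone hitting the same AttributeError: _prepare_decoder_attention_mask is a private helper of older LlamaModel versions; it is still present in the 4.34.x line the maintainers use but was dropped from later transformers releases, so it does not exist on 4.42.3 and the custom forward breaks there. Commenting the override out amounts to the following sketch (simplified and assumed; the real class and its base classes live in any2pix_llama.py):

from transformers import LlamaModel

class InstructAny2PixLMModel(LlamaModel):
    # With the custom forward removed, attribute lookup falls through to the
    # parent class, so LlamaModel.forward from the installed transformers
    # version (with its own attention-mask handling) is used instead.
    pass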
Hi, thanks for the reply. I tried it, but after that I got CUDA out of memory...
Hi, we have transformers==4.34.1, torch==2.0.1, tokenizers==0.14.1.
Also, you need at least 32GB of VRAM to run our model.
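If it helps, one way to match the maintainers' environment is to pin the versions mentioned in this thread (the diffusers version is taken from the earlier comment; you may need a CUDA-specific torch wheel for your setup):

pip install torch==2.0.1 transformers==4.34.1 tokenizers==0.14.1 diffusers==0.26.3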
Thank you. But I followed requirements.txt and encountered an error message, so I installed a different version at random to avoid it. Now I get the following error message and I don't know what to do: AttributeError: 'InstructAny2PixLMModel' object has no attribute '_prepare_decoder_attention_mask'. Help me 😥
Please try the latest code; we have removed this potentially broken function in favor of the default implementation in the transformers library.
Thanks, I got it. And I have a question: can I use your model with less VRAM?
Hi, sorry, our model requires a minimum of 32GB VRAM. Since the bug in question has been fixed, I'm closing the issue. Feel free to reopen it if new problems emerge.
Hello,
I have been studying and practicing with your project, which I find very interesting. However, I encountered the error shown in the image below, and I am not sure how to resolve it.
I would greatly appreciate it if you could provide some guidance on how to fix this issue.
Thank you very much.