jacklishufan / InstructAny2Pix


KeyError: 'Cache only has 0 layers, attempted to access layer with index 0' #7

Closed KOjuny closed 4 months ago

KOjuny commented 4 months ago

Hello,

I have been studying and practicing with your project, which I find very interesting. However, I encountered the error shown in the screenshot below, and I am not sure how to resolve it.

I would greatly appreciate it if you could provide some guidance on how to fix this issue.

Thank you very much.

(screenshot attached)

jacklishufan commented 4 months ago

Hi, does the new demo notebook work? The Gradio server has not been updated. We will fix it soon.

jacklishufan commented 4 months ago

Another potential issue is the diffusers version. We just updated the environment file; the major change is that diffusers is upgraded to diffusers==0.26.3.

KOjuny commented 4 months ago

Thanks for answering, but the same error appears in the new demo notebook. It is raised by this call:

res0,res,output_caption = pipe(inst,mm_data,alpha = 1.0,h=[0.1*5,0.5,0.5],norm=20.0,refinement=0.2,llm_only=True,num_inference_steps=50,use_cache=use_cache,debug=False,diffusion_mode='ipa',subject_strength=0.0,cfg=5)

jacklishufan commented 4 months ago

Hi, we are unable to reproduce the error locally. Can you share your versions of the torch and transformers packages? The issue seems to be related to the KV cache in the LLM inference step.

This happens in this part of the code:

https://github.com/jacklishufan/InstructAny2Pix/blob/be848bc8cf3b78215a0a8b0d4332fe1d682383c0/instructany2pix/llm/model/language_model/any2pix_llama.py#L75-L80
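
For reference, my best guess is that newer transformers versions pass a Cache object (e.g. DynamicCache) here instead of the legacy tuple format, and indexing an empty DynamicCache via past_key_values[0] raises exactly this KeyError. A rough, untested sketch of a cache-aware version of that length computation (the helper name get_past_length is just for illustration):

from transformers.cache_utils import Cache, DynamicCache

def get_past_length(past_key_values) -> int:
    # Works for both the legacy tuple-of-tensors format and the newer
    # Cache objects (transformers >= 4.36).
    if past_key_values is None:
        return 0
    if isinstance(past_key_values, Cache):
        # Cache objects report their own length, so an empty DynamicCache
        # never hits the layer indexing that raises the KeyError above.
        return past_key_values.get_seq_length()
    # Legacy format: tuple of (key, value) tensors per layer,
    # each shaped (batch, num_heads, seq_len, head_dim).
    return past_key_values[0][0].shape[2]

print(get_past_length(None))            # 0
print(get_past_length(DynamicCache()))  # 0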

If we add a print statement for past_key_values_length on L77 and run the following test script:

import torch
from PIL import Image
from instructany2pix import InstructAny2PixPipeline
import IPython

def image_grid(imgs, rows, cols):
    assert len(imgs) == rows*cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))
    grid_w, grid_h = grid.size

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid

pipe = InstructAny2PixPipeline(llm_folder='llm-retrained')

cas = "assets/demo/rains_of_castamere.wav"
naruto = "assets/demo/naruto.wav"
x = {"inst": "Please tag the music <video>",
 "ans": "an image of an antique shop with a clock ticking",
 "mm_data": [{"type": "audio"}]
}

torch.manual_seed(1)
inst = x['inst']
mm_data = x['mm_data']
mm_data[0]['fname']=naruto
use_cache = 0
res0,res,output_caption = pipe(inst,mm_data,alpha = 1.0,h=[0.1*5,0.5,0.5],norm=20.0,refinement=0.2,llm_only=True,num_inference_steps=50,use_cache=use_cache,debug=False,diffusion_mode='ipa',subject_strength=0.0,cfg=5)
print(output_caption)

We see:

None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
 guitar, electric, drums, fast, electric guitar, rock, techno, beat</s>

Can you check the behavior on your side? This branch should never be taken. It may also be worth trying to set past_key_values=None before this block.
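
In case it helps, what I have in mind is roughly the following guard right before the length computation (an untested sketch; the function name is just for illustration), so that an empty cache is treated the same as no cache:

from transformers.cache_utils import DynamicCache

def drop_empty_cache(past_key_values):
    # Treat an empty cache exactly like "no cache" so the legacy
    # past_key_values[0][0] indexing is never reached.
    if past_key_values is not None and len(past_key_values) == 0:
        return None
    return past_key_values

print(drop_empty_cache(None))            # None
print(drop_empty_cache(DynamicCache()))  # None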

KOjuny commented 4 months ago

I added past_key_values=None. After that, another error appears: AttributeError: 'InstructAny2PixLMModel' object has no attribute '_prepare_decoder_attention_mask'. Could I get some advice on how to solve this problem? Sorry for the inconvenience.

jacklishufan commented 4 months ago

Hi, can you try commenting out the entire forward method (which will make it fall back to the transformers default implementation) and running our test snippet above?

KOjuny commented 4 months ago

Sorry for the late reply. I'm using torch==1.13.1 and transformers==4.42.3, and you mean running this code, right?

import torch
from PIL import Image
from instructany2pix import InstructAny2PixPipeline
import IPython

def image_grid(imgs, rows, cols):
    assert len(imgs) == rows*cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))
    grid_w, grid_h = grid.size

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid

pipe = InstructAny2PixPipeline(llm_folder='llm-retrained')

cas = "assets/demo/rains_of_castamere.wav"
naruto = "assets/demo/naruto.wav"
x = {"inst": "Please tag the music <video>",
 "ans": "an image of an antique shop with a clock ticking",
 "mm_data": [{"type": "audio"}]
}

torch.manual_seed(1)
inst = x['inst']
mm_data = x['mm_data']
mm_data[0]['fname']=naruto
use_cache = 0
res0,res,output_caption = pipe(inst,mm_data,alpha = 1.0,h=[0.1*5,0.5,0.5],norm=20.0,refinement=0.2,llm_only=True,num_inference_steps=50,use_cache=use_cache,debug=False,diffusion_mode='ipa',subject_strength=0.0,cfg=5)
print(output_caption)

And I'm not sure which forward method you mean... do you mean InstructAny2PixLMModel's forward method?

KOjuny commented 4 months ago

I added print(past_key_values, past_key_values_length) on L77, and the output was None 0. It ran one loop, and then AttributeError: 'InstructAny2PixLMModel' object has no attribute '_prepare_decoder_attention_mask' appeared.

jacklishufan commented 4 months ago

Hi, have you tried commenting out the entire forward method (which will make it fall back to the transformers default implementation), as mentioned in my previous response? i.e., remove this section: https://github.com/jacklishufan/InstructAny2Pix/blob/be848bc8cf3b78215a0a8b0d4332fe1d682383c0/instructany2pix/llm/model/language_model/any2pix_llama.py#L44-L174
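
To be concrete, after removing that section the model class would be left looking roughly like this (a sketch, not the exact file contents; the class name and attribute setup below are illustrative), so forward() comes entirely from whichever transformers version is installed:

from transformers import LlamaConfig, LlamaModel

class InstructAny2PixLMModelSketch(LlamaModel):
    # With no forward() override, LlamaModel.forward handles the attention
    # mask and KV-cache bookkeeping, which is the fallback behavior I am
    # suggesting to test.
    def __init__(self, config: LlamaConfig):
        super().__init__(config)
        # multimodal projection layers etc. would still be set up here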

KOjuny commented 4 months ago

Hi, thanks for the reply. I tried it, but after that I got a CUDA out-of-memory error...

jacklishufan commented 4 months ago

Hi, we have transformers==4.34.1, torch==2.0.1, and tokenizers==0.14.1.

Also, you need at least 32 GB of VRAM to run our model.

KOjuny commented 4 months ago

Thank you. But I followed requirements.txt and encountered an error message (screenshot attached). So I picked a different version to install to avoid that error, but now I get the following error message and I don't know what to do: AttributeError: 'InstructAny2PixLMModel' object has no attribute '_prepare_decoder_attention_mask'. Help me 😥

jacklishufan commented 4 months ago

Please try the latest code; we have removed this potentially broken function in favor of the default implementation in the transformers library.

KOjuny commented 4 months ago

Thanks, I got it. And I have a question: can I use your model with less VRAM?

jacklishufan commented 4 months ago

Hi, sorry, our model requires a minimum of 32 GB of VRAM. Since the bug in question has been fixed, I'm closing the issue. Feel free to reopen it if new problems emerge.