shreydan / VisionGPT2

Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
https://www.kaggle.com/code/shreydan/visiongpt2-image-captioning-pytorch

past_key_tokens #1

Closed arielshaulov closed 3 months ago

arielshaulov commented 5 months ago

Hi, what if I want to use the ability to pass `past_key_values` to `GPT2LMHeadModel`? Can you help me with that?

shreydan commented 5 months ago

Hi @arielshaulov, thanks for checking out my work. For a `past_key_values` implementation, I'd recommend going through the HF implementation of GPT-2: modeling_gpt2.py
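The core idea behind `past_key_values` is that during autoregressive decoding, the keys and values for already-generated tokens don't change, so each attention layer can cache them and only compute Q/K/V for the newest token. A minimal single-head sketch of that mechanism in plain PyTorch (not the repo's code, and much simpler than HF's multi-head/multi-layer version; the class name is made up for illustration):

```python
import torch
import torch.nn.functional as F


class CachedSelfAttention(torch.nn.Module):
    """Toy single-head causal self-attention with a past_key_values-style cache."""

    def __init__(self, dim):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)
        self.dim = dim

    def forward(self, x, past=None):
        # x: (batch, new_seq_len, dim); past: optional (k, v) from earlier steps
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        if past is not None:
            past_k, past_v = past
            k = torch.cat([past_k, k], dim=1)  # reuse cached keys
            v = torch.cat([past_v, v], dim=1)  # reuse cached values
        att = (q @ k.transpose(-2, -1)) / self.dim ** 0.5
        # causal mask: query i may attend to keys 0 .. (past_len + i)
        past_len = k.size(1) - q.size(1)
        mask = torch.ones(q.size(1), k.size(1), dtype=torch.bool).tril(past_len)
        att = att.masked_fill(~mask, float("-inf"))
        out = F.softmax(att, dim=-1) @ v
        return out, (k, v)  # (k, v) plays the role of past_key_values
```

Feeding tokens one at a time while threading the returned `(k, v)` back in produces the same outputs as a full-sequence forward pass, which is exactly the equivalence the HF cache relies on.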

Another way to construct this VLM is to use the `VisionEncoderDecoderModel` class from transformers, as in nlpconnect/vit-gpt2-image-captioning, which should also give you the ability to pass `past_key_values`.

thanks!