lucidrains / CoCa-pytorch

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
MIT License
1.04k stars, 88 forks

Generating the caption of a given image #3

Open: claudiogreco opened this issue 2 years ago

claudiogreco commented 2 years ago

Hello,

Thank you for implementing this model. Have you already written code to generate a caption for a given image? If not, do you have an idea of how you would do it with this particular architecture?

Thank you in advance.

mk-runner commented 10 months ago
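```python
# `coca`, `text` (a (4, 512) batch of token ids) and `images` are assumed to be defined already
logits = coca(
    text = text,
    images = images
) # (4, 512, 20000) = (batch, text length, vocabulary size)
```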

I have the same question. The code above does return caption logits, but only because it is given `text` as input: at inference time there are no text_tokens to feed in, and only image_tokens are available.

Thank you in advance.
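One common way to caption an image with a text decoder like this is greedy autoregressive decoding: since no caption exists at inference, the `text` input starts as a single start token, the model is run, the highest-scoring next token is appended, and the loop repeats until an end token appears. The sketch below is not taken from the repository; it assumes `model(text=..., images=...)` returns `(batch, seq_len, vocab)` logits in which position i scores token i + 1, as the `(4, 512, 20000)` shape above suggests. `greedy_caption`, `bos_id` and `eos_id` are hypothetical names, and the ids must match whatever tokenizer the model was trained with.

```python
import torch


@torch.no_grad()
def greedy_caption(model, images, bos_id, eos_id, max_len=30):
    """Greedy autoregressive decoding (hypothetical helper, a sketch only).

    Assumes model(text=..., images=...) returns logits of shape
    (batch, seq_len, vocab), where position i predicts token i + 1.
    """
    batch = images.shape[0]
    # At inference the only "text" is one start token per image (assumed id).
    text = torch.full((batch, 1), bos_id, dtype=torch.long, device=images.device)
    finished = torch.zeros(batch, dtype=torch.bool, device=images.device)

    for _ in range(max_len):
        logits = model(text=text, images=images)   # (batch, cur_len, vocab)
        next_token = logits[:, -1].argmax(dim=-1)  # most likely next token per image
        # Captions that already emitted the end token keep receiving it.
        next_token = torch.where(finished, torch.full_like(next_token, eos_id), next_token)
        text = torch.cat([text, next_token[:, None]], dim=1)
        finished |= next_token == eos_id
        if finished.all():
            break

    return text  # token ids, starting with bos_id; decode them with your tokenizer
```

Each step re-runs the whole model, image encoder included, on the full prefix, so this favours simplicity over speed. Calling `model.eval()` beforehand turns off dropout, and the `argmax` line is the place to swap in sampling or beam search.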

SeaN0X commented 4 months ago

Same problem here: with logits I get a huge tensor, but I couldn't figure out how to convert it to text.
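The logits tensor holds one unnormalized score per vocabulary entry at every position, so turning it into text means taking the argmax over the last dimension (no softmax is needed for that) and mapping the resulting ids back through the tokenizer. A minimal sketch, where `logits_to_text` is a hypothetical helper and `id_to_token` is a stand-in for the decode step of whatever tokenizer produced `text`; if `text` was random token ids, as in a quick shape check, the decoded words will be meaningless.

```python
import torch


def logits_to_text(logits, id_to_token, eos_id=None):
    """Convert (batch, seq_len, vocab) logits into strings (hypothetical helper).

    id_to_token maps an integer id to a string; it stands in for a real tokenizer.
    """
    token_ids = logits.argmax(dim=-1)  # (batch, seq_len): best-scoring id at each position
    captions = []
    for row in token_ids.tolist():
        if eos_id is not None and eos_id in row:
            row = row[: row.index(eos_id)]  # drop the end token and everything after it
        captions.append(" ".join(id_to_token[i] for i in row))
    return captions
```

Applied to the output of `coca(text = text, images = images)`, this reads off the model's guess at each position given the true preceding tokens (teacher forcing). That is useful for checking training, but it is not a caption generated from the image alone; for that, tokens have to be produced one at a time and fed back in.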