with torch.no_grad():
    vit.eval()
    pali.eval()

    img_embeds = vit(
        img,
        return_embeddings = True
    )

    # how to do this?
    # XTransformer.generate() does not take src_prepend_embeds that can be fed to encoder
    output_text = pali.generate(
        img_embeds,
        prompt,
        mask = prompt_mask,
    )
How would one generate an action (output text) using PaLI?
PaLI from readme.md
Desired behaviour
Idea?