Hello, when I use this model for a VQA task by passing the visual queries through the `inputs_embeds` argument instead of `input_ids`, the last dimension of `attention_mask` and `position_ids` ends up off by one, e.g., 442 vs. 441. How can I fix it? Thanks.
![image](https://github.com/stanford-crfm/BioMedLM/assets/35674925/b9218457-0764-4f28-aedd-43ef87361675)