valencebond closed this issue 9 months ago
@valencebond Thanks for the question! I haven't investigated this in the paper because using multiple LLM transformers to handle visual tokens was beyond the resources we had.