I'm looking to feed multiple modalities into a conventional text-based LLM. To do that, I convert each modality into a CLIP vector, which I have already done. Now I need to convert that CLIP vector into an embedding in the LLM's text token embedding space. Can anyone help me out with this conversion?
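To make the question concrete, here is a rough sketch of the kind of conversion I have in mind: a projection that maps one CLIP vector to a few "soft token" vectors with the same width as the LLM's token embeddings. Everything here is an assumption for illustration: the names `make_projector` and `project`, the toy dimensions, and the random weights. In a real setup the projection would be a trained layer (for example a `torch.nn.Linear` or a small MLP), and the dimensions would come from the actual CLIP and LLM models.

```python
import random

def make_projector(clip_dim, llm_dim, num_tokens, seed=0):
    """Build a (hypothetical) linear projector: clip_dim -> num_tokens * llm_dim.

    Weights are random here only so the sketch runs; in practice they
    would be learned, typically with the LLM itself kept frozen.
    """
    rng = random.Random(seed)
    scale = 1.0 / clip_dim ** 0.5
    out_dim = llm_dim * num_tokens
    weight = [[rng.uniform(-scale, scale) for _ in range(clip_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias

def project(clip_vec, weight, bias, llm_dim):
    """Apply y = W x + b, then split y into rows of length llm_dim."""
    flat = [sum(w * x for w, x in zip(row, clip_vec)) + b
            for row, b in zip(weight, bias)]
    return [flat[i:i + llm_dim] for i in range(0, len(flat), llm_dim)]

# Toy sizes (assumed); real CLIP and LLM widths are much larger.
CLIP_DIM, LLM_DIM, NUM_TOKENS = 8, 16, 4

weight, bias = make_projector(CLIP_DIM, LLM_DIM, NUM_TOKENS)

# Stand-in for a CLIP vector produced from an image, audio clip, etc.
rng = random.Random(1)
clip_vec = [rng.gauss(0.0, 1.0) for _ in range(CLIP_DIM)]

soft_tokens = project(clip_vec, weight, bias, LLM_DIM)
print(len(soft_tokens), len(soft_tokens[0]))  # 4 16
```

The idea would be to place these `soft_tokens` alongside the ordinary text token embeddings before they go into the LLM. What I don't know is how best to train such a projection so the LLM actually understands the result.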