[Feature]: jina-clip-v1

ppaanngggg commented 3 weeks ago

What problem does the new feature solve?

jina-clip-v1 is currently the best multimodal embedding model.

What does the feature do?

It can be used to build better image retrieval applications.

Implementation challenges

According to the API (https://jina.ai/?sui&model=jina-clip-v1), we need to pass plain text as

{
  "text": "A blue cat"
}

or an image, given either as a URL or as a base64-encoded string, as

{
  "image": "https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg"
},
{
  "image": "R0lGODlhEAAQAMQAAORHHOVSKudfOulrSOp3WOyDZu6QdvCchPGolfO0o/XBs/fNwfjZ0frl3/zy7////wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH5BAkAABAALAAAAAAQABAAAAVVICSOZGlCQAosJ6mu7fiyZeKqNKToQGDsM8hBADgUXoGAiqhSvp5QAnQKGIgUhwFUYLCVDFCrKUE1lBavAViFIDlTImbKC5Gm2hB0SlBCBMQiB0UjIQA7"
}
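As a rough illustration of the input shapes above, here is a minimal Python sketch that builds the three kinds of items (text, image URL, base64 image). The helper names are hypothetical, and the outer `model` / `input` envelope is an assumption that is not spelled out in this issue, so check it against the API page before relying on it. No request is sent; the sketch only constructs the JSON body.

```python
from __future__ import annotations

import base64
import json


def text_item(text: str) -> dict:
    """Wrap plain text in the {"text": ...} shape shown above."""
    return {"text": text}


def image_item_from_url(url: str) -> dict:
    """Wrap an image URL in the {"image": ...} shape shown above."""
    return {"image": url}


def image_item_from_bytes(data: bytes) -> dict:
    """Base64-encode raw image bytes and wrap them in the {"image": ...} shape."""
    return {"image": base64.b64encode(data).decode("ascii")}


def build_payload(items: list[dict], model: str = "jina-clip-v1") -> dict:
    # ASSUMPTION: the request body has top-level "model" and "input" keys.
    # Only the per-item shapes come from the issue text.
    return {"model": model, "input": items}


if __name__ == "__main__":
    payload = build_payload(
        [
            text_item("A blue cat"),
            image_item_from_url(
                "https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg"
            ),
            image_item_from_bytes(b"GIF89a"),  # placeholder bytes, not a full image
        ]
    )
    print(json.dumps(payload, indent=2))
```

The point for an integration is that text and image inputs share one list but use different keys, so a caller has to say which kind each item is instead of passing a flat list of strings.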

Are you going to work on this feature?

🆘 No, could someone else please consider working on it?

pbarker commented 2 weeks ago

+1. Please also consider supporting any multimodal embedding model. This is the biggest blocker to us adopting

alejandrodnm commented 2 weeks ago

We are currently putting together a list of integrations and deciding the order in which we will tackle them. We'll keep this one in mind during our planning.

Thanks for submitting the issue.
