Open HasnainKhanNiazi opened 1 week ago
Hey Hasnain, thanks for reaching out! Have you tried the regular e5/large embeddings? These are significantly more performant in English. If you need multilingual embeddings, OpenAI doesn't support those at the moment. In any case, we don't currently support OpenAI embeddings in Marqo.
Another option to get better performance would be to sign up for Marqtune so you can fine-tune your embeddings for your use case.
Thanks @tomhamer @wanliAlex for the suggestions. I have multilingual documents (German, Italian, English). I will check out the custom embeddings section as well.
One follow-up question related to filter_string: to the best of my understanding, filter_string only works when the values are added as separate fields. For example, if I add price as its own field like this, then query filtering works fine:
    mq.index("my-first-index").add_documents(
        [
            {
                "Title": "The Travels of Marco Polo",
                "Description": "A 13th-century travelogue describing Polo's travels",
            },
            {
                "Title": "Extravehicular Mobility Unit (EMU)",
                "Description": "The EMU is a spacesuit that provides environmental protection, "
                "mobility, life support, and communications for astronauts; 'price': '100'",
                "_id": "article_591",
                "price": "100",
            },
        ],
        tensor_fields=["Description"],
    )
But let's say price is only written somewhere in the description and not added as a field; then filter_string won't work.
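For reference, the working case (price stored as its own field) could be queried roughly like this. This is only a sketch: `build_filter` and `filtered_search` are hypothetical helper names, the `field:value` and `AND` syntax is assumed to follow Marqo's filter DSL, and values are assumed to contain no spaces or special characters that would need escaping.

```python
def build_filter(**conditions):
    """Join field:value pairs into a filter string combined with AND."""
    return " AND ".join(f"{field}:{value}" for field, value in conditions.items())


def filtered_search(index, query, **conditions):
    """Search `index`, restricted to documents whose fields match `conditions`.

    `index` is expected to behave like mq.index("my-first-index"),
    i.e. to expose search(q=..., filter_string=...).
    """
    return index.search(q=query, filter_string=build_filter(**conditions))


# Usage against the index from the example above (needs a running Marqo instance):
# results = filtered_search(mq.index("my-first-index"), "spacesuit", price="100")
```

Because the filter is applied to the price field only, the same text inside Description is never consulted by it.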
The main problem in my case is that if I keep adding a new field for each attribute, I will end up with around 2000 fields, which is way too many. That's why I am looking for a way to do matching/fuzzy matching against the description instead.
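To make it concrete, this is roughly the kind of matching I mean, done client-side on retrieved documents rather than inside Marqo. It is only a sketch under assumptions: `extract_attributes` and `description_matches` are hypothetical helpers, the attributes are assumed to appear in the description as `'key': 'value'` pairs like in the example above, and the fuzzy part uses Python's standard `difflib`, not anything Marqo provides.

```python
import re
from difflib import get_close_matches

# Matches 'key': 'value' pairs as they appear in the example description.
PAIR_RE = re.compile(r"'(\w+)'\s*:\s*'([^']*)'")


def extract_attributes(description):
    """Return the 'key': 'value' pairs embedded in a description as a dict."""
    return {key.lower(): value for key, value in PAIR_RE.findall(description)}


def description_matches(description, field, value, cutoff=0.8):
    """True if the description holds `value` under a key close to `field`."""
    attrs = extract_attributes(description)
    close = get_close_matches(field.lower(), list(attrs), n=1, cutoff=cutoff)
    return bool(close) and attrs[close[0]] == value


description = (
    "The EMU is a spacesuit that provides environmental protection, "
    "mobility, life support, and communications for astronauts; 'price': '100'"
)
print(description_matches(description, "price", "100"))
```

The key is matched fuzzily (so a small typo in the field name still hits), while the value is compared exactly.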
@tomhamer
> If you need multilingual embeddings, OpenAI doesn't support those at the moment.

What do you mean by this line? OpenAI text-embedding-3-large is multilingual, and for simple vector search it gives me better results than any open-source model I have compared it with. But I also want to do some keyword-style search, like filter_string.
Hey, I am playing around with Marqo, did multiple experiments, and I have a few questions.
I used [hf/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) to generate embeddings and created an index, and now search isn't working as expected. For example, if I type in the search query "bike", the first 3-4 retrieved documents are not even related to bikes. This is how I am using the Marqo index:
Neither of the queries above works as expected. Any help or guidance would be appreciated. Thanks!