marqo-ai / marqo

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
https://www.marqo.ai/
Apache License 2.0

[BUG] Lexical and Tensor Search does not retrieve tensor fields when specified in `attributes_to_retrieve` #249

Open Jeadie opened 1 year ago

Jeadie commented 1 year ago

Describe the bug

When specifying `attributes_to_retrieve` and using lexical search, tensor fields are not returned in the hits.

To Reproduce

Consider an index `my-index`, with fields:

import marqo

mq = marqo.Client(url="http://localhost:8882")
question = 'what should I do?'

# Both 'title' and 'body' are requested; per this report, only 'title' comes back.
results = mq.index('my-index').search(
    q=question,
    device="cuda",
    search_method="LEXICAL",
    limit=1,
    attributes_to_retrieve=['title', 'body'],
)

Expected behavior

>> results
{"hits": [{"title": "Marqo Polo", "body": "One day Marqo Polo went for a walk"}], "processingTimeMs": 382, "query": "0xca7ca7bcc765f77339be2d648ba53ce9c8a262bd", "limit": 1}

Actual behavior

>> results
{"hits": [{"title": "Marqo Polo"}], "processingTimeMs": 382, "query": "0xca7ca7bcc765f77339be2d648ba53ce9c8a262bd", "limit": 1}

Additional context

pandu-k commented 1 year ago

Having trouble reproducing this:

import marqo
mq = marqo.Client(url="http://localhost:8882")

docs = [
    {'body': 'man holding a rock',
     'field': 'abc 123',
     'image': "http://google.com/images/rocks.png",
     'title': 'Alfred, the holder of a rock',
     'url': 'youtube.com', 'category': 'fun'}
]

mq.index('my-index').add_documents(docs, non_tensor_fields=['title', 'url', '_id', 'category'])

results = mq.index('my-index').search(
    q='question, who is the rock holder?',
    search_method="LEXICAL",
    limit=1,
    attributes_to_retrieve=['title', 'body'],
)

print("Results", results)
# Outputs:
""" 
{'hits': [{
    'body': 'man holding a rock', 
    'title': 'Alfred, the holder of a rock', 
    '_id': 'a4c86363-7383-46cb-97fc-c073c0502b60', 
    '_score': 0.8630463, '_highlights': []}], 
    'processingTimeMs': 91, 'query': 'question, who is the rock holder?', 'limit': 1}
"""

As a sanity check:

[facet.keys() for facet in mq.index('my-index').get_document('2c27d28f-9a34-4520-8c8f-3624fbea2769', expose_facets=True)['_tensor_facets']]
# outputs
"""
[dict_keys(['body', '_embedding']), dict_keys(['field', '_embedding']), dict_keys(['image', '_embedding'])]
"""

Environments tried: