marqo-ai / marqo

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
https://www.marqo.ai/
Apache License 2.0

Error during processing of large documents #410

Open bazuker opened 1 year ago

bazuker commented 1 year ago

Environment: Docker (6 CPUs, 16 GB of RAM), macOS Ventura 13.2.1, Apple M2 Max

Using a slightly modified version of https://github.com/iain-mackie/marqo-gpt3, I am trying to index 10 documents of 80,000 characters each.

I make a single call, mq.index(DOC_INDEX_NAME).add_documents(docs), where docs contains all 10 documents.
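For a sense of scale, the traceback below shows the client passing the whole document list as the body of one POST (`self.http.post(path=path_with_query_str, body=documents)`). The sketch below estimates that body's size, assuming the client JSON-encodes the list and that no client-side batching is applied. The documents and their field names (`Title`, `Description`) are hypothetical stand-ins, not the ones from marqo-gpt3.

```python
import json

# Hypothetical stand-ins for the 10 documents in the report: each carries an
# 80,000-character text field. Field names are assumptions for illustration.
NUM_DOCS = 10
DOC_LENGTH = 80_000
docs = [
    {"_id": str(i), "Title": f"doc {i}", "Description": "a" * DOC_LENGTH}
    for i in range(NUM_DOCS)
]

# Assuming the client serializes the list as JSON, this approximates the size
# of the single request body that one add_documents(docs) call sends.
payload_bytes = len(json.dumps(docs).encode("utf-8"))
print(f"{payload_bytes:,} bytes in one request")
```

The result is a little over 800 KB of text in a single request, all of which the server has to process before it can respond.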

Client log

Establishing connection to marqo client.
Indexing documents
Traceback (most recent call last):
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 128, in __validate
    request.raise_for_status()
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://localhost:8882/indexes/yogi-index/stats

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/bazuker/Documents/user/playground/openai/marqo-gpt3/main.py", line 67, in <module>
    print(f'document index build: {mq.index(DOC_INDEX_NAME).get_stats()}')
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/index.py", line 455, in get_stats
    return self.http.get(path=f"indexes/{self.index_name}/stats")
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 88, in get
    return self.send_request(s.get, path=path, body=body, content_type=content_type)
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 74, in send_request
    return self.__validate(response)
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 131, in __validate
    convert_to_marqo_error_and_raise(response=request, err=err)
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 145, in convert_to_marqo_error_and_raise
    raise MarqoWebError(message=response_msg, code=code, error_type=error_type,
marqo.errors.MarqoWebError: MarqoWebError: MarqoWebError Error message: {'message': 'Index `yogi-index` not found.', 'code': 'index_not_found', 'type': 'invalid_request', 'link': None}
status_code: 404, type: invalid_request, code: index_not_found, link: 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 128, in __validate
    request.raise_for_status()
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://localhost:8882/indexes/yogi-index/documents?refresh=true&device=cpu&use_existing_tensors=false

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/bazuker/Documents/user/playground/openai/marqo-gpt3/main.py", line 72, in <module>
    response = mq.index(DOC_INDEX_NAME).add_documents(docs)
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/index.py", line 277, in add_documents
    return self._generic_add_update_docs(
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/index.py", line 393, in _generic_add_update_docs
    res = self.http.post(path=path_with_query_str, body=documents)
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 96, in post
    return self.send_request(s.post, path, body, content_type)
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 74, in send_request
    return self.__validate(response)
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 131, in __validate
    convert_to_marqo_error_and_raise(response=request, err=err)
  File "/Users/bazuker/Library/Python/3.9/lib/python/site-packages/marqo/_httprequests.py", line 145, in convert_to_marqo_error_and_raise
    raise MarqoWebError(message=response_msg, code=code, error_type=error_type,
marqo.errors.MarqoWebError: MarqoWebError: MarqoWebError Error message: {'message': "\nPlease create an issue on Marqo's GitHub repo (https://github.com/marqo-ai/marqo/issues) if this problem persists.", 'code': 'unhandled_backend_error', 'type': 'backend_error', 'link': ''}
status_code: 500, type: backend_error, code: unhandled_backend_error, link:
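The client log above contains two chained tracebacks. Python prints "During handling of the above exception, another exception occurred" when a new exception is raised while an earlier one is still being handled (in an except or finally block). That suggests main.py calls add_documents (line 72) while handling the 404 from get_stats (line 67). If so, the 404 is simply the path the script takes when the index does not exist yet, and the 500 from add_documents is the failure worth investigating. The sketch below reproduces only that control flow; `IndexNotFound`, `BackendError`, `get_stats`, `add_documents`, and `build_index` are hypothetical stand-ins, and nothing here calls Marqo.

```python
# Hypothetical stand-ins for the two server responses in the client log.
class IndexNotFound(Exception):
    """Plays the role of the 404 index_not_found error from get_stats."""

class BackendError(Exception):
    """Plays the role of the 500 unhandled_backend_error from add_documents."""

def get_stats():
    raise IndexNotFound("Index `yogi-index` not found.")

def add_documents(docs):
    raise BackendError("unhandled_backend_error")

def build_index(docs):
    try:
        # Assumed shape of main.py line 67: print stats for the existing index.
        print(get_stats())
    except IndexNotFound:
        # Assumed shape of main.py line 72: index the documents in the handler.
        return add_documents(docs)

caught = None
try:
    build_index([])
except BackendError as err:
    # Python records the 404 it interrupted as __context__ of the 500, which is
    # why the client log prints both tracebacks, joined by "During handling ...".
    caught = err
    print(type(err).__name__, "<- context:", type(err.__context__).__name__)
```

Running it prints `BackendError <- context: IndexNotFound`: the earlier exception is attached to the later one as context, not as its cause.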

marqo container log

2023-03-26 15:37:15 INFO:     172.17.0.1:60962 - "DELETE /indexes/yogi-index HTTP/1.1" 200 OK
2023-03-26 15:37:15 INFO:     172.17.0.1:60962 - "GET /indexes/yogi-index/stats HTTP/1.1" 404 Not Found
2023-03-26 15:40:25 INFO:     172.17.0.1:60962 - "POST /indexes/yogi-index/documents?refresh=true&device=cpu&use_existing_tensors=false HTTP/1.1" 500 Internal Server Error

marqo-os container log

2023-03-26 15:37:15 [2023-03-26T22:37:15,896][INFO ][o.o.c.m.MetadataDeleteIndexService] [f9e350fd26d2] [yogi-index/JHquJ9NYQiy5OcKpJ3OQmA] deleting index
2023-03-26 15:37:15 [2023-03-26T22:37:15,949][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:37:15 [2023-03-26T22:37:15,949][ERROR][o.o.i.i.ManagedIndexCoordinator] [f9e350fd26d2] get managed-index failed: [.opendistro-ism-config] IndexNotFoundException[no such index [.opendistro-ism-config]]
2023-03-26 15:37:15 [2023-03-26T22:37:15,997][INFO ][o.o.c.m.MetadataCreateIndexService] [f9e350fd26d2] [yogi-index] creating index, cause [api], templates [], shards [5]/[1]
2023-03-26 15:37:16 [2023-03-26T22:37:16,021][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:37:16 [2023-03-26T22:37:16,041][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:37:16 [2023-03-26T22:37:16,048][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:37:16 [2023-03-26T22:37:16,058][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:37:16 [2023-03-26T22:37:16,075][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:40:06 [2023-03-26T22:40:06,159][INFO ][o.o.j.s.JobSweeper       ] [f9e350fd26d2] Running full sweep
2023-03-26 15:40:16 [2023-03-26T22:40:16,277][INFO ][o.o.c.m.MetadataMappingService] [f9e350fd26d2] [yogi-index/b4q1aq9DRr22BFopiIyQ1A] update_mapping [_doc]
2023-03-26 15:40:16 [2023-03-26T22:40:16,287][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [f9e350fd26d2] Detected cluster change event for destination migration
2023-03-26 15:45:06 [2023-03-26T22:45:06,162][INFO ][o.o.j.s.JobSweeper       ] [f9e350fd26d2] Running full sweep

jn2clark commented 1 year ago

Hi @bazuker! Thanks for raising the issue. I suspect the request may be too large if the documents are that big. Can you try sending them one at a time, or use client_batch_size=1?
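The client_batch_size=1 option suggested above is the simplest route. If you would rather split the documents yourself, the sketch below groups them by serialized size instead of by count. `batches_by_size` and the `MAX_BATCH_BYTES` cap are hypothetical: this thread does not state any actual request-size limit for Marqo. A document larger than the cap is sent alone, which is equivalent to sending documents one by one.

```python
import json
from typing import Iterator, List

# Hypothetical cap on the JSON body size of one request; not a documented
# Marqo limit, just a knob for this sketch.
MAX_BATCH_BYTES = 200_000

def batches_by_size(docs: List[dict], max_bytes: int = MAX_BATCH_BYTES) -> Iterator[List[dict]]:
    """Yield consecutive groups of docs whose serialized size stays under max_bytes.

    A single document larger than max_bytes is yielded on its own.
    """
    batch: List[dict] = []
    batch_bytes = 2  # the surrounding "[]" of a JSON list
    for doc in docs:
        # +2 for the ", " separator (a slight overestimate for the last item).
        doc_bytes = len(json.dumps(doc).encode("utf-8")) + 2
        if batch and batch_bytes + doc_bytes > max_bytes:
            yield batch
            batch, batch_bytes = [], 2
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch

# Ten hypothetical 80,000-character documents, as in the report.
docs = [{"_id": str(i), "Description": "a" * 80_000} for i in range(10)]
for batch in batches_by_size(docs):
    # With a live server, each batch would be sent as its own request, e.g.
    # mq.index(DOC_INDEX_NAME).add_documents(batch)
    print(len(batch))
```

With the 200,000-byte cap, each batch holds two of these documents; passing `max_bytes=1` reduces it to one document per request.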

jn2clark commented 1 year ago

Actually, it looks like an index-not-found error. Were you able to run any of the examples from the README?

pandu-k commented 1 year ago

Was there any other stack trace in the Marqo container logs? If not, you can increase the log level to debug while running Marqo here. And does the problem go away when you index smaller documents?

Also, which Marqo and client versions are you on? You can check by running this:

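import pprint
import marqo

# Assumes Marqo is reachable at the client's default URL; the logs above show it on http://localhost:8882.
mq = marqo.Client()

# Version reported by the running Marqo server.
print("Marqo version information:")
pprint.pprint(mq.get_marqo())

# Marqo versions supported by the installed Python client.
print("Marqo python client information:")
print(marqo.supported_marqo_version())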