JSON serializer cannot serialize UUID values

LaundroMat commented 3 months ago

Description: When adding a document containing a UUID value, the library's JSON serializer fails.

Expected behavior: The internal JSON serializer converts UUID values to str values.

Current behavior

>>> list(payload)
[{'id': UUID('a3503c63-4244-47ee-adab-1dc80fc20265')}]
>>> index.add_documents(list(payload))
Traceback (most recent call last):
  File "xxx\.venv\Lib\site-packages\meilisearch\index.py", line 428, in add_documents
    add_document_task = self.http.post(url, documents)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx\.venv\Lib\site-packages\meilisearch\_httprequests.py", line 85, in post
    return self.send_request(requests.post, path, body, content_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxx\.venv\Lib\site-packages\meilisearch\_httprequests.py", line 65, in send_request
    data=json.dumps(body) if body else "" if body == "" else "null",
         ^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\json\__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\json\encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\json\encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\json\encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type UUID is not JSON serializable
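
The traceback ends inside the standard library's json.dumps, so the failure should be reproducible without a Meilisearch instance. A minimal sketch, assuming only the standard library (the document contents are illustrative):

```python
import json
from uuid import uuid4

# Illustrative document; any dict holding a uuid.UUID value behaves the same way.
doc = {"id": uuid4()}

try:
    json.dumps([doc])
    error_message = None
except TypeError as exc:
    # The default encoder has no rule for uuid.UUID, so it raises TypeError.
    error_message = str(exc)

print(error_message)
```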

Environment (please complete the following information):

OS: Windows 10
Meilisearch version: 1.8.1
meilisearch-python version: v0.31.2

My current workaround uses orjson, which can serialize UUID fields:

import orjson
payload = orjson.loads(orjson.dumps(list(payload)))

but it would be better if this workaround were unnecessary.
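
A similar round-trip is possible with the standard library alone, which avoids the extra dependency. This is a sketch that assumes str() is an acceptable conversion for every value json cannot encode natively; the payload below is illustrative:

```python
import json
from uuid import UUID

# Illustrative payload with the same shape as the one in the report.
payload = [{"id": UUID("a3503c63-4244-47ee-adab-1dc80fc20265")}]

# default=str is called only for objects json cannot encode natively,
# so the UUID is written out in its canonical string form.
clean_payload = json.loads(json.dumps(payload, default=str))

print(clean_payload)
```

Note that default=str also stringifies any other unsupported type (datetime, Decimal, and so on), which may or may not be what you want.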

sanders41 commented 3 months ago

We don't have any way to know which fields will be UUIDs, so the only way to do this would be to go through every field of every document and check it. That is going to be a decent performance hit for something that isn't going to be very common.

Since you will know which fields hold a UUID, you could convert them yourself before sending the documents.

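from uuid import uuid4

my_docs = [{"uuid_field": uuid4(), "title": "Some Title"}]
for doc in my_docs:
    # Convert the UUID to its string form so the default JSON encoder can handle it.
    doc["uuid_field"] = str(doc["uuid_field"])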
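
If the UUID field names are not known ahead of time, the conversion can be driven by a type check instead. This is exactly the per-field scan mentioned above, so it carries that cost. A minimal sketch: stringify_uuids is a hypothetical helper, not part of meilisearch-python, and it only inspects top-level values.

```python
from uuid import UUID, uuid4


def stringify_uuids(docs):
    """Return copies of the documents with top-level UUID values as strings."""
    return [
        {
            key: str(value) if isinstance(value, UUID) else value
            for key, value in doc.items()
        }
        for doc in docs
    ]


my_docs = [{"uuid_field": uuid4(), "title": "Some Title"}]
clean_docs = stringify_uuids(my_docs)  # my_docs itself is left unchanged

print(clean_docs)
```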

LaundroMat commented 3 months ago

You wouldn't have to go through each field if you were using orjson, but I completely understand you don't want that extra dependency. Maybe I should add this as a suggestion issue, but have custom serializers ever been considered? Something like:

index.add_documents(list(payload), serializer=MyCustomJSONSerializer)
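
For illustration, such a serializer could be an ordinary json.JSONEncoder subclass. The sketch below uses only the standard library and passes the encoder to json.dumps through cls; the serializer= keyword above is the proposal, not something this example relies on, and UUIDEncoder is a hypothetical name.

```python
import json
from uuid import UUID


class UUIDEncoder(json.JSONEncoder):
    """Encode uuid.UUID values as strings; defer everything else to the base class."""

    def default(self, o):
        if isinstance(o, UUID):
            return str(o)
        # Let the base class raise TypeError for types it does not know.
        return super().default(o)


payload = [{"id": UUID("a3503c63-4244-47ee-adab-1dc80fc20265")}]
body = json.dumps(payload, cls=UUIDEncoder)

print(body)
```

Because default() is only called for objects the encoder cannot handle natively, documents without UUIDs pay no extra per-field cost.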

sanders41 commented 3 months ago

This isn't something we have discussed, but I wouldn't rule it out as an option. While not everyone will have UUIDs, it's not so niche that you are going to be the only person doing it.

LaundroMat commented 3 months ago

Thanks, it works great! I use DjangoJSONEncoder as a custom encoder and it works flawlessly.

meilisearch / meilisearch-python

JSON serializer cannot serialize UUID values #973