Open grieve54706 opened 3 days ago
You can achieve this using the default_handler argument of df.to_json:
pd.DataFrame({"uuid": [uuid.uuid4()]}).to_json(default_handler=str)
Note that the stdlib json library doesn't serialize UUIDs natively either (pandas uses a vendored version of ujson, iirc).
import json
import uuid
json.dumps({"a": uuid.uuid4()}) # raises TypeError: Object of type UUID is not JSON serializable
json.dumps({"a": uuid.uuid4()}, default=str) # works
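Passing default=str stringifies every object the encoder does not recognise, not only UUIDs. If that is too broad, a narrower handler is one option. The following is a minimal sketch; encode_uuid is a hypothetical helper name, not part of json or pandas.

```python
import json
import uuid


def encode_uuid(obj):
    """Hypothetical handler: stringify UUIDs only, reject everything else."""
    if isinstance(obj, uuid.UUID):
        return str(obj)
    # Mirror the stdlib behaviour for any other unsupported type.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


payload = {"a": uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")}
encoded = json.dumps(payload, default=encode_uuid)
print(encoded)
```

With this handler, other unsupported types (a set, for example) still raise TypeError instead of being silently turned into strings.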
@grieve54706 The issue arises because pandas does not natively support serializing uuid.UUID instances to JSON. When you call to_json() on a DataFrame containing UUID objects, it results in encoding errors. To get the expected behaviour, you can convert the UUID objects to their string representations before serializing the DataFrame to JSON:
import uuid
import pandas as pd
# Create a DataFrame with a UUID column
df = pd.DataFrame({"uuid": [uuid.uuid4()]})
# Convert UUID objects to strings
df['uuid'] = df['uuid'].astype(str)
# Serialize the DataFrame to JSON
json_data = df.to_json()
print(json_data)
Please let me know if the above works. Thanks.
Thanks, guys. I think your suggestions all work.
I provide a tool that connects to many databases and loads the data into pandas for other people, so I won't know which columns hold UUIDs, and different databases can return other data that also ends up with dtype object. I found that orjson serializes UUIDs to strings by default. I'm curious: should pandas' to_json follow RFC 4122 too?
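When the UUID columns are not known in advance, one possible workaround is to scan the object-dtype columns and stringify only the values that are uuid.UUID instances. This is a rough sketch, not pandas' own behaviour; stringify_uuids is a hypothetical helper name, and it assumes column labels are unique.

```python
import json
import uuid

import pandas as pd


def stringify_uuids(df):
    """Hypothetical helper: return a copy with UUID values converted to str."""
    out = df.copy()
    for col in out.columns:
        # UUID objects are stored in object-dtype columns.
        if out[col].dtype == object:
            out[col] = out[col].map(
                lambda v: str(v) if isinstance(v, uuid.UUID) else v
            )
    return out


df = pd.DataFrame(
    {"id": [uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")], "n": [1]}
)
result = json.loads(stringify_uuids(df).to_json())
print(result)
```

Other values in object columns are left untouched, so this only addresses the UUID case. It also adds a Python-level pass over each object column, which may be noticeable on large frames.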
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
If a DataFrame contains UUID values, to_json fails and raises an error with the message
Unsupported UTF-8 sequence length when encoding string
or
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 183: invalid start byte
Expected Behavior
It should serialize uuid.UUID instances to RFC 4122 format, e.g., f81d4fae-7dec-11d0-a765-00a0c91e6bf6.
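For reference, the stdlib already produces this hyphenated form: calling str() on a uuid.UUID gives the 8-4-4-4-12 hexadecimal representation. A small illustration, reusing the example value above:

```python
import uuid

# Build a UUID from its 32 hex digits (no hyphens).
u = uuid.UUID("f81d4fae7dec11d0a76500a0c91e6bf6")

# str() yields the hyphenated string form.
canonical = str(u)
print(canonical)

# The string form round-trips back to an equal UUID.
assert uuid.UUID(canonical) == u
```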
Installed Versions