runtimeverification / llvm-backend

KORE to llvm translation

pybind11 raises a `UnicodeDecodeError` on non-utf bytes in terms of sort `Bytes` #1078

Open gtrepta opened 1 month ago

gtrepta commented 1 month ago

Terms of sort `Bytes` and `String` are both stored in a `kore_string_pattern` in the AST library, and the bindings treat them the same way when they are accessed:

https://github.com/runtimeverification/llvm-backend/blob/2983a01dccf1d278aaea3c9c1d989c5273eaab55/bindings/python/ast.cpp#L360-L363

The issue is that when the `contents` property is accessed, pybind11 assumes the value is a valid UTF-8 encoded string and decodes it into a Python `str`. That isn't always the case for `Bytes` terms, and a `UnicodeDecodeError` is raised when the decode fails.
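The failure mode can be reproduced in plain Python, without the bindings. This is only a sketch of the decode step described above: the payload `raw` and the helper `decode_as_utf8` are made up for illustration and are not part of the llvm-backend bindings.

```python
# Hypothetical payload for a Bytes term; 0xff is never a valid byte in UTF-8.
raw = b"\x00\xff\x80"


def decode_as_utf8(data: bytes) -> str:
    """Decode strictly as UTF-8, which is the conversion the issue says
    is applied when `contents` is returned to Python."""
    return data.decode("utf-8")


try:
    decode_as_utf8(raw)
    failed = False
except UnicodeDecodeError as err:
    # Arbitrary byte sequences are not guaranteed to be valid UTF-8.
    failed = True
    print(f"decode failed: {err.reason}")

# Handing the payload back as bytes involves no decoding, so it cannot fail.
untouched = bytes(raw)
```

Valid UTF-8 input such as `b"abc"` goes through the same helper without error, which is why the problem only shows up for some `Bytes` terms.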

pybind11 does support returning the raw bytes without decoding them (as a `py::bytes` object), so we should find out how to do that for terms that need to be treated that way.

https://pybind11.readthedocs.io/en/stable/advanced/cast/strings.html#returning-c-strings-to-python
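From the Python side, one possible shape for the result is an accessor that returns `bytes` for `Bytes` terms and `str` otherwise. The sketch below only illustrates that behaviour: `contents_for` and its `is_bytes_sort` flag are hypothetical names, and how the bindings would actually tell the two sorts apart is left open here.

```python
from typing import Union


def contents_for(raw: bytes, is_bytes_sort: bool) -> Union[str, bytes]:
    """Hypothetical accessor: leave Bytes payloads undecoded and decode
    String payloads as UTF-8."""
    if is_bytes_sort:
        # No decoding step, so arbitrary byte values are fine.
        return raw
    return raw.decode("utf-8")


# A String payload comes back as str, a Bytes payload comes back as bytes.
print(contents_for(b"hello", is_bytes_sort=False))
print(contents_for(b"\xff\xfe", is_bytes_sort=True))
```

An alternative would be to keep `contents` as it is and expose a separate bytes-valued property. That avoids a return type that depends on the sort, at the cost of callers having to know which property to use.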

tothtamas28 commented 1 month ago

Possibly a duplicate: