Closed Raghav-254 closed 1 year ago
Could you please provide DDL for the table with some sample data?
Hi @hovaesco This is being observed with varbinary datatype values only. Can you please now have a look into this? Regards, Raghav Mittal
It looks like a binary value is not properly encoded as UTF-8. Please make sure you're using the X
prefix when inserting and the data is properly formatted as hexadecimal string.
It is working fine with trino-cli as well as trino JDBC driver and with the trino-python-client v0.306.0 as well. It seems to be broken in the latest version of v0.322.0. Therefore probably it should not be the issue with the dataset.
Hi do you require any extra info from our side?
Can you provide a literal varbinary string that produces the issue in the form of SELECT X'<my binary string>'
?
Also note that \x<binary>
just refers to characters that are not printable in UTF8, it is not necessarily wrong.
Hi @mdesmet @hovaesco
You can follow the below process to reproduce the above mentioned issue
Create the table with id
column having the varbinary datatype.
CREATE TABLE IF NOT EXISTS test_data ( testid VARBINARY );
Insert values into the table
INSERT INTO test_data (testid) VALUES (CAST('0123456789ABCDEF' AS VARBINARY));
INSERT INTO test_data (testid) VALUES (CAST('x0123456789ABCDEF' AS VARBINARY));
Query the table using select statement.
select testid from test_data
Following is the output I am getting while running via trino-python-client v0.322.0
Output I am getting while running via trino-python-client v0.306.0
Also I found that, it starts breaking from v0.321.0. Upto v.0.320.0 it is working fine. So you can also have a look at the changes which you have made while bumping up the version from v0.320.0 to v.0.321.0
Regards, Raghav Mittal
In older versions results were always returned as strings. In newer versions they're mapped to Python types.
The values are equivalent and this is not a bug:
>>> x = b'0123456789ABCDEF'
>>> base64.b64encode(x)
b'MDEyMzQ1Njc4OUFCQ0RFRg=='
You can revert to old behaviour by enabling https://github.com/trinodb/trino-python-client#legacy-primitive-types.
You can also read the release notes when upgrading - https://github.com/trinodb/trino-python-client/blob/master/CHANGES.md#breaking-changes
Hi @mdesmet @hashhar
Thanks for your response.
I was able to render the strings correctly by passing this legacy_primitive_types
argument as True
.
But after doing this, I lost the behavior of displaying row types as dict. The changes which you have made in this PR: https://github.com/trinodb/trino-python-client/pull/335
Basically I patched the above changes of your PR and then tried on v0.322.0 by making legacy_primitive_types
as True
.
Can you please suggest me the resolution of this issue? Regards, Raghav Mittal
You can either use results as strings or results as Python types, not both. The string behaviour is only there to allow people to migrate to use Python types. I'd suggest you spend the time to either base64 decode the byte-array you get for VARBINARY columns instead of trying to hack around it.
ok thanks for your response, lemme try that out.
Hi @hashhar
FYI this thing worked for me, you can refer the try except block in the below code.
I made changes in the client.py
file under class NamedRowTuple
class NamedRowTuple(tuple):
"""Custom tuple class as namedtuple doesn't support missing or duplicate names"""
def __new__(cls, values, names: List[str], types: List[str]):
return super().__new__(cls, values)
def __init__(self, values, names: List[str], types: List[str]):
self._names = names
# With names and types users can retrieve the name and Trino data type of a row
self.__annotations__ = dict()
self.__annotations__["names"] = names
self.__annotations__["types"] = types
elements: List[Any] = []
for name, value in zip(names, values):
if names.count(name) == 1:
setattr(self, name, value)
try:
if isinstance(value, (bytes, bytearray)):
value = base64.b64encode(value).decode('utf-8')
except Exception as e:
pass
elements.append(f"{name}: {repr(value)}")
else:
elements.append(repr(value))
self._repr = "(" + ", ".join(elements) + ")"
I didn't mean for you to change the client code itself. I meant you can modify your script/code to become aware of the fact that VARBINARY columns will be Python byte-arrays and handle them accordingly.
There is no change to be done in the client itself.
ok thanks for your clarification!
Expected behavior
The output should handle the special characters properly like below:
Actual behavior
I was querying one table, and the output doesn't seem to be correct.
Basically the special characters are not being handled properly. This is specific to varbinary datatype.
Steps To Reproduce
This issue was encountered when I upgraded the trino-python-client from v0.306.0 to v0.322.0
You can reproduce this issue by running the below python script:
Log output
Getting these log outputs as well:
Unsupported UTF-8 sequence length when encoding string
'utf-8' codec can't decode byte 0xff in position 337: invalid start byte
Operating System
NA
Trino Python client version
0.322.0
Trino Server version
NA
Python version
3.10
Are you willing to submit PR?