Open victoraugustolls opened 4 years ago
Is there a repro app? Does it happen during a particular query?
No particular query, just any query at random, but I can post my connection class and Dockerfile to help!
Connection class
import os
import pyodbc
from opencensus.trace import execution_context, status
from helpers import Formatter, Singleton
from helpers.config import MSSQL
from logger import new
LOG = new(__name__)
class MsSQLService(metaclass=Singleton):
def __init__(self):
try:
tracer = execution_context.get_opencensus_tracer()
except Exception:
LOG.error("Failed to trace request", exc_info=True)
raise
try:
with tracer.span(name=MSSQL["server_name"]) as _:
tracer.add_attribute_to_current_span("dependency.type", "SQL")
self.conn = pyodbc.connect(
(
"Driver={{ODBC Driver 17 for SQL Server}};Server={};"
"Database={};UID={};PWD={};"
).format(
MSSQL["server_name"],
MSSQL["database"],
MSSQL["username"],
MSSQL["password"],
),
autocommit=True,
)
except Exception as e:
LOG.error(f"Connection exception: {e}")
raise
def execute_query(self, query, formatted=True):
try:
tracer = execution_context.get_opencensus_tracer()
except Exception:
LOG.error("Failed to get tracer", exc_info=True)
raise
with tracer.span(name=MSSQL["server_name"]) as _:
tracer.add_attribute_to_current_span("dependency.type", "SQL")
tracer.add_attribute_to_current_span("database.query", query)
try:
cur = self.conn.cursor()
cur.execute(query)
if formatted is True:
data = Formatter.format_cursor_mssql(cur)
else:
data = None
except Exception as e:
LOG.info(f"MSSQL error: {e}")
LOG.info(f"MSSQL error cause: {e.__cause__}")
LOG.info(f"MSSQL error dict: {e.__dict__}")
span = tracer.current_span()
span.set_status(status=status.Status.from_exception(e))
raise
finally:
cur.close()
return data
Dockerfile
FROM python:3.7-slim-stretch
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
apt-utils \
apt-transport-https \
debconf-utils \
build-essential \
gcc \
gnupg \
locales && \
echo 'en_US.UTF-8 UTF-8' > /etc/locale.gen && \
locale-gen && \
# adding custom MS and PostgreSQL repository
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
curl https://packages.microsoft.com/config/debian/9/prod.list > /etc/apt/sources.list.d/mssql-release.list && \
echo 'deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main' >> /etc/apt/sources.list.d/pgdg.list && \
curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - && \
# install SQL Server and PostgreSQL drivers
apt-get update && ACCEPT_EULA=Y apt-get install -y --no-install-recommends \
msodbcsql17 \
mssql-tools \
unixodbc-dev \
libpq-dev \
postgresql-client-10 && \
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile && \
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc && \
/bin/bash -c "source ~/.bashrc" && \
# clear installations
apt-get remove -y apt-utils apt-transport-https build-essential curl gnupg && \
rm -rf /var/lib/apt/lists/*
ENV PYTHONIOENCODING=utf-8
RUN mkdir -p /root/.opencensus/.azure/gunicorn
ADD requirements.txt .
RUN pip install -r requirements.txt
RUN mkdir /code
WORKDIR /code
ADD src /code/
ADD . /code/
COPY init.sh /usr/local/bin/
RUN chmod u+x /usr/local/bin/init.sh
ENTRYPOINT ["init.sh"]
I can also add that doing:
self.conn.setencoding(encoding="utf-8")
self.conn.setdecoding(pyodbc.SQL_CHAR, encoding="utf-8")
self.conn.setdecoding(pyodbc.SQL_WCHAR, encoding="utf-8")
did not fix the issue
Doing self.conn.setdecoding(pyodbc.SQL_WMETADATA, encoding="utf-8")
seems to fix BUT all the columns name are missing characters now :/
Please post an ODBC trace. It is hard to determine what is happening without a trace.
https://github.com/mkleehammer/pyodbc/issues/489
I see this exact error randomly instead of the garbled messages in issue 489. I believe this is a result of getting an error message half full of arbitrary characters and attempting to decode it. SqlGetDiagRecW seems to be the issue here too.
I'm trying to reproduce this, but I can't pip3 install Formatter
because of the missing cStringIO
It looks to me like Formatter is a class in his local helper lib. When you run pip install Formatter it fails as formatter was only compat with Python 2.
cStringIO was deprecated with python 3. https://stackoverflow.com/questions/57472080/modulenotfounderror-no-module-named-cstringio
formatter itself the package on pypi would also not make sense in the context for which he was using it. https://pypi.org/project/formatter/ It is a code formatter like Black.
Further in reviewing...
Since as Victor mentions:
self.conn.setdecoding(pyodbc.SQL_WMETADATA, encoding="utf-8")
Fixes the issue, I doubly think this is related to #489
The reason being if you half the size of a utf character then you will only pull half of the number of characters on err. Half is exactly the issue with SqlGetDiagRecW.
Also he mentions it's random. The system error which is thrown occurs randomly as the characters that are pulled (when they shouldn't be) and ran through unicode_decode are random.
So, randomly unicode_decode gets a character it can't decode and throws the system error instead of a pyodbc.error.
If I can find a Docker instance for Oracle, I can setup a reproducible test case for you, but I don't use Docker regularly, so it may be a bit before I can get this done.
Finally, can we get some motion on #489? Would a PR to repair #489 be more well received if I was able to provide a reproducible test case that proves the two are related?
Hi! Sorry for not replying earlier. About the formatter, this is what I'm using:
@staticmethod
def format_cursor_mssql(cursor):
resp_list = []
names_list = [
Formatter.camel_case_name(value[0]) for value in cursor.description
]
for row in cursor:
resp_list.append(dict(zip(names_list, row)))
return Formatter.clear_nulls(resp_list)
@staticmethod
def camel_case_name(name):
return re.sub("_.", lambda x: x.group()[1].upper(), name)
@staticmethod
def clear_nulls(dict_list):
for obj in dict_list:
for k, v in obj.items():
if isinstance(v, str) and v.lower() == "null":
obj[k] = None
return dict_list
I'm getting the same error and I'm also using SQLServer (latest version on RDS) and Python 3.10. It's happening to me in Django. Seems to happen to me when the server is under a bit of strain.
My issue was not resolved by trying to set the decoding to utf-8 in my options, as described here:
DATABASES = {
'default': {
'ENGINE': 'mssql',
'NAME': os.environ['METASTORE_DATABASE'],
'USER': os.environ['METASTORE_USERNAME'],
'PASSWORD': os.environ['METASTORE_PASSWORD'],
'HOST': os.environ['METASTORE_SERVER_URL'],
'OPTIONS': {
'driver': 'ODBC Driver 17 for SQL Server',
"setdecoding": [
{"sqltype": pyodbc.SQL_CHAR, "encoding": 'utf-8'},
{"sqltype": pyodbc.SQL_WCHAR, "encoding": 'utf-8'}],
"setencoding": [
{"encoding": "utf-8"}],
},
},
}
I've just seen the same issue, airflow 2.2.3, odbc 3.100, pyodbc 4.0.34, python 3.10.5, on wsl ubuntu. MS SQL Server v12.0.2000.8.
Any update on a solution? Issue has been open a couple of years.
Is self.conn.setdecoding(pyodbc.SQL_WMETADATA, encoding="utf-8")
the only metadata setting worth forcing to utf-8 so the underlying issue can get logged properly or are there others?
I'm not sure if it was mentioned above, but when I first saw this in an apache airflow task, it just hung the task for an hour until it was noticed.
Edit:
I ran SELECT CONVERT (varchar(256), SERVERPROPERTY('collation'));
which gave SQL_Latin1_General_CP1_CI_AS
which according to a quick google means:
latin1: makes the server treat strings using charset latin 1, basically ascii CP1: stands for Code Page 1252 CI: case insensitive comparisons so 'ABC' would equal 'abc' AS: accent sensitive, so 'ü' does not equal 'u'
As such it feels like setting a decoding to utf-8 is not the right thing to do as surely this is just negotiated between the db and the odbc client?
There are several different places where encodings can be converted, and thus there is also the possibility that the encoding is different in different parts of the system. The pyODBC encoding options are used to set the encoding used between pyODBC and the ODBC driver. You are using MSSQL but I am not sure what "odbc 3.100" is; in any case, the ODBC Driver for SQL Server usually defaults to UTF-8 for its narrow string encoding, although later versions will detect the environment encoding (which is usually UTF-8 on Linux) and use that.
If you would like a better investigation of this issue then post more details about the data which is causing it, along with an ODBC trace and short self-contained repro code.
Also, 4.0.34 version of pyODBC has a major problem with including its own old unixODBC which contains much bugs, use '32 for the time being.
Also, 4.0.34 version of pyODBC has a major problem with including its own old unixODBC which contains much bugs, use '32 for the time being.
I don't remember this. Has this been fixed in 4.0.35?
The issue of unixODBC being included in the release wheels was indeed fixed in 4.0.35. Note the customized use of CIBW_REPAIR_WHEEL_COMMAND options in the build script: https://github.com/mkleehammer/pyodbc/blob/master/.github/workflows/artifacts_build.yml#L56-L72
Environment
To diagnose, we usually need to know the following, including version numbers. On Windows, be sure to specify 32-bit Python or 64-bit:
Running flask under gunicorn gevent worker
Issue
Randomly I receive the following error:
Tried reversing to
4.0.23
and still no good!