Closed hb2638 closed 2 years ago
I can replicate the two test failures on Linux (Xubuntu 20.04). FWIW, those tests pass if I change
row = self.cursor.execute("DECLARE @OUTPUT VARCHAR(MAX)= …
to
row = self.cursor.execute("DECLARE @OUTPUT NVARCHAR(MAX)= …
@hb2638 - If you make that change on Windows does it cause those tests to start failing?
On Windows the narrow charset is usually CP1252 or similar (depends on the language etc.) but on Linux it's usually UTF-8. That may explain the differences you're seeing.
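To make the charset difference concrete, here is a small sketch in plain Python (no database involved) showing that the same characters have different byte representations in CP1252 and UTF-8. The helper name `narrow_bytes` is made up for illustration.

```python
def narrow_bytes(text, encoding):
    """Return the hex form of `text` encoded with the given narrow charset."""
    return text.encode(encoding).hex()


for ch in ["A", "é", "€", "ÿ"]:
    # ASCII characters are identical in both; everything else differs.
    print(ch, narrow_bytes(ch, "cp1252"), narrow_bytes(ch, "utf-8"))

# Bytes written as UTF-8 but read back as CP1252 turn into mojibake.
print("é".encode("utf-8").decode("cp1252"))
```

If one side of the connection assumes CP1252 and the other assumes UTF-8, any character above 0x7F can come back as different bytes, which is consistent with the VARCHAR tests behaving differently per OS.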
I guess the issue is in the Microsoft SQL ODBC driver. That's the only thing that's different between Linux and Windows. I couldn't find anything specific to Windows in the pyodbc code.
Do you know if I can change the encoding to latin1 (ISO-8859-1) in pyodbc? I see there are setdecoding and setencoding methods, but I can't find any documentation on them.
There is unlikely to be an issue. The driver uses the environment encoding; this is the same behaviour as on Windows. You can change the environment encoding in Linux, but you may need to install CP1252 first - see https://ereimer.net/programs/charsets-cp1252-utf8.htm for more information.
The NVARCHAR works on linux for me too.
Thx! I'll give that a try. I'll close this because I feel more confident that the issue is in the driver.
I was able to get the CP1252 locale installed, but I still couldn't get the strings to match, so I ended up using "FOR JSON" to make SQL Server do the UTF-8 conversion before sending it over the wire:
def test_output_json(self):
    row = self.cursor.execute("DECLARE @OUTPUT VARCHAR(MAX)= CONCAT(?, '' COLLATE SQL_Latin1_General_CP1_CI_AS);SELECT * FROM (SELECT @OUTPUT AS OUTPUT) TMP FOR JSON AUTO", [EXPECTED]).fetchone()
    actual = json.loads(row[0])[0]["OUTPUT"]
    expected_windows = '5c22232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e2060c281e2809ac692e2809ee280a6e280a0e280a1cb86e280b0c5a0e280b9c592c28dc5bdc28fc290e28098e28099e2809ce2809de280a2e28093e28094cb9ce284a2c5a1e280bac593c29dc5bec5b820c2a1c2a2c2a3c2a4c2a5c2a6c2a7c2a8c2a9c2aac2abc2acc2adc2aec2afc2b0c2b1c2b2c2b3c2b4c2b5c2b6c2b7c2b8c2b9c2bac2bbc2bcc2bdc2bec2bfc380c381c382c383c384c385c386c387c388c389c38ac38bc38cc38dc38ec38fc390c391c392c393c394c395c396c397c398c399c39ac39bc39cc39dc39ec39fc3a0c3a1c3a2c3a3c3a4c3a5c3a6c3a7c3a8c3a9c3aac3abc3acc3adc3aec3afc3b0c3b1c3b2c3b3c3b4c3b5c3b6c3b7c3b8c3b9c3bac3bbc3bcc3bdc3bec3bf'
    self.assertEqual(expected_windows, actual.encode().hex())
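The client-side half of that workaround can be sketched without a database. The `payload` below is a made-up stand-in for `row[0]`, and `utf8_hex_from_json` is a hypothetical helper name. The sketch assumes the JSON text arrives as a Python `str` (FOR JSON output is Unicode text), so `json.loads` yields a `str` and `.encode()` uses Python's default of UTF-8.

```python
import json


def utf8_hex_from_json(payload):
    """Parse FOR JSON AUTO style text and return the OUTPUT value as UTF-8 hex."""
    value = json.loads(payload)[0]["OUTPUT"]
    return value.encode().hex()  # str.encode() defaults to UTF-8


# Made-up payload standing in for row[0] from the query above.
payload = '[{"OUTPUT":"é€"}]'
print(utf8_hex_from_json(payload))
```

Because the comparison happens on UTF-8 bytes of an already-decoded `str`, the result no longer depends on the narrow charset of the client environment.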
For the record, the original tests as modified above also passed on Windows.
Hi @gordthompson, switching from VARCHAR to NVARCHAR works on both OSes.
Issue
All the tests below pass on Windows, but test_bytes_to_str and test_str fail on Linux and I can't explain why. Do you have any ideas why they fail on Linux but not Windows? Is it something in Python? pyodbc? The OS? The driver? A lot of moving parts :^(
I'm hoping for consistent behavior between the two. And I know, VARCHAR is not Unicode.