Open robertwmcnulty opened 11 months ago
Thanks for opening the issue. This is related to #111.
ODBC SqlCmd treats non-Unicode/non-UTF8 files as "system code page encoded" and converts them to UTF16 on read using the Win32 API MultiByteToWideChar, at least on Windows. I am not sure what their Linux version does.
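For illustration, the conversion described above could be approximated in Go roughly as follows. This is only a sketch, not what either sqlcmd does today: `decodeScript` and `cp1252High` are hypothetical names, the code page is hard-coded to Windows-1252 instead of being read from the system, the target is a Go (UTF-8) string instead of UTF-16, and BOM handling for UTF-16 input is left out.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// cp1252High maps bytes 0x80-0x9F to the code points Windows-1252 assigns them.
// Bytes that Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D)
// are mapped to the C1 control with the same value in this sketch.
var cp1252High = [32]rune{
	0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
	0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
	0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
	0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
}

// decodeScript (hypothetical helper) returns the input unchanged when it is
// already valid UTF-8, and otherwise decodes it as Windows-1252.
func decodeScript(b []byte) string {
	if utf8.Valid(b) {
		return string(b)
	}
	runes := make([]rune, 0, len(b))
	for _, c := range b {
		if c >= 0x80 && c <= 0x9F {
			runes = append(runes, cp1252High[c-0x80])
		} else {
			// Outside 0x80-0x9F, Windows-1252 matches Latin-1,
			// so the byte value is the code point.
			runes = append(runes, rune(c))
		}
	}
	return string(runes)
}

func main() {
	raw := []byte("SELECT\xA0CURRENT_TIMESTAMP") // ANSI file with one NBSP byte
	fmt.Printf("%q\n", decodeScript(raw))
}
```

Falling back only when the bytes are not valid UTF-8 keeps existing UTF-8 scripts untouched. A real implementation would presumably need to honor the actual system code page, not assume 1252.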
There's not much support in the Go dev community for code pages, and we encourage folks who develop cloud-first applications that run on Linux etc. to use UTF8 or UTF16 encoded files instead of relying on ambient properties like the system code page.
I do want to support the code page conversions but we just haven't had the time to do the work yet. I will update the README appropriately.
This content is relevant for ODBC SqlCmd on Linux and may guide our implementation. I don't know offhand what the Go method to detect "current locale" is.
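As far as I know the Go standard library has no direct "current locale" call, so one possible approach on Linux is to read the POSIX locale environment variables directly. The sketch below assumes the usual precedence (LC_ALL, then LC_CTYPE, then LANG) and the `language[_territory][.codeset][@modifier]` format. `localeCodeset` is a hypothetical helper, not part of any existing package.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// localeCodeset (hypothetical helper) returns the codeset part of the
// effective locale, e.g. "UTF-8" for "en_US.UTF-8". It returns "" when
// no variable is set or the locale names no codeset (such as "C").
// getenv is injected so the function can be exercised without touching
// the real environment.
func localeCodeset(getenv func(string) string) string {
	for _, name := range []string{"LC_ALL", "LC_CTYPE", "LANG"} {
		v := getenv(name)
		if v == "" {
			continue // unset or empty: fall through to the next variable
		}
		// Drop an optional "@modifier" suffix.
		if i := strings.IndexByte(v, '@'); i >= 0 {
			v = v[:i]
		}
		// The codeset, if any, follows the first '.'.
		if i := strings.IndexByte(v, '.'); i >= 0 {
			return v[i+1:]
		}
		return ""
	}
	return ""
}

func main() {
	fmt.Println(localeCodeset(os.Getenv))
}
```

This only reports what the environment claims. Mapping the returned name to an actual decoder, and deciding what to do on Windows, would still be separate work.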
If a sql text file is encoded as ANSI (as opposed to UTF-8 or similar) the newer Go version of sqlcmd will not correctly parse non-ASCII characters.
For example, suppose a file contains a non-breaking space (character 160), which T-SQL generally treats the same as a normal space. In ANSI Windows-1252, this is encoded as the single byte hex A0.
The Go version of sqlcmd appears to assume all files are Unicode encoded: it treats such a character as unknown and replaces it with Unicode character 65533 (U+FFFD, the replacement character). That is consistent with assuming UTF-8, since the single byte A0 is not valid UTF-8.
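The substitution can be reproduced with the Go standard library alone. This is a minimal sketch of the language behavior, not of sqlcmd's actual file-reading code, and `decodeAsUTF8` is a hypothetical name:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAsUTF8 (hypothetical helper) interprets the bytes as UTF-8.
// Go's string-to-[]rune conversion replaces each invalid byte with
// U+FFFD (65533), the Unicode replacement character.
func decodeAsUTF8(b []byte) []rune {
	return []rune(string(b))
}

func main() {
	// "SELECT", one Windows-1252 non-breaking space (0xA0), "CURRENT_TIMESTAMP"
	raw := []byte("SELECT\xA0CURRENT_TIMESTAMP")

	fmt.Println("valid UTF-8:", utf8.Valid(raw))
	runes := decodeAsUTF8(raw)
	fmt.Printf("rune after SELECT: %d (U+%04X)\n", runes[6], runes[6])
}
```

Because the replacement character is not whitespace, the whole line is then parsed as one token, which would explain the error quoted below.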
The attached file is a simple example txt file encoded using the Windows notepad as ANSI, containing "SELECT{Non-breaking-space}CURRENT_TIMESTAMP"
testfile.txt
It can be run in sqlcmd with a command like: sqlcmd -i testfile.txt
The original ODBC version of sqlcmd has no problem running the above file, returning the expected timestamp.
The Go version, however, fails: "Could not find stored procedure 'SELECT�CURRENT_TIMESTAMP'."
The behavior of the Go sqlcmd should either match the ODBC behavior, or the lack of support for ANSI-encoded text files should be documented as one of the "Breaking changes from sqlcmd (ODBC)".