microsoft / go-sqlcmd

The new sqlcmd, CLI for SQL Server and Azure SQL (winget install sqlcmd / sqlcmd create mssql / sqlcmd open ads)
https://learn.microsoft.com/sql/tools/sqlcmd/go-sqlcmd-utility
MIT License

GO version of sqlcmd does not parse ANSI text files correctly #494

Open robertwmcnulty opened 6 months ago

robertwmcnulty commented 6 months ago

If a sql text file is encoded as ANSI (as opposed to UTF-8 or similar) the newer Go version of sqlcmd will not correctly parse non-ASCII characters.

For example, a file may contain non-breaking spaces (character 160), which T-SQL generally treats identically to a normal space. In ANSI Windows-1252, this character is encoded as the single byte 0xA0.

The Go version of sqlcmd appears to assume all files are UTF-8 encoded: it treats such a character as unknown and replaces it with Unicode character 65533 (U+FFFD, the replacement character). That is consistent with decoding the file as UTF-8, because the lone byte 0xA0 is not a valid UTF-8 sequence.
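To illustrate the behavior described above, here is a minimal sketch using only the Go standard library. It is not taken from go-sqlcmd; the helper name `decodeAsUTF8` is hypothetical, and the input is simply the contents of the test file written as a byte string.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAsUTF8 (hypothetical helper) interprets raw bytes as UTF-8.
// Go replaces every byte that is not part of a valid UTF-8 sequence
// with U+FFFD when converting a string to runes.
func decodeAsUTF8(b []byte) []rune {
	return []rune(string(b))
}

func main() {
	// "SELECT", then the Windows-1252 non-breaking space (0xA0), then the rest.
	raw := []byte("SELECT\xA0CURRENT_TIMESTAMP")

	fmt.Println(utf8.Valid(raw)) // false: a lone 0xA0 is not valid UTF-8

	runes := decodeAsUTF8(raw)
	fmt.Printf("U+%04X\n", runes[6]) // U+FFFD, i.e. decimal 65533
}
```

The server then receives `SELECT` and `CURRENT_TIMESTAMP` joined by U+FFFD, which would explain why the whole string is treated as a single identifier.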

The attached file is a simple example text file saved as ANSI in Windows Notepad, containing "SELECT{Non-breaking-space}CURRENT_TIMESTAMP"

testfile.txt

It can be run in sqlcmd with a command like: sqlcmd -i testfile.txt

The original ODBC version of sqlcmd has no problem running the above file, returning the expected timestamp.

The Go version, however, fails: "Could not find stored procedure 'SELECT�CURRENT_TIMESTAMP'."

The behavior of the Go sqlcmd should either match the ODBC behavior, or the lack of support for ANSI-encoded text files should be documented as one of the "Breaking changes from sqlcmd (ODBC)".

shueybubbles commented 6 months ago

Thanks for opening the issue. This is related to #111. ODBC SqlCmd treats non-Unicode/non-UTF8 files as "system code page encoded" and converts them to UTF16 on read using the Win32 API MultiByteToWideChar, at least on Windows. I am not sure what their Linux version does. There's not much support in the Go dev community for code pages, and we encourage folks who develop cloud-first applications that run on Linux etc. to use UTF8 or UTF16 encoded files instead of relying on ambient properties like the system code page.

I do want to support the code page conversions but we just haven't had the time to do the work yet. I will update the README appropriately.
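As a rough sketch of what such a conversion could look like, the following standard-library-only Go code falls back to Windows-1252 when the input is not valid UTF-8. This is not go-sqlcmd's implementation: the names `toUTF8` and `decodeWindows1252` are hypothetical, Windows-1252 is hard-coded as an assumption (ODBC sqlcmd uses whatever the system code page is), and the five bytes that Windows-1252 leaves undefined are assumed to map to the C1 control code point of the same value.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// cp1252High maps bytes 0x80-0x9F to Unicode. Entries for the bytes that
// Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) keep the
// byte value as the code point; that choice is an assumption.
var cp1252High = [32]rune{
	0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
	0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
	0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
	0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
}

// decodeWindows1252 converts Windows-1252 bytes to a UTF-8 Go string.
// Outside 0x80-0x9F, each byte equals its Unicode code point.
func decodeWindows1252(b []byte) string {
	runes := make([]rune, 0, len(b))
	for _, c := range b {
		if c >= 0x80 && c <= 0x9F {
			runes = append(runes, cp1252High[c-0x80])
		} else {
			runes = append(runes, rune(c))
		}
	}
	return string(runes)
}

// toUTF8 keeps valid UTF-8 input unchanged and otherwise assumes the
// bytes are Windows-1252 (hard-coded here, not read from the system).
func toUTF8(b []byte) string {
	if utf8.Valid(b) {
		return string(b)
	}
	return decodeWindows1252(b)
}

func main() {
	query := toUTF8([]byte("SELECT\xA0CURRENT_TIMESTAMP"))
	fmt.Printf("%q\n", query) // the 0xA0 byte becomes U+00A0 instead of U+FFFD
}
```

A validity check like this cannot tell code pages apart; it only separates "valid UTF-8" from "something else", so a real implementation would still need to learn the intended code page from the user or the environment.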

shueybubbles commented 6 months ago

This content is relevant for ODBC SqlCmd on Linux and may guide our implementation. I don't know offhand what the Go method to detect "current locale" is.

https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/programming-guidelines?view=sql-server-ver16#character-set-support
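One possible approach on Linux and macOS, sketched below, is to read the POSIX locale environment variables in their usual precedence order (`LC_ALL`, then `LC_CTYPE`, then `LANG`) and extract the codeset part of the name. This is an assumption about how detection could work, not an existing Go API for the "current locale"; the function names `currentLocale` and `codesetFromLocale` are hypothetical, and the sketch does not consult the C library's locale state.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// currentLocale returns the first non-empty value among the POSIX
// locale variables that affect character classification, or "" if
// none of them is set.
func currentLocale() string {
	for _, name := range []string{"LC_ALL", "LC_CTYPE", "LANG"} {
		if v := os.Getenv(name); v != "" {
			return v
		}
	}
	return ""
}

// codesetFromLocale extracts the codeset from a locale name of the form
// language[_territory][.codeset][@modifier], e.g. "en_US.UTF-8".
// It returns "" when the name carries no codeset (e.g. "C").
func codesetFromLocale(locale string) string {
	_, rest, found := strings.Cut(locale, ".")
	if !found {
		return ""
	}
	codeset, _, _ := strings.Cut(rest, "@")
	return codeset
}

func main() {
	loc := currentLocale()
	fmt.Printf("locale=%q codeset=%q\n", loc, codesetFromLocale(loc))
}
```

The resulting codeset string would still have to be mapped to an actual decoder, and Windows would need a different mechanism, since the system code page there is not exposed through these variables.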