snowflakedb / snowflake-cli

Snowflake CLI is an open-source command-line tool explicitly designed for developer-centric workloads in addition to SQL operations.
https://docs.snowflake.com/developer-guide/snowflake-cli-v2/index
Apache License 2.0

SNOW-1528909: Snowflake CLI cannot handle UTF-16LE encoded text files #1303

Open sfc-gh-cgorrie opened 4 months ago

sfc-gh-cgorrie commented 4 months ago

SnowCLI version

2.6.0rc0

Python version

Python 3.11.9

Platform

macOS-14.5-arm64-arm-64bit

What happened

PowerShell redirects (e.g. command > file) encode output as UTF-16LE by default. Unfortunately, Snowflake CLI assumes UTF-8 encoding in many code paths, so common workflows fail on such files. Here's an example PR that simply re-encodes the input for a snow sql -f command as UTF-16LE to show the failure: #1299

Console output

src/snowflake/cli/api/commands/snow_typer.py:96: in command_callable_decorator
    result = command_callable(*args, **kw)
src/snowflake/cli/api/commands/decorators.py:158: in wrapper
    return func(**options)
src/snowflake/cli/api/commands/decorators.py:158: in wrapper
    return func(**options)
src/snowflake/cli/plugins/sql/commands.py:82: in execute_sql
    single_statement, cursors = SqlManager().execute(query, files, std_in, data=data)
src/snowflake/cli/plugins/sql/manager.py:60: in execute
    query_from_file = SecurePath(file).read_text(
src/snowflake/cli/api/secure_path.py:157: in read_text
    return self._path.read_text(*args, **kwargs)
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/pathlib.py:1059: in read_text
    return f.read()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <encodings.utf_8.IncrementalDecoder object at 0x7f7815a95f90>
input = b'\xff\xfe/\x00*\x00\n\x00 \x00C\x00o\x00p\x00y\x00r\x00i\x00g\x00h\x00t\x00 \x00(\x00c\x00)\x00 \x002\x000\x002\x004\...00e\x00c\x00t\x00 \x00r\x00o\x00u\x00n\x00d\x00(\x00l\x00n\x00(\x001\x000\x000\x00)\x00,\x00 \x004\x00)\x00;\x00\n\x00'
final = True

>   ???
E   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

<frozen codecs>:322: UnicodeDecodeError


How to reproduce

1. Encode a file using UTF-16LE
2. Use it as `snowflake.yml`, as a post-deploy hook, or as an input to `snow sql -f`
3. Observe a UnicodeDecodeError raised by the utf-8 codec
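
The steps above can be approximated outside the CLI with the standard library alone. This is a minimal sketch, not the CLI's own code: the helper name `reproduce_utf8_failure` and the file name `query.sql` are made up for illustration, and the file is read with `Path.read_text(encoding="utf-8")` to mirror the last frames of the traceback.

```python
import codecs
import tempfile
from pathlib import Path


def reproduce_utf8_failure(sql: str = "select round(ln(100), 4);\n") -> str:
    """Write `sql` as UTF-16LE with a BOM, then try to read it back as UTF-8.

    Hypothetical helper for illustration; returns the decode error message,
    or "no error" if the read unexpectedly succeeds.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "query.sql"
        # A UTF-16LE file with a byte order mark starts with the bytes FF FE.
        path.write_bytes(codecs.BOM_UTF16_LE + sql.encode("utf-16-le"))
        try:
            path.read_text(encoding="utf-8")
        except UnicodeDecodeError as err:
            return str(err)
    return "no error"


if __name__ == "__main__":
    print(reproduce_utf8_failure())
```

The message mentions byte 0xff at position 0, which is the first byte of the UTF-16LE byte order mark.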

sfc-gh-turbaszek commented 4 months ago

We may need to use a tool like https://github.com/jawah/charset_normalizer

sfc-gh-cgorrie commented 4 months ago

I think we could get away with something a little lighter-weight and more deterministic. BOM detection alone will solve the standard codepath for Windows, and if we give users the ability to use (python-standard? *nix locale?) env vars to match any overrides they've made on their local system, that coverage should be enough to resolve this ticket.
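
A rough sketch of what BOM detection plus an environment override could look like, using only the standard library. Everything named here is an assumption for illustration: `SNOWFLAKE_CLI_FILE_ENCODING` is not an existing Snowflake CLI setting, and `detect_encoding` and `read_text_with_bom_detection` are hypothetical helpers, not part of `SecurePath`. The ordering (BOM first, then the override, then utf-8) is one possible choice, not a decided design.

```python
import codecs
import os
from pathlib import Path

# Hypothetical variable name, used only for this sketch.
ENCODING_ENV_VAR = "SNOWFLAKE_CLI_FILE_ENCODING"

# Longer BOMs come first: the UTF-32LE BOM begins with the UTF-16LE BOM bytes.
# The "utf-16", "utf-32" and "utf-8-sig" codecs consume the BOM when decoding.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def detect_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Pick an encoding: BOM if present, else the env override, else `default`."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return encoding
    override = os.environ.get(ENCODING_ENV_VAR)
    if override:
        codecs.lookup(override)  # raises LookupError for an unknown codec name
        return override
    return default


def read_text_with_bom_detection(path) -> str:
    """Read a file as text, decoding with the encoding chosen above."""
    raw = Path(path).read_bytes()
    return raw.decode(detect_encoding(raw))
```

Checking the BOM before the override keeps the result deterministic for files that identify themselves, and leaves the override for BOM-less files written in a non-UTF-8 encoding.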