[BUG] file.blockreplace does not work with UTF-8

MAH69IK commented 2 years ago

Description Well, to be precise - sometimes it works. If you're lucky. I was lucky before, so I was very surprised when adding a couple of new lines to the file broke everything.

I use file.blockreplace for generating the nftables configuration. I have a core - a file with rules for all servers. And using file.blockreplace I insert server-specific rules. Everything worked for several weeks until I wanted to insert a couple of additional rules.

I started getting this error:

    Function: file.blockreplace
        Name: /etc/nftables.conf
      Result: False
     Comment: Encountered error managing block: Cannot perform string replacements on a binary file: /etc/nftables.conf. See the log for details.

And from /var/log/salt/minion:

2022-06-24 16:09:51,014 [salt.loaded.int.states.file:5756][ERROR   ][4021129] Encountered error managing block
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/states/file.py", line 5741, in blockreplace
    changes = __salt__["file.blockreplace"](
  File "/usr/lib/python3/dist-packages/salt/loader/lazy.py", line 149, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader/lazy.py", line 1201, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader/lazy.py", line 1216, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/modules/file.py", line 2860, in blockreplace
    raise SaltInvocationError(
salt.exceptions.SaltInvocationError: Cannot perform string replacements on a binary file: /etc/nftables.conf
2022-06-24 16:09:51,021 [salt.state       :317 ][ERROR   ][4021129] Encountered error managing block: Cannot perform string replacements on a binary file: /etc/nftables.conf. See the log for details.

I started to figure it out and this is what I found. This is where the error occurred in salt/modules/file.py:

    try:
        file_encoding = __utils__["files.get_encoding"](path)
    except CommandExecutionError:
        file_encoding = None

    if __utils__["files.is_binary"](path):
        if not file_encoding:
            raise SaltInvocationError(
                "Cannot perform string replacements on a binary file: {}".format(path)
            )

files.get_encoding causes an exception but why? This is function from salt/utils/files.py. It uses several ways to determine the encoding of the file: checks ASCII, BOM and UTF-8.

It may seem that it should work because my file is UTF-8:

$ file /srv/salt/base/packages/nftables/nftables.conf 
/srv/salt/base/packages/nftables/nftables.conf: a /usr/sbin/nft -f script, UTF-8 Unicode text executable

But this is not the case. The fact is that for verification, this function reads the first 2048 bytes and tries to decode them (if we are talking about checking for UTF-8). And this is the only suitable method in my case because there is no BOM in my file and besides ASCII it contains comments in Russian.

It so happened that when I added new lines, when reading 2048 bytes, they ended with these characters: \xd1\x80\xd1\x83 \xd0\xbf\xd0\xbe \xd0\xb2\xd0\xbd\xd1. But it doesn't make sense, because in order to decode this string in UTF-8, you need to have one more byte and then you will get: \xd1\x80\xd1\x83 \xd0\xbf\xd0\xbe \xd0\xb2\xd0\xbd\xd1\x83 -> ру по вну.

I offer three options. Firstly, it is possible to decode not 2048 bytes, but the entire file. Then it will work successfully.

You can not check at all, and if the user wants to replace strings in a binary file, consider that these are his problems :)

And you can rely on the work of file which successfully understands that this is a UTF-8 file. Here is a short python code that is based on this utility and in my case successfully works on both 2048 bytes and 2049 and on the entire file:

>>> textchars = bytearray({7,8,9,10,12,13,27} | set(range(0x20, 0x100)) - {0x7f})
>>> is_binary_string = lambda bytes: bool(bytes.translate(None, textchars))
>>> with open('/srv/salt/base/packages/nftables/nftables.conf', 'rb') as fp_:
...     data = fp_.read(2048)
>>> is_binary_string(data)
False

Versions Report

salt --versions-report

```yaml $ salt --versions-report Salt Version: Salt: 3004.2 Dependency Versions: cffi: Not Installed cherrypy: Not Installed dateutil: 2.8.1 docker-py: Not Installed gitdb: 4.0.5 gitpython: 3.1.14 Jinja2: 2.11.3 libgit2: Not Installed M2Crypto: Not Installed Mako: Not Installed msgpack: 1.0.0 msgpack-pure: Not Installed mysql-python: Not Installed pycparser: Not Installed pycrypto: Not Installed pycryptodome: 3.9.7 pygit2: Not Installed Python: 3.9.2 (default, Feb 28 2021, 17:03:44) python-gnupg: Not Installed PyYAML: 5.3.1 PyZMQ: 20.0.0 smmap: 4.0.0 timelib: Not Installed Tornado: 4.5.3 ZMQ: 4.3.4 System Versions: dist: debian 11 bullseye locale: utf-8 machine: x86_64 release: 5.10.0-10-amd64 system: Linux version: Debian GNU/Linux 11 bullseye ```

P. S. Near the is_binary function is the is_text function, which performs a similar job and also tries to convert part of the bytes to UTF-8. That's just it reads 512 bytes. Maybe it should be brought to one size?

welcome[bot] commented 2 years ago

Hi there! Welcome to the Salt Community! Thank you for making your first contribution. We have a lengthy process for issues and PRs. Someone from the Core Team will follow up as soon as possible. In the meantime, here’s some information that may help as you continue your Salt journey. Please be sure to review our Code of Conduct. Also, check out some of our community resources including:

There are lots of ways to get involved in our community. Every month, there are around a dozen opportunities to meet with other contributors and the Salt Core team and collaborate in real time. The best way to keep track is by subscribing to the Salt Community Events Calendar. If you have additional questions, email us at saltproject@vmware.com. We’re glad you’ve joined our community and look forward to doing awesome things with you!

MAH69IK commented 1 year ago

Maybe I can provide some more help in solving this problem?

onlineque commented 1 year ago

Well, I am hitting very same issue with 2048 bytes being read to guess whether the file is binary, with cp.get_file. The issue is in salt/utils/files.py in my case:

   674  def is_binary(path):
   675      """
   676      Detects if the file is a binary, returns bool. Returns True if the file is
   677      a bin, False if the file is not and None if the file is not available.
   678      """
   679      if not os.path.isfile(path):
   680          return False
   681      try:
   682          with fopen(path, "rb") as fp_:
   683              try:
   684                  data = fp_.read(2048)
   685                  data = data.decode(__salt_system_encoding__)
   686                  return salt.utils.stringutils.is_binary(data)
   687              except UnicodeDecodeError:
   688                  return True
   689      except os.error:
   690          return False

In case the file being downloaded this way looks like text in first 2048 bytes and then it's really binary, then it is being downloaded with zero length and master log is spammed with errors like: File "/usr/lib/python3.6/site-packages/salt/utils/gitfs.py", line 3054, in serve_file data = data.decode(__salt_system_encoding__) UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 23076: invalid start byte

I propose not to rely on first 2048 bytes only...

saltstack / salt

[BUG] file.blockreplace does not work with UTF-8 #62214