Open MAH69IK opened 2 years ago
Maybe I can provide some more help in solving this problem?
Well, I am hitting the very same issue with `cp.get_file`: only 2048 bytes are read to guess whether the file is binary. In my case the issue is in `salt/utils/files.py`:
```python
def is_binary(path):
    """
    Detects if the file is a binary, returns bool. Returns True if the file is
    a bin, False if the file is not and None if the file is not available.
    """
    if not os.path.isfile(path):
        return False
    try:
        with fopen(path, "rb") as fp_:
            try:
                data = fp_.read(2048)
                data = data.decode(__salt_system_encoding__)
                return salt.utils.stringutils.is_binary(data)
            except UnicodeDecodeError:
                return True
    except os.error:
        return False
```
If a file downloaded this way looks like text in its first 2048 bytes but is really binary further on, it is downloaded with zero length and the master log is spammed with errors like:
```
File "/usr/lib/python3.6/site-packages/salt/utils/gitfs.py", line 3054, in serve_file
    data = data.decode(__salt_system_encoding__)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 23076: invalid start byte
```
I propose not relying on the first 2048 bytes only...
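One possible reading of this proposal, as a sketch only and not Salt's code: scan the whole file in chunks with an incremental decoder, so that a character split across a chunk boundary is carried over and binary content late in the file is still detected. The name `looks_binary_full_scan` and its parameters are hypothetical, and treating a NUL byte as a sign of binary content is a simplifying assumption, not necessarily what `salt.utils.stringutils.is_binary` does.

```python
import codecs


def looks_binary_full_scan(path, encoding="utf-8", chunk_size=2048):
    """Hypothetical helper: True if the file does not decode as text."""
    # The incremental decoder buffers an incomplete multi-byte sequence
    # at the end of one chunk and completes it with the next chunk.
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    try:
        with open(path, "rb") as fp_:
            while True:
                chunk = fp_.read(chunk_size)
                if not chunk:
                    break
                # Assumption: a NUL byte means binary content.
                if b"\x00" in chunk:
                    return True
                decoder.decode(chunk)
            # Flush: raises if the file ends in the middle of a character.
            decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return True
    return False
```

The trade-off is that the whole file is read, which may matter for large files; an early return on the first undecodable chunk limits the cost for real binaries.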
Description
Well, to be precise, sometimes it works. If you're lucky. I was lucky before, so I was very surprised when adding a couple of new lines to the file broke everything.
I use `file.blockreplace` to generate the nftables configuration. I have a core file with rules for all servers, and I use `file.blockreplace` to insert server-specific rules. Everything worked for several weeks until I wanted to insert a couple of additional rules.

I started getting this error:
And from `/var/log/salt/minion`:

I started to figure it out and this is what I found. This is where the error occurred in `salt/modules/file.py`:

`files.get_encoding` raises an exception, but why? This is a function from `salt/utils/files.py`. It uses several ways to determine the encoding of the file: it checks for ASCII, a BOM and UTF-8.

It may seem that it should work because my file is UTF-8:
But this is not the case. The fact is that for verification, this function reads the first 2048 bytes and tries to decode them (if we are talking about checking for UTF-8). And this is the only suitable method in my case because there is no BOM in my file and besides ASCII it contains comments in Russian.
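To make this mechanism concrete, here is a minimal sketch with synthetic data (not my actual file and not Salt's code). It shows how a fixed-size read can cut a two-byte UTF-8 character in half, so that the first 2048 bytes fail to decode even though the whole content is valid UTF-8. The helper `decodes_as_utf8` is a hypothetical name used only for this illustration.

```python
# Synthetic data: 2047 ASCII bytes followed by the two-byte character "у".
data = b"#" * 2047 + "у".encode("utf-8")

# A fixed-size read of 2048 bytes keeps only the first byte of "у".
head = data[:2048]


def decodes_as_utf8(raw):
    """Hypothetical helper: True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(head[-1:])              # b'\xd1'
print(decodes_as_utf8(head))  # False
print(decodes_as_utf8(data))  # True
```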
It so happened that after I added the new lines, the first 2048 bytes ended with these bytes: `\xd1\x80\xd1\x83 \xd0\xbf\xd0\xbe \xd0\xb2\xd0\xbd\xd1`. That sequence cannot be decoded on its own, because the final `\xd1` is only the first byte of a two-byte UTF-8 character. With one more byte you get `\xd1\x80\xd1\x83 \xd0\xbf\xd0\xbe \xd0\xb2\xd0\xbd\xd1\x83` -> `ру по вну`.

I offer three options. Firstly, it is possible to decode not just 2048 bytes but the entire file. Then it will work successfully.
Secondly, you can skip the check altogether, and if users want to replace strings in a binary file, consider that their problem :)

Thirdly, you can rely on the `file` utility, which correctly recognizes this as a UTF-8 file. Here is a short piece of Python code based on this utility that, in my case, works on 2048 bytes, on 2049 bytes, and on the entire file:

Versions Report
salt --versions-report
```yaml
$ salt --versions-report
Salt Version:
  Salt: 3004.2

Dependency Versions:
  cffi: Not Installed
  cherrypy: Not Installed
  dateutil: 2.8.1
  docker-py: Not Installed
  gitdb: 4.0.5
  gitpython: 3.1.14
  Jinja2: 2.11.3
  libgit2: Not Installed
  M2Crypto: Not Installed
  Mako: Not Installed
  msgpack: 1.0.0
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  pycparser: Not Installed
  pycrypto: Not Installed
  pycryptodome: 3.9.7
  pygit2: Not Installed
  Python: 3.9.2 (default, Feb 28 2021, 17:03:44)
  python-gnupg: Not Installed
  PyYAML: 5.3.1
  PyZMQ: 20.0.0
  smmap: 4.0.0
  timelib: Not Installed
  Tornado: 4.5.3
  ZMQ: 4.3.4

System Versions:
  dist: debian 11 bullseye
  locale: utf-8
  machine: x86_64
  release: 5.10.0-10-amd64
  system: Linux
  version: Debian GNU/Linux 11 bullseye
```

P. S. Next to the `is_binary` function there is the `is_text` function, which does a similar job and also tries to decode part of the bytes as UTF-8, except that it reads only 512 bytes. Maybe both should use the same size?