Open dhs-rec opened 5 years ago
Might even be possible to determine the encoding automatically. At least the file
command on Linux is able to do so:
% file PSWindowsUpdate.psd1
PSWindowsUpdate.psd1: Little-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators
The following, slightly modified version of the sample code above does this, using the libmagic Python bindings (a.k.a. python3-magic):
#!/usr/bin/python3
import magic
import re

file = 'PSWindowsUpdate.psd1'

# Open file in binary mode and detect encoding
with open(file, 'rb') as f:
    data = f.read()
m = magic.open(magic.MAGIC_MIME_ENCODING)
m.load()
encoding = m.buffer(data)

# Read and write file with correct encoding
with open(file, 'r', encoding=encoding) as f:
    text = f.read()
# newline='' keeps the explicit '\r\n' from being translated again on Windows
with open(file + '.new', 'w', encoding=encoding, newline='') as f:
    for line in text.splitlines():
        if re.match(r'^PowerShellVersion\s+=\s+.*', line):
            f.write("PowerShellVersion = '3.0'\r\n")
        else:
            f.write(line + '\r\n')
I, for one, would like to overhaul a lot of how we handle file/stream encodings, with the ability to specify all the things (and perhaps default to utf-8, if possible).
Sounds good, although I'd consider this a bug.
Anyway, here's a version of the sample code using chardet, which might be more platform independent:
#!/usr/bin/python3
import chardet
import re

file = 'PSWindowsUpdate.psd1'

# Detect character encoding
with open(file, 'rb') as f:
    charenc = chardet.detect(f.read())['encoding']
print(charenc)

# Read file and detect linefeed type
with open(file, 'r', encoding=charenc) as f:
    text = f.read()
    lf = f.newlines

# Write new file, replacing some text
with open(file + '.new', 'w', encoding=charenc, newline=lf) as f:
    for line in text.splitlines():
        if re.match(r'^PowerShellVersion\s+=\s+.*', line):
            f.write("PowerShellVersion = '3.0'\n")
        else:
            f.write(line + '\n')
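Where neither libmagic nor chardet is available, a byte order mark check using only the standard library can cover the common Windows cases. The following is a minimal sketch under that assumption; `sniff_bom` is a hypothetical helper name, and it only recognizes files that actually start with a BOM, falling back to a caller-supplied default otherwise.

```python
import codecs

# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data, default="utf-8"):
    """Return a codec name based on a leading BOM, or `default` if none is found.

    Hypothetical helper for illustration only.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default
```

The returned names (`utf-16`, `utf-32`, `utf-8-sig`) are the BOM-aware codecs, so reading with them strips the BOM and writing with them emits one again. Note that `utf-16` and `utf-32` write in the machine's native byte order, which may differ from the byte order of the original file.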
Any news on this bug?
Nothing at the moment - we're currently pushing hard to stabilize our test suites/pipeline, but this is definitely on my radar.
My personal opinion is that Salt should never assume encoding (though right now we assume utf-8, if we assume anything). I think I'd welcome a PR that adds an encoding argument to file.replace, if it doesn't require a more invasive change. Otherwise I think we should open a SEP detailing the work required/risks to updating the way Salt handles encoding things.
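To make the idea of an explicit encoding argument concrete, here is a rough sketch of a replace helper that takes one. It is not Salt's `file.replace` and says nothing about how Salt would implement it; `replace_in_file` and its parameters are hypothetical names chosen for illustration.

```python
import re


def replace_in_file(path, pattern, repl, encoding="utf-8"):
    """Regex-replace in `path`, decoding and re-encoding with `encoding`.

    Hypothetical helper for illustration; this is not Salt's file.replace.
    Returns True if the file content changed.
    """
    # newline="" turns off newline translation, so CRLF endings survive as-is.
    with open(path, "r", encoding=encoding, newline="") as f:
        original = f.read()

    # MULTILINE lets ^ anchor at the start of every line.
    updated = re.sub(pattern, repl, original, flags=re.MULTILINE)

    changed = updated != original
    if changed:
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(updated)
    return changed
```

A call for the file from this issue might look like `replace_in_file('PSWindowsUpdate.psd1', r"^PowerShellVersion[ \t]*=[^\r\n]*", "PowerShellVersion = '3.0'", encoding='utf-16')`. Because newlines are left untranslated, `.` would also match the `\r` of a CRLF ending, which is why the pattern uses `[^\r\n]*` instead of `.*`.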
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Ping.
Thank you for updating this issue. It is no longer marked as stale.
file.managed seems to have a similar issue. I'm trying to manage a UTF-16 LE encoded file (an XML export from Windows Task Scheduler) and added some Jinja to templatize it, but the file is treated as binary and the Jinja is not replaced by the defaults values:
get_xml:
  file.managed:
    - name: 'C:\task.xml'
    - source: salt://win_tasks/some_template.txt
    - template: jinja
    # - encoding: tried with utf16, utf16-le, utf-16-le
    - defaults:
        TimeTrigger_StartBoundary: {{ TimeTrigger_StartBoundary }}
        TimeTrigger_StopBoundary: {{ TimeTrigger_StopBoundary }}
some_template.txt (UTF-16LE encoded)
<StartBoundary>{{ TimeTrigger_StartBoundary }}</StartBoundary>
<EndBoundary>{{ TimeTrigger_StopBoundary }}</EndBoundary>
<ExecutionTimeLimit>PT30M</ExecutionTimeLimit>
This causes the file to be treated as binary, and the Jinja does not get rendered. The resulting file looks exactly like the original:
minion-win-1:
----------
          ID: get_xml
    Function: file.managed
        Name: C:\task.xml
      Result: True
     Comment: File C:\task.xml updated
     Started: 17:39:38.661603
    Duration: 140.625 ms
     Changes:
              ----------
              diff:
                  Replace text file with binary file
As soon as the encoding of the file is changed to utf-8, the file is replaced by a text file and the Jinja values are rendered.
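One plausible explanation, which is an assumption and not something confirmed in this thread, is that the undecoded UTF-16 LE bytes contain a NUL after every ASCII character, so the content looks binary and the `{{` marker is not visible at the byte level. The short sketch below illustrates only that byte-level effect; it does not use Salt or Jinja.

```python
template = "<StartBoundary>{{ TimeTrigger_StartBoundary }}</StartBoundary>"
raw = template.encode("utf-16-le")

# Every ASCII character is followed by a NUL byte in UTF-16 LE.
has_nul = b"\x00" in raw               # True

# The two braces are no longer adjacent at the byte level.
marker_visible = b"{{" in raw          # False

# Decoding first restores ordinary text that a template engine can scan.
decoded = raw.decode("utf-16-le")
marker_after_decode = "{{" in decoded  # True
```

This would be consistent with the observation above that the same template renders correctly once it is saved as utf-8.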
Description of Issue
When trying to run file.replace on a UCS-2 encoded file on Windows, it errors out with the following message:
Instead of doing this, file.replace should support specifying the correct encoding for the file, for example:
The following Python(3) example does this (and even automatically writes a BOM):
Versions Report