Closed davidism closed 1 year ago
couple things I noticed while looking into this
Bellow are latest werkzeug test suite run logs with pytest-xprocess 0.20.0
xprocess==0.20.0,setuptools==65.6.3,tomli==2.0.1,watchdog==2.1.9,Werkzeug @ > file:///D:/a/werkzeug/werkzeug/.tox/.tmp/package/1/Werkzeug-2.3.0.dev0.tar.gz,wheel==0.38.4 py310: commands[0]> pytest -v --tb=short --basetemp=D:\a\werkzeug\werkzeug.tox\py310\tmp ============================= test session starts ============================= ...
INTERNALERROR> File "D:\a\werkzeug\werkzeug.tox\py310\lib\site-packages\xprocess\pytest_xprocess.py", line 55, in pytest_runtest_makereport INTERNALERROR> content = logfiles[name].read() INTERNALERROR> File "C:\hostedtoolcache\windows\Python\3.10.9\x64\lib\encodings\cp1252.py", line 23, in decode INTERNALERROR> return codecs.charmap_decode(input,self.errors,decoding_table)[0] INTERNALERROR> UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 170: character maps to
this shows xprocess breaking internally, as you pointed out. And bellow are same logs but running with new xprocess changes (soon to be released) where we now only work with binary file streams and try/catch all decode errors, skipping such lines.
5.9.2,py==1.11.0,pycparser==2.21,pyparsing==3.0.9,pytest==7.1.3,pytest-timeout==2.1.0,pytest-xprocess==0.22.4,setuptools==65.6.3,tomli==2.0.1,watchdog==2.1.9,Werkzeug @ file:///D:/a/werkzeug/werkzeug/.tox/.tmp/package/1/Werkzeug-2.3.0.dev0.tar.gz,wheel==0.38.4 py310: commands[0]> pytest -v --tb=short --basetemp=D:\a\werkzeug\werkzeug.tox\py310\tmp ============================= test session starts ============================= platform win32 -- Python 3.10.9, pytest-7.1.3, pluggy-1.0.0 -- D:\a\werkzeug\werkzeug.tox\py310\Scripts\python.EXE cachedir: .tox\py310.pytest_cache rootdir: D:\a\werkzeug\werkzeug, configfile: setup.cfg, testpaths: tests plugins: timeout-2.1.0, xprocess-0.22.4
_ test_wrongprotocol tests\test_serving.py:149: in test_wrong_protocol assert "Traceback" not in standard_app.log.read() C:\hostedtoolcache\windows\Python\3.10.9\x64\lib\encodings\cp1252.py:23: in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] E UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 162: character maps to
---------------------------- Captured stdout setup ---------------------------- D:\a\werkzeug\werkzeug.tox\py310.pytest_cache\d.xprocess\dev_server-test_wrong_protocol$ D:\a\werkzeug\werkzeug.tox\py310\Scripts\python.EXE D:\a\werkzeug\werkzeug\tests\live_apps\run.py standard {"port": 60471} process 'dev_server-test_wrong_protocol' started pid=6288 WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
I checked the breaking test, and following part caught my attention:
https://github.com/pallets/werkzeug/blob/main/tests/conftest.py#L44
def tail_log(self, path):
self.log = open(path)
self.log.read()
while xprocess will no longer break internally, the process log files will still contain byte sequences which will break when the file stream yielded by xprocess is opened in non-binary mode
Werkzeug dev server log file:
@@xproc_block_delimiter@@ [31m[1mWARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.[0m
- Running on http://127.0.0.1:64306 [33mPress CTRL+C to quit[0m 127.0.0.1 - - [01/Jan/2023 14:31:02] "GET /ensure HTTP/1.1" 200 - 127.0.0.1 - - [01/Jan/2023 14:31:02] code 400, message Bad HTTP/0.9 request type ('\x16\x03\x01\x00¯\x01\x00\x00«\x03\x03ß\x8fô§÷þC') 127.0.0.1 - - [01/Jan/2023 14:31:02] "[35m[1m�¯��«ßô§÷þC ÕLlc½?ê<[T׫¾óWü'G��L̨̩̪À0À,À(À$ÀÀ[0m" HTTPStatus.BAD_REQUEST -
Surrogate escape turns invalid bytes into surrogate code points (aka no data is lost)
This is way better than just dropping lines
Yep, surrogates would also make for a more clean implementation so I'll probably rewrite that part before the release. Thanks for the tip. But to be clear (haven't expressed what I meant well), the lines with garbage value would not be dropped, just skipped during pattern matching, which I assumed would be OK since users shouldn't be attempting to match patterns containing invalid bytes anyway (I hope not :bowtie: )
Fixed by https://github.com/pytest-dev/pytest-xprocess/pull/127, and verified in Werkzeug.
The
stdout
file passed toPopen
inxprocess.ensure
is opened in binary mode (a+b
). However, the file is also opened in text read mode with the system encoding forxresource.fhandle
, which is then read bypytest_runtest_makereport
. If the process writes junk binary data to the stream, then reading the data can result inUnicodeDecodeError
.https://github.com/pytest-dev/pytest-xprocess/blob/2dd5b9fcfadf138e199bae12f310f272c5be059d/xprocess/xprocess.py#L235
https://github.com/pytest-dev/pytest-xprocess/blob/2dd5b9fcfadf138e199bae12f310f272c5be059d/xprocess/xprocess.py#L271
https://github.com/pytest-dev/pytest-xprocess/blob/5b59835c57b7cc6e0b72e6a851751dd408e21967/xprocess/pytest_xprocess.py#L70
Just upgraded to 0.21 from 0.20 for Werkzeug and started getting this issue. Here's an example of the error: https://github.com/pallets/werkzeug/actions/runs/3803459746/jobs/6469880268#step:7:595
The file should probably be opened with
errors="replace"
, or some other check should be used to avoid bytes/text issues.