pytest-dev / pytest-xprocess

pytest external process plugin
MIT License
98 stars 20 forks source link

report can fail to read log file as text #120

Closed davidism closed 1 year ago

davidism commented 1 year ago

The stdout file passed to Popen in xprocess.ensure is opened in binary mode (a+b). However, the file is also opened in text read mode with the system encoding for xresource.fhandle, which is then read by pytest_runtest_makereport. If the process writes junk binary data to the stream, then reading the data can result in UnicodeDecodeError.

https://github.com/pytest-dev/pytest-xprocess/blob/2dd5b9fcfadf138e199bae12f310f272c5be059d/xprocess/xprocess.py#L235

https://github.com/pytest-dev/pytest-xprocess/blob/2dd5b9fcfadf138e199bae12f310f272c5be059d/xprocess/xprocess.py#L271

https://github.com/pytest-dev/pytest-xprocess/blob/5b59835c57b7cc6e0b72e6a851751dd408e21967/xprocess/pytest_xprocess.py#L70

Just upgraded to 0.21 from 0.20 for Werkzeug and started getting this issue. Here's an example of the error: https://github.com/pallets/werkzeug/actions/runs/3803459746/jobs/6469880268#step:7:595

The file should probably be opened with errors="replace", or some other check should be used to avoid bytes/text issues.

northernSage commented 1 year ago

couple things I noticed while looking into this

Bellow are latest werkzeug test suite run logs with pytest-xprocess 0.20.0

xprocess==0.20.0,setuptools==65.6.3,tomli==2.0.1,watchdog==2.1.9,Werkzeug @ > file:///D:/a/werkzeug/werkzeug/.tox/.tmp/package/1/Werkzeug-2.3.0.dev0.tar.gz,wheel==0.38.4 py310: commands[0]> pytest -v --tb=short --basetemp=D:\a\werkzeug\werkzeug.tox\py310\tmp ============================= test session starts ============================= ...

INTERNALERROR> File "D:\a\werkzeug\werkzeug.tox\py310\lib\site-packages\xprocess\pytest_xprocess.py", line 55, in pytest_runtest_makereport INTERNALERROR> content = logfiles[name].read() INTERNALERROR> File "C:\hostedtoolcache\windows\Python\3.10.9\x64\lib\encodings\cp1252.py", line 23, in decode INTERNALERROR> return codecs.charmap_decode(input,self.errors,decoding_table)[0] INTERNALERROR> UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 170: character maps to

this shows xprocess breaking internally, as you pointed out. And bellow are same logs but running with new xprocess changes (soon to be released) where we now only work with binary file streams and try/catch all decode errors, skipping such lines.

5.9.2,py==1.11.0,pycparser==2.21,pyparsing==3.0.9,pytest==7.1.3,pytest-timeout==2.1.0,pytest-xprocess==0.22.4,setuptools==65.6.3,tomli==2.0.1,watchdog==2.1.9,Werkzeug @ file:///D:/a/werkzeug/werkzeug/.tox/.tmp/package/1/Werkzeug-2.3.0.dev0.tar.gz,wheel==0.38.4 py310: commands[0]> pytest -v --tb=short --basetemp=D:\a\werkzeug\werkzeug.tox\py310\tmp ============================= test session starts ============================= platform win32 -- Python 3.10.9, pytest-7.1.3, pluggy-1.0.0 -- D:\a\werkzeug\werkzeug.tox\py310\Scripts\python.EXE cachedir: .tox\py310.pytest_cache rootdir: D:\a\werkzeug\werkzeug, configfile: setup.cfg, testpaths: tests plugins: timeout-2.1.0, xprocess-0.22.4

_ test_wrongprotocol tests\test_serving.py:149: in test_wrong_protocol assert "Traceback" not in standard_app.log.read() C:\hostedtoolcache\windows\Python\3.10.9\x64\lib\encodings\cp1252.py:23: in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] E UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 162: character maps to ---------------------------- Captured stdout setup ---------------------------- D:\a\werkzeug\werkzeug.tox\py310.pytest_cache\d.xprocess\dev_server-test_wrong_protocol$ D:\a\werkzeug\werkzeug.tox\py310\Scripts\python.EXE D:\a\werkzeug\werkzeug\tests\live_apps\run.py standard {"port": 60471} process 'dev_server-test_wrong_protocol' started pid=6288 WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.

I checked the breaking test, and following part caught my attention:

https://github.com/pallets/werkzeug/blob/main/tests/conftest.py#L44

def tail_log(self, path):
    self.log = open(path)
    self.log.read()

while xprocess will no longer break internally, the process log files will still contain byte sequences which will break when the file stream yielded by xprocess is opened in non-binary mode

Werkzeug dev server log file:

@@xproc_block_delimiter@@ WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.

  • Running on http://127.0.0.1:64306 Press CTRL+C to quit 127.0.0.1 - - [01/Jan/2023 14:31:02] "GET /ensure HTTP/1.1" 200 - 127.0.0.1 - - [01/Jan/2023 14:31:02] code 400, message Bad HTTP/0.9 request type ('\x16\x03\x01\x00¯\x01\x00\x00«\x03\x03ß\x8fô§÷þC') 127.0.0.1 - - [01/Jan/2023 14:31:02] "�¯��«ߏô§÷þC ÕLlc½?ê<[Tו«¾óW’ü'G��L̨̩̪À0À,À(À$ÀÀ" HTTPStatus.BAD_REQUEST -
RonnyPfannschmidt commented 1 year ago

Surrogate escape turns invalid bytes into surrogate code points (aka no data is lost)

This is way better than just dropping lines

northernSage commented 1 year ago

Yep, surrogates would also make for a more clean implementation so I'll probably rewrite that part before the release. Thanks for the tip. But to be clear (haven't expressed what I meant well), the lines with garbage value would not be dropped, just skipped during pattern matching, which I assumed would be OK since users shouldn't be attempting to match patterns containing invalid bytes anyway (I hope not :bowtie: )

davidism commented 1 year ago

Fixed by https://github.com/pytest-dev/pytest-xprocess/pull/127, and verified in Werkzeug.