webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0
1.34k stars 207 forks source link

Upgrade dependencies #839

Closed edsu closed 3 months ago

edsu commented 1 year ago

This is a draft since 4 tests are failing after the upgrades...

Description

Upgrade various dependencies (gevent, werkzeug, markupsafe, jinja2, flask and httpbin) with the goal of fixing builds with Python v3.11 and higher.

Motivation and Context

I spent a bit of time trying to let gevent float to 3.1.2 and working through the effects of running pip install -r requirements.txt.

The good news is gevent v22.10.2 installs under Python3.11 now. But installing the latest werkzeug (which is currently unpinned) brings along the need to upgrade markupsafe and jinja2. I worked through the minimal changes to get that working, which mostly involved using pass_context instead of contextfunction.

The bad news is that the tests installed an old version of flask, which pulled in an old version of werkzeug which breaks things again. The reason why flask < v2 is installed is to get the old werkzeug which is needed by httpbin, which appears to be dead?

I think we could pin back werkzeug in our requirements.txt to solve this. But it seems like we would just be forestalling the inevitable need to upgrade? The approach taken here is to install a fork of httpbin for testing which has been updated to work with the latest werkzeug.

Fixes #835

Types of changes

Checklist:

FraMecca commented 1 year ago

Hi, thank you for this draft MR.

I am trying to get this project into the gentoo user repository so it would be very valuable to have python 3.10+ support.

For this reason I tried to add two small modifications in order to fix the tests:

FraMecca commented 1 year ago

Overall there are only 5 tests failing on my branch but they are all related to path rewriting. I need help fixing those because I have no experience with that part of the codebase

platform linux -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0
rootdir: /home/user/pywb, configfile: tox.ini, testpaths: tests
plugins: cov-4.1.0
collected 512 items                                                                                                                                                          

tests/test_acl.py ..........                                                                                                                                           [  1%]
tests/test_acl_manager.py .................                                                                                                                            [  5%]
tests/test_auto_colls.py ..........................................                                                                                                    [ 13%]
tests/test_cdx_server_app.py .................                                                                                                                         [ 16%]
tests/test_cert_req.py ..                                                                                                                                              [ 17%]
tests/test_cli.py .........                                                                                                                                            [ 18%]
tests/test_embargo.py ......                                                                                                                                           [ 20%]
tests/test_force_https.py ........                                                                                                                                     [ 21%]
tests/test_integration.py ....................................................................................................                                         [ 41%]
tests/test_live_rewriter.py ..........FFFF.............F......                                                                                                         [ 47%]
tests/test_locales.py ssss                                                                                                                                             [ 48%]
tests/test_manager.py .......                                                                                                                                          [ 50%]
tests/test_memento.py ..............................                                                                                                                   [ 55%]
tests/test_prefer_header.py ...................................................................................                                                        [ 72%]
tests/test_prefixed_deploy.py ......                                                                                                                                   [ 73%]
tests/test_proxy.py ssssssssssssssssssssssssssssssssssssssssssssss                                                                                                     [ 82%]
tests/test_range.py ............                                                                                                                                       [ 84%]
tests/test_record_dedup.py .....                                                                                                                                       [ 85%]
tests/test_record_replay.py ........................                                                                                                                   [ 90%]
tests/test_redirect_classic.py ............                                                                                                                            [ 92%]
tests/test_redirect_revisits.py .......                                                                                                                                [ 93%]
tests/test_redirects.py ...............                                                                                                                                [ 96%]
tests/test_root_coll.py ........                                                                                                                                       [ 98%]
tests/test_socks.py ....                                                                                                                                               [ 99%]
tests/test_zipnum_auto_dir.py ....     
edsu commented 1 year ago

@FraMecca I see 4 tests failing in my Python 3.11 dev environment:

FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length[frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 200)
FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length[non-frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 200)
FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length_with_range[frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 206)
FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length_with_range[non-frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 206)

I'm pretty sure these are related to me switching to a fork of httpbin. I need to get these tests problems resolved now since I'm having trouble installing greenlet v1 on the latest debian Docker images, which ship with Python 3.11.

FraMecca commented 1 year ago

@edsu so we have the same failures on our branches. I don't understand how you are able to pull that without switching to a py3 version of pyamf. What do you plan to do for httpbin?

edsu commented 1 year ago

@FraMecca I'm currently using a fork of httpbin that has been updated: https://github.com/webrecorder/pywb/blob/upgrade-dependencies/test_requirements.txt#L9

I updated this PR to replace werkzeug.wsgi import pop_path_info with wsgiref.util import shift_path_info since the werkzeug team decided it was duplicated there.

It looks like py3AMF is being installed conditionally in the setup.py. Is that not working for you?

edsu commented 1 year ago

I think that perhaps these length tests are failing because spaces are now being removed from JSON responses somewhere in the stack, maybe by the httpbin fork?

Here are a bunch of space related changes I needed to make to get some tests to pass: https://github.com/webrecorder/pywb/commit/5b72d968717db159cc37a56ee430620f1dcfa161

FraMecca commented 1 year ago

@edsu I get 400 instead of 200 on some tests that use httbin, so it might be a different issue

FraMecca commented 1 year ago

It looks like py3AMF is being installed conditionally in the setup.py. Is that not working for you?

It's needed, for the tests, right? In any case porting the library to py3 wasn't a big effort so I would be glad if it can be of help to you

edsu commented 9 months ago

@FraMecca there is already a Python3 port of pyamf at https://github.com/StdCarrot/Py3AMF. Does that not work for you?

edsu commented 9 months ago

It looks like https://github.com/psf/httpbin is being actively maintained by the Python Software Foundation so I'm going to switch to using that.

Sadly, I see it still has the same set of HTTP 400 instead of HTTP 200 errors:

FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length[frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 200)

But at least if a change in httbin is needed it is feasible to get it applied. The original repository is abandoned.

edsu commented 9 months ago

Sadly it appears that httpbin is no longer able to return invalid Content-Length headers:

curl -i -X GET "http://localhost:8080/response-headers?content-length=149,149" -H "accept: application/json"
HTTP/1.1 500 Internal Server Error
Connection: close
Content-Type: text/html
Content-Length: 141

<html>
  <head>
    <title>Internal Server Error</title>
  </head>
  <body>
    <h1><p>Internal Server Error</p></h1>

  </body>
</html

This is what appears in the docker console when run against the psf/httpbin:

[2023-09-22 10:21:01 +0000] [7] [ERROR] Error handling request /response-headers?content-length=149,149
Traceback (most recent call last):
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/workers/base_async.py", line 55, in handle
    self.handle_request(listener_name, req, client, addr)
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/workers/ggevent.py", line 128, in handle_request
    super().handle_request(listener_name, req, sock, addr)
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/workers/base_async.py", line 108, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/opt/httpbin/lib/python3.10/site-packages/flask/app.py", line 2213, in __call__
    return self.wsgi_app(environ, start_response)
  File "/opt/httpbin/lib/python3.10/site-packages/flask/app.py", line 2197, in wsgi_app
    return response(environ, start_response)
  File "/opt/httpbin/lib/python3.10/site-packages/werkzeug/wrappers/response.py", line 579, in __call__
    start_response(status, headers)
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/http/wsgi.py", line 243, in start_response
    self.process_headers(headers)
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/http/wsgi.py", line 264, in process_headers
    self.response_length = int(value)
ValueError: invalid literal for int() with base 10: '149,149'

Maybe the older version of httpbin wasn't so strict?

aquatix commented 4 months ago

Any news on this? We would like to upgrade our old Debian 11 servers to 12 :)

tw4l commented 3 months ago

Any news on this? We would like to upgrade our old Debian 11 servers to 12 :)

Hi, thanks for your patience and to @edsu for starting this branch! Apologies for the long delay on taking this up, we've been very focused on Browsertrix of late and are a small team. I'm planning to merge this into main soon with some other fixes/features and we should put out a new release shortly!

tw4l commented 3 months ago

Pinning gevent==23.9.0 and greenlet>=2.0.2 to specific versions because httpbin requires greenlet < 3 for now. We can update to latest/greenlet 3 when the PSF-maintained httpbin updates, or contribute that separately.

tw4l commented 3 months ago

Note that werkzeug 2.3.0 dropped support for Python 3.7; if we want to continue to support Python < 3.8, we'd need to pin to werkzeug==2.2.3, which does not support Python 3.12 so may become an issue shortly.

tw4l commented 3 months ago

It's also worth noting that werkzeug dropped support for Python 2 and Python 3.5 in version 2.0.0, released in 2021, and we previously didn't have the version pinned so likely our claim to support Python 2 and <3.6 hasn't actually been true for a little while. There are almost certainly some additional things that can be dropped accordingly, but I didn't want the scope of this PR to expand too much so that might be better done in a follow-up.

tw4l commented 3 months ago

We decided to support Python 3.7-3.11 in the next release, and will drop Python 3.7, add Python 3.12, and upgrade dependencies further in the following release. I will make an issue to document what will be needed at that point.

aquatix commented 3 months ago

Great work @tw4l !