web-platform-tests / rfcs

web-platform-tests RFCs
75 stars 63 forks source link

Requirements for Python dependency management #82

Open foolip opened 3 years ago

foolip commented 3 years ago

Hi @web-platform-tests/wpt-core-team!

Recently I have taken a closer look at how we track and install Python dependencies here in WPT. I wrote up two fairly detailed issues about what we're currently doing in https://github.com/web-platform-tests/wpt/issues/28801 and https://github.com/web-platform-tests/wpt/issues/28809.

I suspect that there's room to improve matters here, but want to gather requirements before I suggest something. But first, some problem I think are worth trying to solve:

Here are the requirements I would consider for any change here, roughly in order of importance:

I'm interested to hear if others have had trouble with dependencies, and if there are additional constraints/requirements here.

LukeZielinski commented 3 years ago

Here's an overview of how wptrunner deps work on the Chromium side:

Chromium's use of wptrunner has been broken by new dependencies in the past, but it happens infrequently. We are almost always "behind" on versions. That is, we don't react to every single minor version update of deps. Instead, we roll up to the latest version of a package when something breaks (we get a chance to intervene without breaking the CI because we roll tools manually). Getting up to date is usually an easy process, although it can be slow due to the manual work involved to get a new package onto the package server. Transitive dependencies have been an issue (ie: you install one package but then realize you need a bunch more) -- this happened only once, though, as part of the py3 upgrade (which was a major change).

foolip commented 3 years ago

Thanks @LukeZielinski, that's super helpful!

Given "our own package server ... currently a manual process" would it be fair to say that it'd be an improvement if dependencies in WPT were updated rarely, on a very predictable schedule like quarterly? Right now we're landing pyup changes as they come in.

And am I right to assume that moving to only installed dependencies would be better, to not have to worry about vendored+installed compatibility and which takes precedence? (It seems to me that in WPT, the vendored deps will take precedence. Oops?)

foolip commented 3 years ago

https://github.com/web-platform-tests/wpt/issues/24829 has some good discussion too.

jgraham commented 3 years ago

On the gecko side:

Basically for Gecko, at this time, moving toward more installed deps would be A Bad Thing, and in general we don't have loads of problems with the current setup where actually running tests can be done using entirely in-tree deps (at least when the tree is mozilla-central).

LukeZielinski commented 3 years ago

Thanks @LukeZielinski, that's super helpful!

Given "our own package server ... currently a manual process" would it be fair to say that it'd be an improvement if dependencies in WPT were updated rarely, on a very predictable schedule like quarterly? Right now we're landing pyup changes as they come in.

I'd consider this an improvement yes. Mostly in terms of planning, so you could dedicate some time each quarter (or whatever) to get your deps figured out, and not have to scramble.

And am I right to assume that moving to only installed dependencies would be better, to not have to worry about vendored+installed compatibility and which takes precedence? (It seems to me that in WPT, the vendored deps will take precedence. Oops?)

All-vendored or all-installed both seem better than the hybrid thing that exists today. One argument I could see against all-vendored deps is that it checks a whole pile of random code into the tree (pollutes codesearch results for instance). Although we've never tried to go down this path, so there could be technical challenges I'm not aware of.

foolip commented 3 years ago

Thanks @jgraham, that's very helpful!

In case you saw it, I want to be clear that the all-installed https://github.com/web-platform-tests/wpt/pull/28796 isn't really the direction I'm hoping for here, that's was an exploration to figure out what the dependencies actually are.

It sounds like a requirements from Gecko is that it should be easy import WPT and all of its dependencies so that everything needed is in-tree, and no network access is needed while running WPT in Gecko. Having precisely the dependencies that Gecko doesn't already have manually vendored into WPT meets that requirement.

A rough sketch of the direction I am hoping for now:

jgraham commented 3 years ago

So one reasonable goal here would be to remove localpaths.py. I wonder if something like the following could work:

That would have the following properties:

foolip commented 3 years ago

Getting rid of localpaths.py is indeed an appealing part of this.

It would be possible for users to use either the vendored dep or the in-tree one at their preference, knowing that both are the same version

Yes!

this does change if we have local patches; idk if that's a current problem

I looked at the recent commits of everything in https://github.com/web-platform-tests/wpt/issues/28801 and didn't find anything with local modifications. Only pywebsocket3 is problematic because it's not yet available on pypi, see https://github.com/GoogleChromeLabs/pywebsocket3/issues/30.

jgraham commented 3 years ago

One concern here is startup latency; if every wpt run call is invoking pip and checking that all the packages are installed it's going to add noticable startup time.

gsnedders commented 3 years ago

For WebKit, there's a few notable cases:

  1. Running LayoutTests, which in the future should require wpt.manifest
  2. Running check-style, which in the future should require wpt.lint
  3. Running WebDriverTests, which requires wptrunner etc. (though I think safaridriver might not use wptrunner, merely using pytest directly?)

Historically, the goal was for at least ./wpt manifest and ./wpt lint to be runnable without any external dependencies (though I'm not sure how much value there is in the former?), but we seem to have gone beyond that (e.g., with hyper and h2). One of the problems with the status-quo is I think it's unclear why things are vendored v. put in as a requirement?

At least from my point-of-view, I'd rather we avoid the status-quo where we have our tools adding our own versions of the packages to sys.path, rather than how other tools (e.g., pip) do vendoring whereby they end up with imports like from pip._vendored import html5lib. That is probably the most essential thing to do.

To note, I think pytest might be the only thing we vendor which vaguely regularly has any API breakage? Maybe h2 too?

All-vendored or all-installed both seem better than the hybrid thing that exists today. One argument I could see against all-vendored deps is that it checks a whole pile of random code into the tree (pollutes codesearch results for instance). Although we've never tried to go down this path, so there could be technical challenges I'm not aware of.

One problem is that some of the packages that wptrunner requires depend on compiled extensions (such as Pillow). We need to install them somewhere as a result.

foolip commented 3 years ago

One concern here is startup latency; if every wpt run call is invoking pip and checking that all the packages are installed it's going to add noticable startup time.

That's precisely one of the things I'd like to resolve here. When I run ./wpt run myself, I can tell that pip is being called at least twice based on the deprecation messages. This doesn't happen in vendor CI, of course, but I'd like to get a better faster experience in the CLI too :)