pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0

Reduce Pants' 3rdparty dependencies #19282

Open thejcannon opened 1 year ago

thejcannon commented 1 year ago

Is your feature request related to a problem? Please describe.
In an ideal world, Pants-the-wheel has no 3rdparty reliance. Everything is either provided through static Rust code in the engine, or we use Pex to download and install tools (like we do today).

This would reduce Pants' installation footprint, increase security, and slightly speed up first-time installation of Pants.

Describe the solution you'd like
The pantsbuild wheel requires 0 3rdparty deps.

Describe alternatives you've considered
N/A

Additional context
:taco:

benjyw commented 1 year ago

I'm not sure I agree that this is a desirable goal. Like any other codebase, we commonly use Python libraries. These are not tools we invoke via a subprocess, so I don't see the analogy to those.

If the purpose is to avoid a pip resolve at install time, this is what scie is for. In a pants-scie world, we would deploy a binary that contains all the 3rdparty deps already baked into the file.

benjyw commented 1 year ago

I think the energy should be going into pants-as-a-scie. That gives us, I think, the wins this issue is striving for, as well as cutting pypi out of our deploy entirely, in favor of github releases. One download and done!

thejcannon commented 1 year ago

I don't think the scie solution solves the size or security concerns, and surely doesn't help with the issue of conflicting dependencies (like we recently got hurt by the requests library colliding with in-repo plugins).

Consider the reason we don't add new dependencies today (at the top of requirements.txt). Would we have added our current deps if they weren't already there?

Also imagine if we didn't need to build Pants as a PEX for scie to work. The wheel was all you needed...

benjyw commented 1 year ago

Sure, it is better not to need external deps in a vacuum. But we use them! The alternatives I see are to replicate functionality in our codebase that we could be using from someone else's, which doesn't seem like a great tradeoff except in trivial cases. Or to remove functionality. Or am I missing a third option here? I suppose more binding to Rust functionality (e.g., to replace requests)? That could make sense, and improve performance.

The third-party plugin issue is indeed thorny. Reducing our own deps will mitigate that, and is a good reason to pursue this (though why wasn't it mentioned above?). We can't fully solve this though: if you consume two third-party plugins (say, not provided by us), their requirements can collide! But I agree that we are the overwhelmingly dominant provider of plugins for now, so our own footprint is the main problem.

And is size actually an issue? Some numbers would be instructive.

thejcannon commented 1 year ago

Yes, another option is to use Rust facilities, which has its own tradeoffs but is easier to stomach.

And the last is Pex. Some of our dependencies could be provided by shifting to a PEX process.

benjyw commented 1 year ago

> And the last is Pex. Some of our dependencies could be provided by shifting to a PEX process.

For example? These would have to be cases where process invocation overhead is tolerable. I guess network requests might fall under that category.

kaos commented 1 year ago

I don't think 0 dependencies is a reachable goal, but I agree with keeping the number of deps low (as low as sensible), and periodically reviewing which deps we have is probably a good idea.

thejcannon commented 1 year ago

OK, here's my homework:

Scraped from: `pants paths --from=src/python/pants:pants-packaged --to=3rdparty/python/requirements.txt | grep # | sort | uniq`

| Name | Purpose | Backends | Removal Strategy | Difficulty/Risk |
| --- | --- | --- | --- | --- |
| ansicolors | Terminal Colorer | (core) | just in-source | Low/Low |
| chevron | templating | Go | string.Template or f-strings | Low/Low |
| fasteners | locks | (core) | in-source maybe? | ???/Medium |
| ijson | json parser | Go | json stdlib, otherwise Rust-based | (Depends?)/Low |
| importlib-resources | resource-loading | (core) | stdlibrary (yay Py3.9!) | Low/Low |
| node-semver | version comparison | JS | ??? | ??? |
| packaging | version comparison | (core), Python | ??? | ???/High |
| pex | (We literally use it just to get PEX_PYTHON_PATH from RC files) | Python | In-Source or use a process | Low/Low |
| psutil | Process utilities | (core) | Move to Rust | ???/High |
| python-lsp-jsonrpc | LSP Support | (BSP) | ??? | ???/??? |
| PyYAML | YAML loading/dumping | Helm, JS, OpenAI, (core) | Move to Rust | Medium/Low |
| setproctitle | Set/Get Process title | (core) | In-Source or Move to Rust | Low/Low |
| setuptools | Resource loading, Requirement Parsing | Java, JS, Python, (core), JVM | Use importlib for resources. Reqs use ??? | ???/High |
| toml | TOML support | Python, BSP, JVM, (core) | Move to Rust | Medium/Low |
| types-PyYAML | typing | N/A | Just exclude... | Low/Low |
| types-setuptools | typing | N/A | Just exclude... | Low/Low |
| types-toml | typing | N/A | Just exclude... | Low/Low |
| typing-extensions | Future typing shenanigans | Docker, Helm, JS, (core), Python | Case-by-case, newer Python helps a bunch | Medium/Low |
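
To make the "just in-source" and "string.Template" strategies for ansicolors and chevron concrete, here is a minimal sketch of what stdlib-only stand-ins might look like. The names `colorize` and `render` are hypothetical and are not taken from the Pants codebase. Note that `string.Template` uses `$name` placeholders rather than chevron's `{{name}}` mustache syntax and has no sections or loops, so this only fits simple substitutions.

```python
from string import Template

# Hypothetical in-source replacement for the small slice of ansicolors a
# caller might need: wrap text in an ANSI SGR color escape sequence.
_SGR_CODES = {"red": 31, "green": 32, "yellow": 33, "blue": 34, "magenta": 35, "cyan": 36}
_RESET = "\x1b[0m"


def colorize(text: str, color: str, *, bold: bool = False) -> str:
    """Return `text` wrapped in ANSI escape codes for `color`."""
    params = [str(_SGR_CODES[color])]
    if bold:
        params.insert(0, "1")  # SGR parameter 1 selects bold
    return f"\x1b[{';'.join(params)}m{text}{_RESET}"


def render(template: str, **values: str) -> str:
    """Hypothetical stdlib stand-in for simple templating (plain substitution only)."""
    # substitute() raises KeyError if a placeholder has no value.
    return Template(template).substitute(**values)
```

Whether the real call sites only need this much is an assumption; anything relying on mustache sections would need f-strings or explicit Python loops instead.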

There are some where I couldn't guess the strategy or difficulty off the bat.

So some things I see:


So, that puts us in a place where I think if we figure out the more hairy ones, we really could have 0 3rdparty reqs.

I'd also argue it's probably worth picking the low-hanging fruit so the list is as small as possible :smile: I'm happy to do that myself. Hell, I'm happy to do all of this work.
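
As an illustration of one Low/Low row, the pex dependency is listed as used only to read PEX_PYTHON_PATH from RC files. A rough in-source sketch could look like the following. It assumes the RC files are plain `KEY=VALUE` lines and that later files override earlier ones. The function name and the choice of which files to pass in are hypothetical, not Pex's or Pants' actual API.

```python
from pathlib import Path
from typing import Iterable, Optional


def pex_python_path_from_rc(rc_files: Iterable[Path]) -> Optional[str]:
    """Return the last PEX_PYTHON_PATH value found in the given rc files, if any.

    Assumption: each rc file holds KEY=VALUE lines; missing files are skipped.
    """
    value: Optional[str] = None
    for rc_file in rc_files:
        if not rc_file.is_file():
            continue
        for line in rc_file.read_text().splitlines():
            key, sep, val = line.partition("=")
            if sep and key.strip() == "PEX_PYTHON_PATH":
                value = val.strip()  # later files and lines win
    return value
```

A caller would pass the candidate locations in precedence order, for example a system-wide file followed by a per-user one.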

thejcannon commented 1 year ago

(FWIW importlib_resources can and should be removed: https://github.com/pantsbuild/pants/pull/19339)
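
For reference, the stdlib replacement is fairly mechanical on Python 3.9+, where `importlib.resources.files()` is available. A minimal sketch follows; the helper name `read_resource_text` is hypothetical and not necessarily what the linked PR does.

```python
from importlib import resources  # stdlib; files() exists on Python 3.9+


def read_resource_text(package: str, name: str) -> str:
    """Read a text resource bundled inside `package` using only the stdlib."""
    return resources.files(package).joinpath(name).read_text(encoding="utf-8")
```

For example, `read_resource_text("json", "__init__.py")` returns the source of the stdlib `json` package's `__init__.py`, which shows the same lookup working for any importable package.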