Publishing a package is hard. New command 'pip publish'?

pypa / packaging-problems

An issue tracker for the problems in packaging

150 stars 35 forks source link

Publishing a package is hard. New command 'pip publish'? #60

Open hickford opened 9 years ago

hickford commented 9 years ago

Even after you've written a setup.py, publishing a package to PyPI is hard. Certainly I found it confusing the first time.

The sequence of steps is complex and made stressful by all the decisions left to the user:

Register on PyPI (on the website or with setup.py register?)
Login to PyPI (with setup.py register or by writing a .pypirc?)
Build distributions (source? egg? windows installers? wheel?)
Upload distributions (with setup.py upload or with twine?)

It would be neat to have a single command pip publish analogous to npm publish that did all of this, correctly.

It would build whichever distributions are deemed fashionable (source + wheel). It you weren't logged in it would automatically run the wizard pip register.

sirosen commented 5 years ago

Mostly agree with @takluyver, but some notes:

Merging twine into pip doesn't add a new and distinct support burden to the PyPA team, because they're already maintaining pip and twine. Consolidation could even potentially simplify things from the maintainer side.

This thread has devolved into a discussion of these workflow tools, which is somewhat dismaying. Fighting against tool bloat doesn't mean that I want to ask PyPA to maintain a poetry/flit/whatever competitor. I'd adore pip publish which builds distributions and does the twine step. But citing that as "too much" is not a reason not to combine pip and twine.

All of the objections to merging twine into pip seem to be based on principles about what pip should and should not do. But combining the tools is an eminently tractable step which would simplify people's lives.

If these waters are too muddy, I may open a new and distinct issue to suggest exactly that: combine pip and twine, name TBD as pip upload, pip twine, pip idontcare, etc.

dstufft commented 5 years ago

I'd point out that "The PyPA team" isn't really a distinct set of people, each project under the PyPA generally operates more or less independently except when we need to agree on some standard for interoperability. In this case, the folks maintaining pip are more or less wholly distinct from the folks maintaining twine.

That being said, as the original author of twine, it was generally my intention that once it baked a bit, we would add it into pip. That was my opinion and I don't know how the other pip maintainers feel about it.

One possible upside of keeping the two tools separate is that you can provide more focused defaults or tooling for specific tasks where the consumer and the producer may have similar tasks, but a desire for different defaults.

For an example, pip wheel will produce a wheel for a given target (directory, package name, whatever) as well as for all of it's dependencies, with the ultimate goal that you end up with a directory full of wheels for the entire dependency chain. Now twine doesn't have a twine wheel command, but you could imagine it gaining one that was intended to function similarly to python setup.py bdist_wheel, just utilizing the PEP 517 hooks. In that case you probably wouldn't want to build the wheels for the entire dependency chain, just for the current project. Having separate tools makes it possible to optimize the workflows, whereas with a singular tool you have to juggle between competing desires more and can end up making a worse workflow for both groups of people.

That being said, obviously having two tools instead of one tool is also a specific trade off that says that pay extra complexity for authors (having to install another tool and use it) is worth it to remove some complexity from end users (removing commands they aren't likely going to require) and possibly shift some complexity around for authors (if they want to use pip wheel instead of a hypthoetical twine wheel they might have to disable consumer oriented features of pip wheel).

Of course there are ways of dealing with all of those things too. You could move all of the "producer" commands under a separate command namespace, like pip twine upload, pip twine wheel. Another option is to just mash the two namespaces together and require flags to disable certain features for either producers or consumers. There are possibly other solutions to this fundamental idea that producers and consumers might want different things from their tools, and how we rectify that. Trying to move forward is likely going to require someone coming up with a full proposal and getting everyone agreed on it, and then making a PR that does that agreed upon behavior.

sirosen commented 5 years ago

In this case, the folks maintaining pip are more or less wholly distinct from the folks maintaining twine.

But I thought all of PyPA was just maintained by @dstufft ???

Kidding aside, thanks for pointing this out. As is perhaps obvious, I thought that they were the same people.

[the argument that] extra complexity for authors (having to install another tool and use it) is worth it to remove some complexity from end users (removing commands they aren't likely going to require)

I would find this much more compelling if pip didn't already carry so much cruft? I'm sure pip wheel is essential to someone's workflow, but I also don't have much doubt that removing it to a standalone tool would be worth it.

There are two additional things that occur to me:

The core python team now includes and endorses the use of pip. This matters. Adding things to pip, even only in new versions which aren't shipped with python distributions, makes them much more "obviously correct" for novices and junior developers.
Even if the twine command remains a distinct command to invoke, we can talk about including it in the pip package.

pfmoore commented 5 years ago

One entirely practical issue with adding new commands to pip is that we're very stretched for resources on the project. We've got a lot of outstanding work to do rationalising and improving the existing codebase, and it's hard even getting that done. Something like merging in twine would be a lot of extra work, for relatively little benefit.

Having said that, I don't object to the idea of pip becoming more of a "one stop" solution for the overall package production and consumption cycle. It's mostly just a matter of prioritising how we spend our limited resources.

gaborbernat commented 5 years ago

Would we get more contributors/maintainers to pip this is definitely something that could be done. In practice with the current bandwidth of maintainers (who do all work mosly part of their free time) there's little chance of this in foreseeable future. We need to remember that Python as a whole very rarely gets resources to someone work full time on projects and as such we are constrained on what we can/should take on. Again it's mostly about benefit, risk and reward triangle optimization.

Pomax commented 5 years ago

I would question the "for very little benefit" part, though. Right now, anyone coming to Python is taught to use Python and pip, and then they hit a brick wall when they actually want to publish some code (no matter how small and ninche-useful it is =). So at the very least, if the time and effort required to get pip publish to do what twine does right now is too much, then can we add "just parsing" for the command pip publish and have it print some instructions on what people should be doing instead?

E.g.:

$ pip publish

pip is python's dedicated install manager, and does not contain any code to help
facilitate the publication of packages. That task is handled by twine, which can be
installed using pip through "pip install twine".

$

And then bonus points for not leaving the user hanging:

$ pip publish

pip is python's dedicated install manager, and does not contain any code to help
facilitate the publication of packages. That task is handled by twine, which can be
installed using pip through "pip install twine".

Would you like to install twine? [Y/n] _

(Even more bonus points for "[...] does not contain any code to help facilitate the publication of packages at this time" of course =)

And sure, that means pip now advocates twine as tool of choice but something has to fill that role. No one is going to learn about twine, or wheel, or any of the tools that have come (and gone!) over the decades, unless they're either mentioned in the same breath as Python itself, or get a mention by those "not first party but for all intents and purposes, totally first party" tools in logical contexts.

theacodes commented 5 years ago

Hello, I'm one of the two maintainers of Twine (along with @sigmavirus24). I'd personally be glad to see it rolled into pip publish and would be happy to write up a plan on how we'd do this. I'd be happy to spend more time on pip, although I admittedly have been having a hard time committing time to Twine.

I'm also totally cool with pip publish spitting out a helpful message while we figure this out.

sigmavirus24 commented 5 years ago

I'm also willing to keep maintaining "twine" within "pip" and eventually working more on pip. I know @dstufft has been hoping to trick me into becoming a pip maintainer for a while, so I'm sure he'll be happy to hear that 😜

dstufft commented 5 years ago

tenor

pganssle commented 5 years ago

Wouldn't moving twine inside of pip require that pip take on all of twine's dependencies?

That means twine would need to either vendor all their dependencies or pip and thus CPython would need to bring in a whole bunch of new dependencies. That seems less than desirable.

I think that is just the tip of the iceberg in terms of complexity added when you need your thing that installs packages to also upload packages, and I don't really see why they need to be combined. I don't think pacman should be able to upload packages to the arch repositories or the AUR. I don't think apt should be able to upload packages to Debian or Ubuntu's package repositories or PPAs.

If a pip publish command were implemented (and again I don't see why we need to do this, particularly because we are still in the process of telling people not to use setup.py upload...), I think it should be a thin wrapper around twine, which remains a separate package. pip publish would create an isolated build environment, install the latest version of twine in it, then pass the command through to twine. That would avoid most of the problems with combining pip and twine while also allowing people to use pip publish as an alias for twine.

njsmith commented 5 years ago

I'll let the pip/twine maintainers worry about the technical bits, but regarding this part:

I don't think pacman should be able to upload packages to the arch repositories or the AUR. I don't think apt should be able to upload packages to Debian or Ubuntu's package repositories or PPAs.

Those distributions use a model where packaging is an arcane art practiced by an elite group of wizards, not something ordinary people do. PyPI is not that. PyPI is a community index intended to be welcoming to everyone.

And, at a more mercenary level, it's to our benefit to make it easy for users to become maintainers, because hooboy could we use more maintainers.

dstufft commented 5 years ago

I would be -1 on a wrapper around twine where you have to install twine first. That just seems like a ton of extra complexity to allow people to invoke twine as pip publish instead of twine. If we're going to add pip publish then it should be part of pip.

And yes, we'd have to pull in twine's dependencies, which currently are:

pkginfo >= 1.4.2
- This replicates some code in pip, probably we should consider pulling this in as a dependency of pip anyways and remove the duplicated code.
readme_renderer >= 21.0
- This is the only dependency that is really hard. because it pulls in several dependencies itself, some of which are C extensions. Fortunately this is just for twine check, which I think would be reasonable to have as a dedicated utility ~forever since it's more of a linter than a packaging tool.
requests >= 2.5.0, != 2.15, != 2.16
- We already require requests, so this is basicaly a no-op.
requests-toolbelt >= 0.8.0
- A legitimate new dependency, but doesn't have any dependencies itself and is fairly small. There might even be things in here that would be useful for pip itself.
setuptools >= 0.7.0
- Twine only uses this for loading commands, folding twine into pip would involve getting rid of this.
tqdm >= 4.14
- Twine uses this for progress bars, pip already has progress bars so folding twine into pip would involve ditching this.

and optionally:

pyblake2 (on Python < 3.6).
- I think it would be fine to drop this dependency completely, and only have blake2 support on Python 3.6+, or have it as an optional dependency for older Pythons.
keyring
- We're already possibly going to add this dependency to pip anyways for better authentcation for download.

Really I just go back to what we think provides the better UX. Like we're probably never going to fold twine check into pip, it's dependencies are too heavy and it's getting too out of scope (IMO). So if people want to be able to lint their packages they're going to have to install another tool. Does it make sense to keep "producer" activities like uploading, building, linting, etc all focused into a singular tool, or does it make sense to roll the "mandatory" producer tooling into pip and spin out the more optional producer tooling into stand alone tools?

One question would be if there are other solutions that can fix the desire to roll tools into pip. Why is that better? Is it just because pip is installed by default? What if we made it easy to install those other tools by default? or even just installed them by default?

I don't honestly know the right answer to these questions! A lot of it is just someone who feels strongly about one path or another to figure out these answers.

pganssle commented 5 years ago

Those distributions use a model where packaging is an arcane art practiced by an elite group of wizards, not something ordinary people do. PyPI is not that. PyPI is a community index intended to be welcoming to everyone.

My point is that not everyone assumes that a package installer is an "everything box" that installs packages, builds them, uploads them, creates virtual environments, runs tests, runs your formatter, etc. As far as I can tell, that will only be the assumption for people coming from specific communities.

"pip is the tool that interacts with PyPI" is a reasonable mental model if we had built it that way. It's reasonably well-scoped, but that's not really how pip was developed and all our existing materials out there have one workflow for package installation and one workflow for publishing packages.

And the fact of the matter is, we really have to worry about churn. We're doing a very delicate and ambitious thing here, which is making radical changes to how packages are built and distributed fairly quickly. People are already very confused by when to use setup.py and when to use pip and when to use twine. Changing from "you should use twine" to "you should use pip publish" at this point will confuse huge numbers of existing people (plus all the people learning from tutorials based on the old best practice), for the benefit of new people who are used to a different code community that happens to use a different kind of build and packaging workflow. Those same communities also tend to have linting, testing and other related operations built into their "everything box", so now those people are still confused as to why it's "pytest" or "tox" instead of "pip test".

I think the best thing to do is to keep these tools well-scoped and if one or more people wants to build meta-tools that orchestrates them into one toolkit, we can link to the high quality ones from the packaging tutorials.

pganssle commented 5 years ago

requests >= 2.5.0, != 2.15, != 2.16 We already require requests, so this is basicaly a no-op.

Presumably twine needs to be refactored to use the vendored version of requests, no?

dstufft commented 5 years ago

There's basically no option here were we just plop twine inside pip and call it good. Rolling twine into pip would likely be pulling twine is as part of pip, not a dependency, and modifying it to fit with pip itself.

pfmoore commented 5 years ago

That means twine would need to either vendor all their dependencies or pip and thus CPython would need to bring in a whole bunch of new dependencies. That seems less than desirable.

Yep. IMO, this is the biggest question that should be asked when suggesting functionality gets added to pip. There's actually a lot to be said for an approach where we ship a "mini-pip" that has the bare minimum functionality necessary to bootstrap the packaging infrastructure (essentially, nothing more than the ability to install wheels, plus a plugin architecture that lets additional installs hook in new commands or extend existing ones.

Having said that, as @dstufft pointed out, adding twine's dependencies (specifically) isn't that big of a deal. (Probably because pip already has quite a lot of dependencies itself... ;-))

pganssle commented 5 years ago

I would be -1 on a wrapper around twine where you have to install twine first. That just seems like a ton of extra complexity to allow people to invoke twine as pip publish instead of twine. If we're going to add pip publish then it should be part of pip.

To be clear, my suggestion was that pip would do the twine installation for you, behind the scenes. When you invoke pip publish it would check if twine is the latest version (with some cache), then install the latest version into an isolated build environment that is saved for the next invocation. Essentially it's ensuretwine using the existing build isolation mechanism to avoid putting twine into your user's normal Python path.

You can also skip this whole thing if twine is already installed.

That said, I am just clarifying what I meant, I still stand by my "publishing doesn't really belong in pip" stance.

sigmavirus24 commented 5 years ago

Here are my very brief thoughts:

I think refactoring twine into pip is do-able.
I think random admin on the PyPA would be less likely to push their junk into Pip than into Twine just because they technically have commit on every repo
I think packaging is and always be one of Python's greatest warts. Every new person struggles with it at different points in time and I do think not having to think about pip, twine, flit, poetry, pipenv, etc. would make peoples lives easier. pip publish would be great.

There's one other option here that we're ignoring: Twine is working on a real darn-tooting API. If we wanted pip publish to work, Pip could be a consumer of that real life API. I think there are even potentially good divisions inside the code-base that Pip could leverage itself. That said, the API is still in progress. Feedback from the pipfolk would be very welcome.

pfmoore commented 5 years ago

Twine is working on a real darn-tooting API

That's another point of conflict with pip. Pip doesn't expose any sort of public API. It would be odd (that's the kindest term for it ;-)) if pip exposed a publishing API but nothing else, so merging twine into pip would lose that. Using twine as a library, with pip publish as a consumer of its API is a different question, and not one I feel particularly qualified to comment on without more research.

I do think not having to think about pip, twine, flit, poetry, pipenv, etc. would make peoples lives easier

Maybe. But the muddle of commands is at least in part due to a muddle of concepts - application, library, publishing, deployment, dependencies, requirements, pinning, ... I think that we need to start by getting our conceptual framework cleaner - otherwise, we're treating the symptoms rather than the cause.

Also, I have technical concerns about pip absorbing all of these roles. There are a lot of aspects of pip that reflect its somewhat unique constraints and background (vendoring requirements, lack of an API or a plugin architecture, ...). I'm not sure those features fit well with a modern, flexible package management command (however, I will admit that I have essentially zero experience in how other languages address these issues).

sirosen commented 5 years ago

I think that we need to start by getting our conceptual framework cleaner - otherwise, we're treating the symptoms rather than the cause.

💯 True.

After thinking more, I want to "take backsies" on a lot of what I've said in this thread. If installing python gave everyone pip and twine, the conversation would probably be much more centered around how to improve twine and its various "package publishing features" than whether or not to take the cosmetic, relatively insignificant step of coalescing the commands.

The concept of combining them into pip twine or pip publish is partly a proxy for the more essential "available everywhere (modern) by default".

The python stdlib provides unittest now, but pytest is still great. pip is a good installer, but pip-tools, pipenv, &co. will continue to exist and thrive.

I want a "unittest of package publishing" -- something unobjectionable, simple, good enough, and available in all modern environments by default. And maybe that mandates that there's some twine init wizard that helps you get started.

It can be invoked as twine or pip publish or pyproject-toml-manager.pl or whatever. I think the whole matter of naming and whether or not it should be part of pip is a (super tempting) bikeshed masquerading as the real issue.

For my own part, I don't even care how much I disagree with its "sane defaults". Even if I disagree with most of the decisions made by such a tool, I will still be extremely happy that it exists at all.

ncoghlan commented 5 years ago

TL;DR version of the below: +1 from me for a pip enable-publishing command that simplifies the account management and local environment management related steps in https://packaging.python.org/tutorials/packaging-projects/#uploading-the-distribution-archives, but I'm still -1 on offering pip publish itself.

Personally, I think it's incredibly bad practice to have software publication tools installed by default alongside a language runtime, and consider it a design mistake that the Python standard library currently includes support for the legacy distutils setup.py upload command (unfortunately, it's a major compatibility issue to get rid of it, and in the environments where folks care, they tend to just remove the entirety of the distutils module).

In addition to the Linux comparisons @pganssle already made above, on other systems, it would be akin to making Visual Studio a mandatory part of Windows installations, or XCode mandatory on OS X. The vast majority of Python users aren't going to be Python open source project publishers and that's OK.

Even for folks who are Python publishers, the majority of their Python installations still aren't going to be development systems or build servers, they're going to be runtime environments (which ideally wouldn't even have pip inside them, but there are currently logistical challenges to achieving that).

However, I think @pganssle is right that there are potential opportunities to take inspiration from the ensurepip module in the standard library: we use that to standardise the process of bootstrapping pip, without having to incorporate pip itself directly into the standard library.

If we were to go down that path, then the appropriate command to add at the pip layer would be something like pip enable-publishing that went through the following steps:

Installing twine and keyring into the current environment for you (as user installs if there's no venv active)
Prompting you for the repository you want to publish to (defaulting to test PyPI)
Prompting you for PyPI credentials as per https://github.com/pypa/twine#keyring-support (if they're not already in the keyring)
Checking the supplied credentials actually work
Offering to open the account registration page at https://pypi.org/account/register/ or https://test.pypi.org/account/register/ if the given credentials don't work (and the chosen repository is one of the ones PyPA manages)
If the supplied credentials do work, then save them to the keyring and print out a reminder that the standard upload command is twine upload dist/*

(The reminder at the end could potentially do a bit of filesystem introspection, and decide what to emit based on whether it finds a dist directory, pyproject.toml, setup.py, or none of the above)

The general idea would be that the account setup related parts of https://packaging.python.org/tutorials/packaging-projects/#uploading-the-distribution-archives would largely be replaced by "Run pip enable-publishing".

(Writing this up also made me realise one of the big reasons why npm is able to do this differently: there's a much stronger distinction in that ecosystem between the development runtimes used to emit minified JavaScript, and the browser and in-app runtimes that execute them, which means it's far less likely that npm will end up installed into a runtime environment by default)

sirosen commented 5 years ago

Thanks for stating that last point about npm (the parens make me think you considered omitting it) -- I think it's a very relevant. Whether or not we agree on what should be done in python, it clarifies "this is why npm is so different".

pip enable-publishing is fine as proposed. It would be better than where we are now. But I want more. (I'm a needy package publisher! 😛 )

In addition to the Linux comparisons @pganssle already made above, on other systems, it would be akin to making Visual Studio a mandatory part of Windows installations, or XCode mandatory on OS X. The vast majority of Python users aren't going to be Python open source project publishers and that's OK.

I don't see why this would be a bad thing. What are we asking non-publishers and server runtimes to give up in order to have this, other than disk space?

It's a huge quality of life improvement for publishers, at little expense to anyone else.

If I had apt make-deb and yum make-rpm I'd be happy, and what would Ubuntu desktop users lose?

Personally, I think it's incredibly bad practice to have software publication tools installed by default alongside a language runtime

Why is that? Comparison with Rust + Cargo is relevant here. Has Rust made a mistake including Cargo by default, or is there some circumstance that makes python different?

The desire for a minimal standard lib sometimes will run hard into having a rich, batteries-included standard lib. If the standard lib is going to include mock, 2to3, and xmlrpc, why is twine so different? Perhaps entering dangerous waters, but if venv and zipapp exist, shouldn't twine or ensuretwine be there too?

dstufft commented 5 years ago

I think I am -1 any sort of short cut for pip install twine keyring or whatever. That feels like the kind of "magic" that will add further confusion to people when they don't fully understand what that short cut is doing.

I view pip as a more developer centric tool than an MSI installer or apt-get or yum. Long term random end users should probably not be pip installing things, but should be installing "distribution aware" packages. IOW our toolchains should be improved so we can produce MSIs, .deb, etc as a matter of course. Likely ones that include Python themselves as part of it, which utilize internally pip or the like in part of the build toolchain.

Of course there are always going to be cases where the above isn't the right answer, e.g. installing a requirements.txt into a Heroku environment.

That ultimately ends up being exactly the same case as npm is in. If you're installing just plain old Python, you're going to get pip installed just like if you're installing plain old Node.js you're going to get npm installed. If you're producing something like a bundled app, then your bundled app is unlikely going to include pip.

Ultimately, I think trying to come up with rationalizations for for why it "belongs" in pip or "belongs" preinstalled or why it doesn't the wrong way to think of it. Plenty of people argued bundling pip with Python was the "wrong" thing to do for a variety of reasons, and they were, IMO, wrong at the time. What I think matters if what we think provides the best experience instead of trying to be hardline about some idealistic vague "rules" about where stuff "belongs".

That being said, I don't know if merging the tools is the "right" thing. It would represent trade offs at the sub command level instead of trade offs at the top level command level. One tiny example, currently I have a single version of twine installed but many versions of pip installed, bundling them would make it harder for me to ensure I'm using the latest version of the publishing tools, since I have to update them in every virtual environment instead of just once at the top level. Obviously merging the two tools provides some benefits as well, since it makes it easier for users to know what tool they need to use, but I don't think it's a slam dunk and there may be ways (like also pre-installing twine) that we can use to negate the downsides of two commands, while still getting the upside. Or maybe just merging the two commands is really what's easiest for users. I dunno! I've been steeped in packaging lore for so long it's hard for me to step back.

pganssle commented 5 years ago

Comparison with Rust + Cargo is relevant here. Has Rust made a mistake including Cargo by default, or is there some circumstance that makes python different?

Rust has no language runtime, it produces native binaries, so the distinction is even stronger there between tools for users and tools for producers. Though I am less sure than Nick about the degree to which that makes the difference, it's undeniably true that users of cargo are definitely producing software projects. Users of pip may be doing many other things, installing software, creating an interactive console, etc.

pfmoore commented 5 years ago

Comparison with Rust + Cargo is relevant here. Has Rust made a mistake including Cargo by default, or is there some circumstance that makes python different?

Experience with distutils says that putting publishing tools into the standard library (where they can't change rapidly in response to changes in the publishing ecosystem) results in problems. The situation isn't necessarily the same here, and it's possible that people are over cautious because of the history, but conversely, the benefits are small.

Putting pip into the standard library was a huge step, because it essentially solved the bootstrapping issue of how to get other tools. But with pip, getting a publishing toolchain is nothing more than pip install -r standard-publishing-toolchain.txt. Sure, having tools built in is better (I'm normally an opponent of the "it's easy to just install stuff" argument) but is the benefit sufficient to justify the risks?

Regarding comparison with Rust, I see that @pganssle has already made some comments. Another factor to consider is that Python package producers routinely interact with non-Python tools (C compilers and libraries, tools like Cython, ...) As far as I know, Rust doesn't have to do that - and so Cargo doesn't have to react as those tools change. So yes, Python is different from Rust/Cargo.

Pomax commented 5 years ago

I have to say I don't understand this part:

Personally, I think it's incredibly bad practice to have software publication tools installed by default alongside a language runtime, and consider it a design mistake that the Python standard library currently includes support for the legacy distutils setup.py upload command (unfortunately, it's a major compatibility issue to get rid of it, and in the environments where folks care, they tend to just remove the entirety of the distutils module).

While I share the opinion that legacy support for something like distutils should never have happened, I don't see the connection between the fact that that support is there, and the notion that it's somehow bad practice to have the publication tools bundled with the language suite. Those are two completely separate things, and I'd like to understand what makes you think it's bad practice to offer those tools as part of the standard library.

I'd also caution against drawing parallels between pip and apt/yum. In part because they're only similar on the surface, in the sense that they're might all fit the "installation managers" label while differing substantially in context, but also in large part because Python is a cross-platform language: discussions about its package manager that require drawing parallels should draw parallels to other cross-platform programming language package managers, not to OS-specific installation managers (which gets even worse in the case of apt or yum, which aren't even Linux-specific, but "only certain flavours of Linux"-specific).

So that means comparing pip, as a programming language dependency manager, to other such tools like cargo or npm. These tools of course have the benefit of being very new tools indeed, so there are lessons to be learned from the decisions they made after looking at what people want out of these tools, and what they actually get out of these tools, looking at all the languages that came before them, including how Python has handled package management. As it turns out, truly making these tools the package manager, not just the package installer (with up/downgrades just a more specific form of install), and having them be part of the default installation greatly benefits everyone.

So I'd like to understand comments around why it would be a bad thing to (in limited fashion from what I'm reading so far) effect this kind of empowerment for users of Python. The added disk space adds up to "basically irrelevant" except for embedded systems (where no sane person would use a standard install anyway), and it sounds like the maintainers of twine are up for folding its functionality into pip, so this all sounds like great stuff, and I still really hope to see a fully functional pip publish come with a near-future version of Python, ideally with an interim solution in the very next version where pip publish either tells people what to do, or asks them whether it should bootstrap everything for the user to, with minimal additional work, get that code pushed up and available to the rest of the world for use and collaboration.

ncoghlan commented 5 years ago

(Note: thinking out loud in this comment so folks can see where I'm coming from in relation to this. I'll post a second comment describing a version of pip publish that would address all my technical design concerns without the UX awkwardness of pip enable-publishing)

The root of the design problem we face at the pip level actually lives at the Python interpreter level: unlike other ecosystems, we don't make a clear distinction between "development environments" and "runtime environments".

C/C++:

runtime is just libc
distribution is of the built binaries, not the original source code
development environments add a compiler, debugger, etc (accessed either as CLI tools or via an IDE)

Rust:

essentially the same set up as C/C++, except with proper structured dependency management in cargo

Java:

runtime is just a JVM bytecode interpreter
distribution is of the built JAR and WAR files, not the original source code
development environments add javac, maven, gradle, etc (accessed either as CLI tools or via an IDE)

JavaScript:

runtime is Node.js or a browser JS engine with a built-in source compiler
distribution is of either the output of a JS build pipeline, or else of a directory with node_modules embedded in it
browser debuggers live in the client browser as an optional add-on
Node.js debuggers live outside the interpreter and communicate over a defined WebSocket protocol

Python:

runtime is, for the purpose we're considering here, CPython, as PyPy typically doesn't get installed by beginners, and a lot of pip-installable things won't work out of the box with MicroPython anyway
CPython not only comes with a source compiler built-in, it also comes with a native debugger, a legacy build management system that we're looking to deprecate (distutils: https://github.com/pypa/packaging-problems/issues/127), and a modern package installer that's more deliberately designed to be optional (pip/ensurepip).
distribution is a wild and wooly mess of different technologies adopted at different times (https://packaging.python.org/overview/)

So, in writing that out, I think my main concern is actually a factoring problem, in that I'd like users to be able to easily decide for themselves whether they want to configure a system as:

A Python runtime system: no pip, no wheel, no twine, no setuptools, no distutils (the first 3 of those are readily achievable today, the latter two are still a work in progress)
A Python application build & deployment system: able to consume from Python packaging repositories, but not set up to publish back to them (this is all pipenv needs, for example, along with any other pipeline that converts Python packages to a different packaging ecosystem, whether that's a Linux distro, conda, etc)
A Python library build & deployment system: both consumes from and publishes back to Python packaging repositories

Point 1 is handled at the standard library level with ensurepip (once we figure out the thorny mess of having the distutils API be provided by setuptools instead of the standard library)

That means it's only points 2 & 3 that impact the design of a pip publish command. Saying "we don't care about the concerns of folks that want to minimise the attack surface of their build pipelines" is certainly an option, but I don't think it's a choice that needs to be made (hence the design in the next comment).

ncoghlan commented 5 years ago

I realised there's a way of tackling pip publish that would actually address all my design concerns:

pip would declare a publish extra, such that running pip install --upgrade pip[publish] instead of pip install --upgrade pip installed any extra dependencies needed to make pip publish work. (Declaring an extra this way covers points 2 & 3 in my previous comment)
pip publish would be implemented using the in-progress Twine API @sigmavirus24 mentioned in https://github.com/pypa/packaging-problems/issues/60#issuecomment-447107759 (and presumably influence the design of that API)
pip publish would prompt to auto-install the pip[publish] extra if it found any of its import dependencies missing
With that approach, the PyPI registration helper functionality would likely make the most sense as an addition to twine, so it could be iterated on outside the pip release cycle

From an end-user perspective, that would all end up looking like an "on first use" configuration experience for the pip publish command (which is always going to exist due to the need to set up PyPI credentials on the machine).

From a maintenance perspective, while the existence of twine as a support library would become a hidden implementation detail, the twine maintainers would likely still need to become pip maintainers as well, so they can handle pip publish issue reports, and bump the minimum twine version requirement as needed (similar to the way @dstufft originally became a CPython core dev primarily to handle updating the bundled pip to new versions).

As an added bonus, all the documentation about extras would gain a concrete example that it can point to: pip[publish] :)

njsmith commented 5 years ago

Is attack surface your main concern? Because python already ships with multiple ways to make arbitrary HTTP requests, and doesn't ship with any PyPI credentials. So I'm having trouble seeing how having twine available would increase attack surface in a meaningful way? What's your threat model?

ncoghlan commented 5 years ago

@njsmith Every other part of pip can interact with PyPI anonymously, but upload needs the ability to work with the user's PyPI credentials.

Not putting the credentials on the machine in the first place is obviously the primary defence against compromise, but if the code is useless without credentials, why have it there, instead of designing the tooling to add the dependencies at the same time as you add the credentials?

Keeping the dependencies separate also means that if a CVE is raised against the way twine accesses the system keyring, or the way it interacts with a user's account on PyPI, then it's only a vulnerability on systems that have twine installed, not on all systems that have pip installed. (A future version of pip would presumably raise the minimum required version of twine to a version without the vulnerability, but that would be a matter of dependency management hygiene, rather than urgent CVE response)

That said, laying out the considerations as I did above means I now think most of the cases where this kind of concern really matters will be ones where the feature to be removed from the deployment environment is the entire build and installation toolchain, and that's already possible by doing pip uninstall pip wheel setuptools once the target venv has been set up (getting rid of distutils is more difficult, but still possible).

So while I think the "extra"-based approach would be architecturally clearer (i.e. pip primarily remains an installation tool, but has some core publication functionality that relies on some optional dependencies), I don't think having it baked into the default install would create any unsolvable problems - worst case is that it would just give some folks an increased incentive to figure out how to remove pip from their deployment artifacts entirely, and there might end up being some future CVEs that impact more Python installs than they otherwise would have.

pradyunsg commented 5 years ago

I like @ncoghlan's idea -- have a pip command that's providing (optional) upload functionality, implemented using twine's public API, with an extra in pip to install the dependencies for it. :)

ArjunDandagi commented 4 years ago

as someone who tried Pip , Gem, Brew , and NPM I must say . Npm is the easiest of all the packaging tools . 😄

Pomax commented 4 years ago

It's been over 5 years since this issue got filed, and almost 2 years since the discussion died down and nothing happened. However, the entire world would still benefit from being able to type pip publish, because publishing a package is still ridiculously hard in this ecosystem.

Just pick an approach, implement it, and then iterate on refining or even wholesale changing that implementation as the command sees adoption. As long as pip publish works at all, improving how it works can be a rolling target.

astrojuanlu commented 4 years ago

If nobody has complained about this in 2 years maybe it's not that crucial.
That last comment left 2 years ago actually expresses agreement on a way forward. What about sending a pull request instead of demanding free labour?
In 2020 neither flit publish nor twine upload are "ridiculously hard" by any standards, and if they are perceived as such it's a documentation issue, not a tooling issue.

hoechenberger commented 4 years ago

@astrojuanlu

If nobody has complained about this in 2 years maybe it's not that crucial.

Why should one constantly add complaints if there's an issue open already? I guess only few would agree that Python packaging tooling is a pleasant thing to use. Besides, there are complaints now, and you're complaining about those. So maybe you should make up your mind on this matter.

In 2020 neither flit publish nor twine upload are "ridiculously hard" by any standards, and if they are perceived as such it's a documentation issue, not a tooling issue.

Oh come on, the tooling is really not great compared to what we're seeing e.g. with NPM. Nobody's saying that the pip / PyPA team hasn't been doing an amazing job, but in comparison to other ecosystems, Python is just so far behind.

pfmoore commented 4 years ago

Oh come on, the tooling is really not great compared to what we're seeing e.g. with NPM. Nobody's saying that the pip / PyPA team hasn't been doing an amazing job, but in comparison to other ecosystems, Python is just so far behind.

How many people work on and support npm? Wikipedia says "The company behind the npm software is npm, Inc, based in Oakland, California. [...] GitHub announced in March 2020 it is acquiring npm, Inc". The pip development team consists in total of about 5 people, all of whom only work on pip in their spare time. Frankly, I'd hope npm would be better than pip, with that level of disparity in development resource...

layday commented 4 years ago

Most of the work in the Python packaging space appears to be - with the sole exception of the new dependency resolver - unfunded and is carried out by volunteers in their free time. npm was VC-funded as early as 2013 and is now maintained by GitHub.

Edit: heh, we posted almost the exact same thing at the exact same time.

hoechenberger commented 4 years ago

Yes, I don't challenge that. This is a totally acceptable explanation for why Python packaging is in such a bad shape. But still one should acknowledge that Python packaging is not great by any standards. Why that is the case is a different question. I'm thankful for the work people have put into the existing ecosystem either way, but this doesn't mean one cannot dislike or criticize it.

pfmoore commented 4 years ago

That's a very absolute statement. There are certainly some standards by which Python packaging is fine:

It's fine for something which is used by millions and maintained by less than 10 volunteers
It's fine compared to the state it was in 10 years ago
More personally, it's fine for what I use it for

Progress is slow. But it's not non-existent. And there are reasons why it's slow. People complaining that the volunteer labour "doesn't get things done faster" is one of the reasons it's slow, because it discourages and burns out the people whose freely given efforts are being dismissed as insufficient. I speak from experience here, as I know I'd do far more on pip if I didn't have to review so many issues that left me feeling demotivated.

this doesn't mean one cannot dislike or criticize it

However, finding ways to express such a dissatisfaction without implying some level of failure on the part of the people who voluntarily give their time to the work, is very hard. And people typically don't make any effort to do that, but simply throw out criticisms, and then follow up with "yes, but I appreciate the work people have done, I just dislike the result".

And furthermore, how is complaining and criticising without offering any help, productive? If you were to submit a PR implementing a pip publish command that took the discussion so far into account, your views would be much more relevant and welcome. But just out of the blue commenting that "this sucks" isn't really much help in moving the issue forward.

Never mind. I don't want to spend my Sunday worrying about explaining this to people. I'll go and find something more enjoyable to do. (And if that means I don't work on pip today, that's a good example of the consequences of this sort of discussion).

pradyunsg commented 4 years ago

And if that means I don't work on pip today, that's a good example of the consequences of this sort of discussion

The fact that this was the first notification/issue thread I've read on this Sunday, is directly the cause of why I'm not spending any more time today to work on pip.

hoechenberger commented 4 years ago

@pfmoore

Progress is slow. But it's not non-existent.

Nobody said that.

People complaining that the volunteer labour "doesn't get things done faster" is one of the reasons it's slow, because it discourages and burns out the people whose freely given efforts are being dismissed as insufficient. I speak from experience here, as I know I'd do far more on pip if I didn't have to review so many issues that left me feeling demotivated.

this doesn't mean one cannot dislike or criticize it

However, finding ways to express such a dissatisfaction without implying some level of failure on the part of the people who voluntarily give their time to the work, is very hard. And people typically don't make any effort to do that, but simply throw out criticisms, and then follow up with "yes, but I appreciate the work people have done, I just dislike the result".

I can very much empathize, I've been in your shoes before, many times. Maybe to clarify once more: I greatly appreciate the work and effort that people have put into PyPA and pip. But I think it's not okay to simply deny there are still many issues to be resolved when there clearly are issues. Because my impression was that this is exactly what was happening in response to @ArjunDandagi's and @Pomax's comments (and is the only reason why I joined the discussion)

And furthermore, how is complaining and criticising without offering any help, productive? If you were to submit a PR implementing a pip publish command that took the discussion so far into account, your views would be much more relevant and welcome. But just out of the blue commenting that "this sucks" isn't really much help in moving the issue forward.

First off, I never said "it sucks". Secondly, I believe it's a mistake to only allow criticism if the one criticising has a solution for their problem right at hand. One must be able to express dissatisfaction even if one doesn't know how to resolve the problem.

Never mind. I don't want to spend my Sunday worrying about explaining this to people. I'll go and find something more enjoyable to do. (And if that means I don't work on pip today, that's a good example of the consequences of this sort of discussion).

@pradyunsg

+1

The fact that this was the first notification/issue thread I've read on this Sunday, is directly the cause of why I'm not spending any more time today to work on pip.

You can spend your time however you want to. Nobody's forcing you to do anything.

takluyver commented 4 years ago

If you were to submit a PR implementing a pip publish command that took the discussion so far into account, your views would be much more relevant and welcome.

Just to add, as a maintainer of various open source projects (not pip), a PR like this is probably not as helpful as it initially sounds. If you're not familiar with the internals of a project, your first attempt at writing a significant new feature is likely to need a lot of work, and therefore take up a lot of reviewers' time. It can also cost a lot of mental & emotional energy to explain to a well-intentioned contributor that the changes they've spent hours or days on are not going to be merged, and at least for me, this really drains my enthusiasm to work on a project.

So, before contributing pip publish (or any other significant change to an open source project), it's a good idea to work out:

Is this something the project maintainers want to add? (E.g. the pip maintainers may say pip is for installing packages, not creating & publishing them - though I get the impression they're actually somewhat open to doing both)
Has anyone else attempted this? Where did they get stuck? Is their code a good starting point?
How do the maintainers envisage it working & fitting in to the project? Where would they start if they were implementing it? What's a sensible first step to submit and have reviewed?

@hoechenberger

I believe it's a mistake to only allow criticism if the one criticising has a solution for their problem right at hand

You are allowed to criticise. @pfmoore suggested that it was not productive for you to do so. It looks like you've contributed to dissuading two maintainers from spending time on pip today, so I'd have to agree with him.

The issue is that criticising Python packaging has been done to death for years. Anyone involved in Python packaging knows there are still plenty of warts and areas for improvement. So another round of "why isn't this fixed yet?" without engaging with the details of the discussion is not actually driving anything forwards.

I will endeavour to resist the urge to reply again for at least the rest of the day.

nicoddemus commented 4 years ago

Progress is slow. But it's not non-existent. And there are reasons why it's slow. People complaining that the volunteer labour "doesn't get things done faster" is one of the reasons it's slow, because it discourages and burns out the people whose freely given efforts are being dismissed as insufficient. I speak from experience here, as I know I'd do far more on pip if I didn't have to review so many issues that left me feeling demotivated.

As a maintainer of a project used by many (pytest), I definitely concur with this statement.

merwok commented 4 years ago

It would really help if people made concrete notes about what is not good in Python tools, and what is great in other tools.

Pomax commented 4 years ago

That would be the comment that started this thread. In npm land, which is probably the best publishing experience, there is one tool, and just one tool, and the mirror-sequence of steps is:

Register on NPM: this has to be done through the website
Login to NPM: one-time npm login, until you uninstall NPM or reformat your machine or the like
Build distributions: there is no true "publishing-specific" building of distributions. Instead:
- there is a one-time npm init to set up the publishing metadata, with a guided CLI experience that asks you for all the information required
- there are dedicated npm version major, npm version minor, and npm version patch commands to make "sticking to semver" as easy as possible for maintainers, which makes it possible for people who use npm packages to trust that patch/minor changes don't break a codebase when uplifting (at least, to the degree where any package that breaks that trust is a genuine surprise).
- there is no separate building/packing step, you just need to make sure your current state is synced to your repo (code + tags).
Upload distributions: npm publish. This command packs up your local files, with optional exclusions through either .gitignore or .npmignore, but that archive only exists in memory, for as long as it needs in order to be uploaded.

This is essentially frictionless, through a single tool. Yes, a competitor was written to address NPM's slowness, called "yarn": but they quite wisely decided to make it work in exactly the same way, so if you're coming to Python from the Node ecosystem at least (or if you're a long time user of Python and you started working with Node), you are effectively spoiled with an excellent publishing flow and tooling for that.

There were dicsussions around having pip "wrap" other tools, so that it could at least act as front-end for the publishing workflow and people would only need the one command: that would still be amazing. Sure, it would end up preferring one tool over another, but that's fine: folks who don't want to have to care, don't have to care, and folks who do care don't need to change their publishing flow and can keep using their preferred tools for (part of) the release chain as before.

pfmoore commented 4 years ago

There were dicsussions around having pip "wrap" other tools, so that it could at least act as front-end for the publishing workflow and people would only need the one command: that would still be amazing.

One suggestion - not intended as deflection, but as a genuine way for community members to help explore the design and maybe tease out some of the inevitable difficulties in integrating existing tools in such a front end. Maybe someone could build a tool that acted as nothing but that front end - providing the sort of user interface and workflow that node users find so attractive (I've never used node myself, so I have no feel for how npm "feels" in practice), while simply calling existing tools such as pip, twine etc, to deliver the actual functionality.

If we had such a frontend - even in the form of a working prototype - it would be a lot easier to iterate on the design, and then, once the structure of the UI has been sorted out to work in the context of Python, we could look at how (or maybe if) we would integrate the command structure into pip or whatever.

layday commented 4 years ago

I think it is important to recognise that these complaints pertain to setuptools. Working with Flit and Poetry, which provide their own CLI, is not unlike working with npm. The addition of a pip publish command will not meaningfully improve the situation with setuptools - not least because pip does not have an accompanying build command (there is a wheel command but that only builds.... wheels) and neither does setuptools (the setuptools CLI is deprecated and slated for removal). There is work being done in this area but it is slow both for historical reasons and for lack of resources. python-build is a generic build tool which - as I understand it - will be adopted by pip once stable and blessed by setuptools. There has been discussion on improving the setuptools documentation and on adopting pyproject.toml. There are several PEPs under discussion which seek to standardise on how to specify package metadata in pyproject.toml. These are all things that open up new possibilities for pip and for other tools, like the hypothetical integrated frontend that @pfmoore has mentioned above.

Pomax commented 4 years ago

I think it's probably also worth noting that a (small?) elephant in the room is that if you're coming to Python from another language, or even if it's your first exposure, you get told by nearly everyone that "how you install things" is through pip. So even if in reality it's just one of many ways to install/publish packages, and some of the alternatives are quite streamlined, that's not what people are being taught to think of pip as. It's essentially "python and pip" in the same vein as "node and npm" or "rust and cargo" etc. That's not something anyone can even pretend to have any control over at this point, of course, but it's a strong factor in what people new to Python, or even folks "familiar enough with python to use it on a daily basis alongside other languages" have been implicitly conditioned to expect from pip.

Having someone write a "unified" CLI proof of concept tool sounds like a great idea, and I'd be more than happy to provide input around the "npm experience" (having published a fair number of packages there), although I would not consider myself familiar enough with the various tools (or with enough time to deep-dive) to write that PoC myself.

pganssle commented 4 years ago

We had basically these same arguments about adding a pip build command. I am personally way more in favor of @pfmoore's idea (and it's one I've suggested beofre) of having a top-level tool that wrangles all the various other tools for you. There's a bunch of complications that come with cramming a bunch of different tools (publisher, builder, installer) into a single common tool.

For example, the motivation behind the unix philosophy does-one-thing-well tools for building distributions and installing wheels is that many downstream distributors feel the need to bootstrap their whole builds from source, and it's a pain in the ass to bootstrap a swiss army knife monolith like pip compared to small, targeted tools.

I also think that it's easy to look at cargo and npm and think that they have everything figured out, but these big monolithic tools can be a real problem when because of poor separation of concerns, nominally unrelated aspects of them become tightly coupled. I know a few places where we've had problems because cargo didn't (doesn't?) support any endpoint other than crates.io, and in general they are also in the early phase, when there's still a lot to build out, and not necessarily a lot where they are suddenly constrained from making any big changes.

I'm not saying that those ecosystems and all-in-one tools are worse than what we have or even that there's no benefits to them, but in the past we had an all-in-one tool for this: distutils. setup.py was extensible, had a bunch of built-in stuff for building, testing, installation, etc. Over time bitrot, tight coupling and poorly defined interfaces have made it mostly a minefield, and on top of that it's just fundamentally incompatible with how software development works these days.

I think a bunch of individual tools with one or more wrapper CLIs for various purposes makes a lot of those problems much more tractable in the long term, and might help the people clamoring for a "single endpoint".

pfmoore commented 4 years ago

Having someone write a "unified" CLI proof of concept tool sounds like a great idea, and I'd be more than happy to provide input around the "npm experience" (having published a fair number of packages there), although I would not consider myself familiar enough with the various tools (or with enough time to deep-dive) to write that PoC myself.

To be honest, no-one had that sort of familiarity with the tools/ecosystem when they started. Why not just write a gross hack and see how things develop from there?

mypip.py

import subprocess
import sys

if __name__ == "__main__":
    if sys.argv[1] == "publish":
        subprocess.run(["twine", "upload"] + sys.argv[2:]
    else:
        subprocess.run(["pip"] + sys.argv[1:])

In all seriousness, that's the bare bones of a utility that adds a "publish" command to pip's CLI. Clearly, there's a lot of work to make even a prototype out of this, but if you started with that and actually used it, and improved it as you hit annoyances/rough edges, you'd pretty soon end up with something worth sharing. Most of the tools I've ever written started out like this.

(I'm not trying to insist that you do this - just pointing out that "I don't know enough" is actually far less of a hurdle than people fear).

Previous Next