pypa / packaging.python.org

Python Packaging User Guide
http://packaging.python.org

Overview implies only uploading sdists for pure Python packages #539

Open dstufft opened 6 years ago

dstufft commented 6 years ago

The recently added overview (which is overall pretty great), implies that for Pure Python packages, you only want to upload a sdist. While it's generally true that some of the wins are most obvious with packages that contain compiled code, pure python projects should pretty much always upload wheels as well (and for them, it's generally easier, and often times a single wheel file is all they need).

As we progress packaging forward, one thing that is for certain is that we know that we will someday be able to have PyPI present information like Requires-Dist for wheel files, allowing pip and such to resolve dependencies without downloading the files and executing a Python subprocess. We don't know that we are going to be able to do that for sdists (though of course we hope to). Projects that don't upload wheels risk, in this likely future, forcing installers to rely on much slower mechanisms for dependency resolution.
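
To illustrate why wheel metadata can be read without executing anything, here is a minimal sketch. It assumes a made-up project called `demo` with invented dependencies, and the helper names `make_demo_wheel` and `requires_dist` are hypothetical. It is not how pip or PyPI actually implement this; it only shows that `Requires-Dist` lives in a static `METADATA` file inside the wheel's `.dist-info` directory.

```python
import io
import zipfile
from email.parser import Parser


def make_demo_wheel():
    """Build a tiny in-memory archive laid out like a wheel (hypothetical project 'demo')."""
    metadata = (
        "Metadata-Version: 2.1\n"
        "Name: demo\n"
        "Version: 1.0\n"
        "Requires-Dist: requests (>=2.0)\n"  # invented dependency, for illustration only
        "Requires-Dist: attrs\n"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo/__init__.py", "")
        zf.writestr("demo-1.0.dist-info/METADATA", metadata)
    return buf.getvalue()


def requires_dist(wheel_bytes):
    """Read Requires-Dist entries from a wheel without running any project code."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        name = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
        text = zf.read(name).decode("utf-8")
    # METADATA uses an email-header style format, so the stdlib parser can read it.
    return Parser().parsestr(text).get_all("Requires-Dist") or []


deps = requires_dist(make_demo_wheel())
print(deps)
```

An sdist has no equivalent guarantee here: learning its dependencies may require running its build script.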

Of course, wheels are also just generally faster to install; even for pure Python projects the speedup can be 1-2s per package, which can add up greatly over even a fairly modest dependency set (30 dependencies means 30-60s more time if they don't provide wheels).
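
The arithmetic behind that estimate, as a small sketch. The 1-2s per-package figure is the rough number quoted above, not a measurement, and `extra_install_seconds` is a hypothetical helper name.

```python
def extra_install_seconds(num_packages, per_package_range=(1.0, 2.0)):
    """Estimate the extra install time when packages ship only sdists.

    per_package_range is the assumed per-package slowdown in seconds.
    """
    low, high = per_package_range
    return num_packages * low, num_packages * high


low, high = extra_install_seconds(30)
print(f"{low:.0f}-{high:.0f}s extra for 30 dependencies")
```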

This kind of throws this section of the overview off a bit, since the entire premise of this section is the gradient from single .py files, to pure python packages, to packages with compiled artifacts. That gradient is probably worthwhile to express, but it should have some distinction other than sdist vs wheels, because you should produce both packages, for both layers of the gradient (omitting plain .py files for obvious reasons).

Perhaps a better way to describe this gradient would be indicating the tools you use to build it? You can use something simpler like flit for pure python, whereas if you're doing something compiled you're likely going to need setuptools or numpy.distutils or something.

pradyunsg commented 6 years ago

It might be a good idea to rename the sections to say something instead of source/binary here.

I can't really come up with any good alternative names here, other than possibly Python-only package distributions for the former.

pfmoore commented 6 years ago

Strong +1 to this. The speed up for wheels is likely even greater on Windows, where creating subprocesses (i.e. running setuptools) is costly. And when pip implements PEP 517, that has an even higher cost in terms of subprocesses, as every PEP 517 hook call must be isolated into its own subprocess call.

So long story short, we should be strongly encouraging projects to upload wheels - especially pure Python projects, where the cost to the developer is pretty trivial, and the benefit to the user is high.

mahmoud commented 6 years ago

Here is what the Overview currently says:

"Default to publishing both sdist and wheel archives together, unless you’re creating artifacts for a very specific use case where you know the recipient only needs one or the other."

As well as:

"Python and PyPI make it easy to upload both wheels and sdists together."

If more clarification is necessary, I would strongly suggest linking to good resources as opposed to simply adding more to these sections. The library-focused part of the overview is already bulging, and on the verge of losing its summary/overview focus, especially given how heavily focused the rest of the guide is on library packaging.

This also means recognizing that the overview is not PyPI best-practice specific. There are a lot of flows where you don't even publish to a PyPI, private or otherwise. For instance, at my last job, we had a build pipeline that included building wheels only, no sdist.

I'll repeat what I said in the original PR discussion: the overview is not a tutorial, and no one will end up shipping anything using this document alone. It's meant to clarify capabilities of each technology, and point outwards to useful resources.

dstufft commented 6 years ago

While I agree that the overview shouldn't be a tutorial, the current sentiment in that document feels very misleading to me. The way it currently reads it feels to me like someone is going to read this, then get what appears to be conflicting advice from the actual tutorials, and end up feeling like packaging.python.org isn't internally consistent in what it is telling them, and confused about what they should be doing.

While I see in the original PR you added some text about how you should publish both sdist and wheels, that text was added under the heading that goes into details about compiled software. If I'm a user looking at this overview looking for more information on publishing pure python projects, I'm likely going to completely skip over the section on compiled software, since it has nothing to personally do with me (in this hypothetical case).

The image in this section further adds to this confusion I feel:

[Image: gradient]

For a user looking at this image, the prime takeaway is going to be that you use sdists for pure Python software, and wheels for "other" kinds of Python packages.

Part of the problem I have with this section is that you only get accurate advice and descriptions if you actually read the entire section word for word, top to bottom. However, that's not, to my knowledge, how people read documents on the web. For instance, this research finds that users don't read web pages, but rather scan through them looking for key words or headers that stand out to them.

Consider that this was my very first reading of this document, and I care more than your average Python developer about packaging stuff, and my initial reaction was that this document was telling me that sdists are for Pure python, and wheels are for compiled software. You could argue that someone not as steeped in packaging lore is going to take more time and care to read every word, and thus are less likely to miss the single sentence buried in the middle of the "compiled stuff" section that contradicts the impression the rest of the document gives... but I think that is unlikely and goes against the research in this area?

Overall, like I mentioned earlier, I think that the intent behind this section is good, but I think conflating these tiers of "random .py file", "pure python project", "complex project with lots of build tooling required" with YOLO, sdist, and wheel makes it harder for users, because the next document they read is likely going to give advice that conflicts with the takeaway they had from that document.

dstufft commented 6 years ago

I also noticed that the section about pure-python package links to https://docs.python.org/3/distutils/sourcedist.html, which of course doesn't mention wheel at all because it's part of the standard library, so wheel doesn't exist, and because it's a dedicated document for producing sdists.

The section concludes with:

If you rely on any non-Python code, or non-Python packages (such as libxml2 in the case of lxml, or BLAS libraries in the case of numpy), you will need to use the format detailed in the next section, which also has many advantages for pure-Python libraries.

Which sends a pretty strong signal to users to just go ahead and skip this next section if you don't need to do anything with C library, or Fortran or similar, which is further reinforced by the next section starting out with:

So much of Python’s practical power comes from its ability to integrate with the software ecosystem, in particular libraries written in C, C++, Fortran, Rust, and other languages.

Which again sends a pretty strong signal that if the reader isn't dealing with C, C++, Fortran, Rust, or any non Python language, they can just go ahead and skip this whole section.

I also think that the statement:

Default to publishing both sdist and wheel archives together, unless you’re creating artifacts for a very specific use case where you know the recipient only needs one or the other.

Does not particularly help here, because when you look at the full context of that statement:

Binary distributions are best when they come with source distributions to match. Even if you don’t upload wheels of your code for every operating system, by uploading the sdist, you’re enabling users of other platforms to still build it for themselves. Default to publishing both sdist and wheel archives together, unless you’re creating artifacts for a very specific use case where you know the recipient only needs one or the other.

Appears to almost entirely be talking about compiled software again.

The only mention that I can find that actually appears to be talking about pure-python software and that publishing wheels is a thing you should do with them, is:

Not all developers have the right tools or experiences to build these components written in these compiled languages, so Python created the wheel, a package format designed to ship libraries with compiled artifacts. In fact, Python’s package installer, pip, always prefers wheels because installation is always faster, so even pure-Python packages work better with wheels.

Which does explicitly mention that pure-Python packages work better with wheels. But I don't feel like this single sentence is enough, because it's buried solidly in a section that appears to be targeting people looking to produce compiled software.

So basically, I think the flow that someone who doesn't know what they're doing is going to do if they come across this document is:

  1. Scan down the page or use the links to find the "Packaging Python libraries and tools" (because they don't know what a source vs binary distribution is for Python).
  2. Start reading, find the section on source distributions.
  3. Read until they hit the section saying that the next section is for "compiled software like libxml2 or BLAS".
  4. Stop reading, go back to the link to how to create a sdist.
  5. Scan that for the commands to run, run them.
  6. Exit? Start searching next for guides on how to upload? I don't know!

mahmoud commented 6 years ago

I enjoyed that Nielsen article. Then I noticed it was written in 1997, which is almost a bigger deal than the fact that it doesn't address a technical audience. In any case, its main takeaways are that conciseness, scannability, and outbound links are critical.

I myself wish there was a better place to link to about sdists that dovetailed into a recommendation re: wheels. I would love to skinny up those sections and feature a better link.

Furthermore, I would contest your characterization of the average Python reader. My experience answering questions on the original essay and talks is that people prefer to go maximal. They are likely to default to wheel because it's the biggest, newest, and most definitive.

And just to put a point on it, the vast majority of people who end up on these documents do know something about what they're doing. They've created something that is shippable, and they've got the foresight to go looking for docs. All that puts them in at least the early-middle of their software craft.

Back to the content, the main thing the document is trying to highlight is that you can depend only on stdlib to do sdists. You don't even need 3rd party packages. That's a big feature in my experience.

That said, I'm fine if the tone shifts to something more like "while sdists are technically sufficient for pure-Python, wheels are even better in most cases". But that said, promotional language is seen as a barrier according to 1997 Nielsen ;)

pfmoore commented 6 years ago

This also means recognizing that the overview is not PyPI best-practice specific.

Why do you feel that's the case? Certainly there are many workflows involving deploying Python code, but honestly, I think we should be focusing on the one that is most common, at least in the open source community - building pure Python code for distribution on PyPI. It's likely far from being the most common workflow out there, but there's no way we can capture all of the various in-house workflows that must exist. And there are generally no examples of such workflows in the public domain that people can learn and copy from. You say:

For instance, at my last job, we had a build pipeline that included building wheels only, no sdist.

Can you point to an example project on the web with that sort of workflow that I can look at? Without examples available, it's hard to see how knowing that such things are options is helpful to a beginner, IMO.

Having just re-read (skimmed, because that's what people really do :smile:) the overview document, I agree with @dstufft. The section on Python Source Distributions very clearly gives me the impression that I shouldn't need to worry about wheels until I have something more complex than a pure Python project. The next paragraph

If you rely on any non-Python code, or non-Python packages (such as libxml2 in the case of lxml, or BLAS libraries in the case of numpy), you will need to use the format detailed in the next section, which also has many advantages for pure-Python libraries.

strongly reinforces that. I don't use non-Python code, so I can ignore "the format in the next section" (whatever it is). And then I'll get confused when people ask me "why aren't you publishing wheels?" What's a wheel? The packaging guide explicitly said I don't need to worry about them!

Anyway, I'm just going to let it stand that with that section as it now is, I wouldn't recommend it to a new user. And in fact, I'd actively suggest that new users do not read it until they know very clearly what they are trying to do. I feel that's a bad position to be in, but as I'm not offering to rewrite it, I'll just make that statement and leave it to others to consider whether it's worth trying to reword things.

pfmoore commented 6 years ago

the main thing the document is trying to highlight is that you can depend only on stdlib to do sdists

I'm not sure how that's even relevant. We very definitely should not be recommending an approach that avoids 3rd party tools these days - distutils simply doesn't support modern standards (yeah, unless you distribute only pure Python sdists, but that's cheating because your users still need setuptools to reliably install them, and you should test the install mechanisms that your users will use).

I thought we'd long ago agreed that "pip to install, setuptools to build" was the minimum toolset we'd officially support? (With the proviso that when PEP 517 becomes a reality, "setuptools" becomes "a PEP 517 compliant build tool", of which setuptools may still be our default recommendation, but maybe we'll promote flit for pure Python projects.)

mahmoud commented 6 years ago

@pfmoore In short, that's how Python ended up with a blind spot for how people actually need to package Python tools and applications. There are several options (e.g., freezing) that are far more common for proprietary software than open-source. I'm keeping an eye out for linkable open-source projects and hope you will, too!

Everything I've experienced suggests the original post got so popular because the official channels up to this point seemed to only focus on built-in and open-source use cases for Python. And while I am a huge free software fan and supporter, there are a lot of "day job" applications of Python that are not served well by PyPI best practices alone.

The overview is not avoiding 3rd-party tools, it's a work-in-progress summary profile of all the viable options still on the table for practical Python programmers. Programmers who are often constrained by only being able to use what's installed, often the case with shared systems and platforms.

You'll notice eggs are nowhere to be found in the doc. Maybe sdists will go that way in due time. At the moment, they're still more useful than the Python equivalent of an SRPM, and in the context of a document, they're a useful bridging concept between simplistic .py distribution and wheels.

Again, not against rewording, especially reducing section size and increasing link density. I get the sense there's a real gap that needs filling, potentially as a Guide about when to create an sdist vs a wheel.

dstufft commented 6 years ago

Furthermore, I would contest your characterization of the average Python reader. My experience answering questions on the original essay and talks is that people prefer to go maximal.

I think it depends on the purpose for people reading. If I'm reading, say, a Wikipedia page because I'm interested in why there are 20 different TLS implementations, or I'm really trying to figure out what differentiates those 20 different TLS implementations, then I'm far more likely to sit down and read that page top to bottom. I would guess that the original essay and talks had people in that boat: they weren't so much trying to accomplish a particular task as interested in the overall knowledge provided within. Perhaps I'm wrong on that!

The people coming to packaging.python.org are generally, in my opinion, primarily looking to achieve a particular task. Sometimes that task will need them to disambiguate between all of the various tools in the ecosystem to figure out which one really suits their need... but a lot of the time they don't want that and they just want to get told what to do. Those users who are trying to achieve a task, and aren't looking for a deep dive, are most likely to scan and look for the relevant information, rather than go maximal.

They are likely to default to wheel because it's the biggest, newest, and most definitive.

I would honestly rather people skew towards releasing wheels without sdists, than sdists without wheels-- if only because in the former case we can better provide guidance (since it's a pretty unusual case) and possibly add warnings and such in PyPI that maybe they should upload a sdist too. However, with the latter we can't do the same, because it's incredibly common for people to only release sdists, so the signal to noise ratio would just be ridiculous.

Back to the content, the main thing the document is trying to highlight is that you can depend only on stdlib to do sdists. You don't even need 3rd party packages. That's a big feature in my experience.

We shouldn't be highlighting that fact TBH, we should be strongly discouraging it. You can't build a modern Python package with the stdlib, only a broken [1] legacy package that is likely to be a footgun for you or your users. As @pfmoore said, the only reason it even sort of works reliably for people is that most of the tools in the ecosystem go to great lengths to make sure that if you do try to build a package with distutils, it will actually get installed using setuptools.

Honestly, the Python documentation should probably have a big red bar at the top of every distutils page that says you should not be using distutils, and the documentation is only there as a reference.

[1] This isn't an exaggeration. The exact same setup kwargs can have different meanings on distutils and setuptools, and the divergence is not likely to be rectified. That's without getting into the extensions that setuptools (and related projects) add to setup.py that distutils "silently" ignores (it's not really silent, but it's lost in a sea of output).

mahmoud commented 6 years ago

@dstufft I think "modern" is kind of a loaded term and definitely a moving target. In any case, I look forward to the red bar on distutils because I think that's a necessary step in getting to all-wheels for pure-Python packages. In the meantime, distutils sdists are a good sight better than rolling your own, because at least it'll get a relatively standard path forward if technology constraints change.

I'll repeat that I'm fine saying "Nowadays, sdists are still technically sufficient for distributing pure-Python software, but deficient in a lot of other ways. Use wheels whenever possible."

I don't think it's too much to ask that readers be provided links to back up those claims and recommendations.

dstufft commented 6 years ago

I think we should be focusing on the one that is most common, at least in the open source community - building pure Python code for distribution on PyPI.

For the record, I'm perfectly fine not focusing specifically on uploading to PyPI, I still think the section creates a sense of a false dichotomy in whether you should use sdist or wheel to distribute your project. It presents them as different options (which they are) but the rationale for why they're different and why you'd use one over the other is just generally wrong IMO.

In the meantime, distutils sdists are a good sight better than rolling your own, because at least it'll get a relatively standard path forward if technology constraints change.

Why is the choice "distutils or rolling your own"? That seems to be completely ignoring the fact that setuptools exists?

mahmoud commented 6 years ago

@dstufft Yeah, they're not totally distinct, they're part of a gradient, with a lot of overlap between sdists and pure-Python wheels. I did my best to express that nuance without losing the distinctions, and in case you can't tell, am actively trying in this conversation to find language that would work better for that.

Re: distutils: It may seem a small/nonexistent issue if you're primarily open-source or otherwise cutting edge, but it's because setuptools isn't built in. When all you have is a bunch of Python files, Python, and *nix, what do you do? Preferably not roll your own! At least not when distutils can cut it.

dstufft commented 6 years ago

Re: distutils: It may seem a small/nonexistent issue if you're primarily open-source or otherwise cutting edge, but it's because setuptools isn't built in. When all you have is a bunch of Python files, Python, and *nix, what do you do? Preferably not roll your own! At least not when distutils can cut it.

So if the document just mentioned you could do that, but otherwise directed people towards setuptools, that'd be one thing. As far as I can tell the only mention of setuptools is talking about entrypoints later. If I'm trying to use this document to disambiguate what tools are available to me, the right answer doesn't even appear to be in this document, just the suboptimal "well if you REALLY can't pull in a setuptools dependency" answer.

Actually let me back up a minute here.

@mahmoud What goal do you have for the reader of the "Packaging Python libraries and tools"? What is this section of the overview trying to achieve for the reader? Maybe the gap between us is because we're viewing the section as having different intents?

And a sub question, are we trying to:

ncoghlan commented 6 years ago

Note that the long term intent is for distutils to be gone by default in a future stdlib release, with an opt-in installation of setuptools required to get it back: https://github.com/pypa/packaging-problems/issues/127

mahmoud commented 6 years ago

@dstufft I'd say the overview is mostly the first option, but kind of something else.

Foremost, the Overview, especially the central, technology-listing part, is meant to be a living document. So if distutils and sdists-for-library-distribution are on their way out, as you and @ncoghlan point out, let's convey that to readers. Tactfully, of course -- don't want to make it seem like Python's getting a new gap. [1]

The primary observation is that most intermediate Python developers get overwhelmed and frustrated with the dozens of packaging options and guides out there. So, the overview introduces an organizing principle around classes of format, distinguishing each format by a combination of what it can convey and what environmental constraints/dependencies it carries.

This really helps the reader see what an experienced engineer knows: each format has its applications and other reasons for existing. Once you understand that spectrum, the packaging space is a lot less busy and frustrating. Very few readers are tabula rasa when it comes to software distribution, and the overview's first job is to orient them.

So more directly to the subquestion, the goal is to educate the user on something they might justifiably do to distribute Python software, building on examples they've likely encountered recently. People have installed/deployed VM images, .exes, wheels, and, yes, sdists. If I straight up delete the sdist section right now, I know readers will say "hey what about those sdist things I see around, what are those about?" I estimate very few would ask such a question about .egg, so you won't find them mentioned in the overview. [1, again]

Lastly, I think any education around tooling should almost exclusively come in the form of links. The overview exists to introduce the classes of formats. The tooling available within each class is varied, evolving, and much more subjective. If something is in common use, we should include a good link to it. For instance, I'd love to link to a guide about dynamic vs static linking in the context of binary wheels, a hidden but very real distinction. I just haven't found such a suitable guide yet. Maybe we'll add it to the guide in the near future? :)

[1]: I think a separate page for deprecated/obsolete technologies would be great. The Overview doesn't mention eggs, but it would be good to capstone those legacy technologies for which there are still a great many Google results and StackOverflow answers. Just a brief explanation of what they were, why they existed, what applications they're obsolete for, and what's recommended instead.

pfmoore commented 6 years ago

the goal is to educate the user on something they might justifiably do to distribute Python software, building on examples they've likely encountered recently

I have to say, when I read the page, that's not how it read to me. I found it read more like a narrative leading readers through the questions they would ask if they wanted to package up their software, and on that basis, as soon as I get to "you have a pure Python package, and sdists are what you need" (paraphrasing, but that's the message I took away from the page) I stop reading and follow the link to "learn more about sdists".

I get that the idea here is to introduce the different technologies involved in distributing a library/tool, in much the same way as the "Packaging Python applications" section summarises the options there. But it simply doesn't read that way to me. I think the graduated approach going from simpler to more complex types of library obscures the basic point, which is that this is a summary of what options are available, not a graduated "pick the first option that suits your use case" tutorial.

With that in mind, I'd see a better summary as being a series of bullet points, describing the individual technologies, but with no explicit order of complexity (there's an implied order, but I'm keeping it low key deliberately).

etc. You get the idea. The point is that it's an overview from the perspective of technology -> explanation, not task -> approach (which is what the tutorials cover).

I'd put this in the form of a PR, but I'm uncomfortable simply ripping out all of the current content of the "Packaging Python libraries and tools" section, and I don't really know where I'd put it otherwise. Plus, it needs additional polishing (not least, including the addition of some links to fuller documentation) that I don't have time to do right now.

mahmoud commented 6 years ago

@pfmoore The beginning of the overview is a bit more narrative, as it introduces the concept of frontloading your deployment design phase.

And I think your list is pretty good! However, it doesn't focus on what's necessary on the target/deployment environment.

As you can see, it's about that end-to-end robustness and complexity. sdists are getting squeezed out because pip/setuptools/wheel are getting so widespread. Tangentially, I would argue that wheels are not just for PyPI; they're very useful on their own as part of PEX build pipelines, for instance.

Static linking vs dynamic linking also impacts this gradient, but is harder to work in, because it's all behind the concept of a wheel.

Tooling like distutils vs setuptools vs flit, I gotta think we can link to a doc about that. It's just too technical for this overview. And OS-specific packaging might be worthy of an aside, especially if we can find a good link to Debian's or some other guide to packaging Python libraries as packages. Since there's mention of OS packages later, I don't think it's desperately high priority.

I do think that's helpful though, and I'm happy we're iterating on the content. There are several things I'll pull in to the section once I get a chance.

Would be great to see a "deciding between distutils, setuptools, and flit" guide or FAQ entry. That's exactly the kind of thing I think the section should link to.

pfmoore commented 6 years ago

And I think your list is pretty good! However, it doesn't focus on what's necessary on the target/deployment environment.

I'd dispute or at least extend some of your points. I certainly intended to focus more on the target than on the developer's side.

  • bare .py can be installed with cp/scp, but needs a lot of environment predictability, and a little bit of blind faith

And has no uninstall or management capabilities, and a relatively high level of Python knowledge on the part of the end user. Remember, this is in a section about libraries, so you have to know where site-packages is, and copy your file there. Deploying an application as a pure .py file is a different matter.

  • sdists can be built and installed with just tar/gzip/Python, provided your code is just Python

Not true. Many sdists require setuptools (the sdist itself needs it, because setup.py does from setuptools import...). More generally, a sdist can import anything in setup.py, and the end user has no way of knowing what will be needed short of reading the source of the package (or relying on pip, which does that for you). Getting the right environment for a build (and installing from sdist needs a build!) can be tricky, even for pure Python packages. (Which is not to say that it can't be easy, just that often it isn't).
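
As a rough sketch of what "reading the source of the package" means in practice, the snippet below statically lists the modules a `setup.py` imports, without running it. The `SETUP_PY` contents and the `imported_modules` helper are made up for illustration, and a static scan like this will miss dynamic imports, so it is an approximation at best.

```python
import ast

# Hypothetical setup.py contents; a real sdist could import anything here.
SETUP_PY = '''
from setuptools import setup
import numpy

setup(name="demo", version="1.0")
'''


def imported_modules(source):
    """Return the root package names imported anywhere in a script, without running it."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return sorted(names)


build_imports = imported_modules(SETUP_PY)
print(build_imports)
```

Everything in that list has to be present in the build environment before the sdist can be installed, which is exactly the burden a wheel avoids.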

  • wheels need setuptools/wheel, but are generally worth it

No they don't, not to install. To install a wheel you only need to unzip it and maybe copy some files (see the wheel spec). Or just use pip ;-)
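
A minimal sketch of the "just unzip it" point, assuming a made-up single-module project called `hello_demo`. The archive built here is deliberately incomplete (a real wheel also carries `WHEEL` and `RECORD` files, and a real installer handles scripts, data files and uninstall records), so treat it as an illustration rather than a replacement for pip.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Build a tiny wheel-like archive for a hypothetical pure-Python module.
    wheel_path = tmp / "hello_demo-1.0-py3-none-any.whl"
    with zipfile.ZipFile(wheel_path, "w") as zf:
        zf.writestr("hello_demo.py", "def greet():\n    return 'hello from a wheel'\n")
        zf.writestr(
            "hello_demo-1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: hello_demo\nVersion: 1.0\n",
        )

    # "Install" it: unpack the archive into a directory that is on sys.path.
    target = tmp / "site-packages"
    with zipfile.ZipFile(wheel_path) as zf:
        zf.extractall(target)

    sys.path.insert(0, str(target))
    try:
        importlib.invalidate_caches()
        import hello_demo  # no build step and no setup.py were involved

        message = hello_demo.greet()
    finally:
        sys.path.remove(str(target))

print(message)
```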

I really don't understand why you're so keen on promoting sdists. In my opinion, there are simply no advantages to distributing as a sdist rather than as a wheel (unless you're so lazy that a 1-line command to build a wheel as well as a sdist is too much for you :wink:) Distributing sdists as well as wheels is important, sure, but omitting wheels is a false economy.

As you can see, it's about that end-to-end robustness and complexity. sdists are getting squeezed out because pip/setuptools/wheel are getting so widespread.

No, sdists are getting "squeezed out" because they require a build environment on the target machine (that build environment may be pretty minimal, but it has to be there). And it's not even right to say they are being "squeezed out" - they simply serve a different purpose (cases where installing from source is important for its own sake) which is valid, but specialised.

Tangentially, I would argue that wheels are not just for PyPI; they're very useful on their own as part of PEX build pipelines, for instance.

Certainly, but that's not particularly relevant here, where we're talking about libraries.

Tooling like distutils vs setuptools vs flit, I gotta think we can link to a doc about that.

It's actually very simple, at the level we care about in this document. All are build tools, needed as part of the build process. (Yes, there's setup.py install, but I personally don't think we should even discuss that in this document. There are use cases for it, but they are all specialised or advanced, and more suited to a document in the Guides section than the overview - except maybe as a historical note, but once you open up the can of worms that's packaging history, you'll never be able to stop!)

And OS-specific packaging might be worthy of an aside

The reason I mentioned OS-specific packaging was because I imagine a key question for many people in the Linux world (disclaimer: I'm a Windows dev, so this is alien ground for me) is whether you want to deploy in a way that lets people use your package with the system installed Python. Generally, Python-specific distribution tools are risky in that context - if working nicely with the system Python is an important goal for you, OS packages are really the best option (--user installs of wheels are OK, but we get bug reports on pip from time to time with those, so they aren't trouble-free, and you need to know what you're doing).

Would be great to see a "deciding between distutils, setuptools, and flit" guide or FAQ entry.

Part of that is easy. Never use distutils.

Flit vs setuptools is complex, agreed, mainly because they have very different approaches/philosophies. A guide would be good.

mahmoud commented 6 years ago

Some brief points:

This is in a section about libraries, so you have to know where site-packages is, and copy your file there.

Many popular libraries can be, and often are, simply included by directly copying them into a project. That's the original library distribution story, after all.

OS-specific [library] packaging was because I imagine a key question for many people in the Linux world

As someone who works almost exclusively in non-Windows environments, I can attest that this is no longer the case, with the exception of OS-specific package developers, in which case your ecosystem and community have their own set of practices, outside the scope of the overview.

... it's not even right to say [sdists] are being "squeezed out" - they simply serve a different purpose... ... there's setup.py install, but... ... a historical note...

I just want to echo my previous suggestion for a linkable article briefly documenting these deprecated practices. At this point, Python's packaging past is many times the size of its present, and that present is getting to be a needle in a haystack of old documents, Stack Overflow answers, and Google results. A 2-day-old overview is responsible for 0 new sdists. A page briefly summarizing the old approaches, why they existed, what applications they're obsolete for, and what's recommended instead would be a huge boon to dispelling those old ways.

And now the bigger point:

I really don't understand why you're so keen on promoting sdists.

Not sure what to say to this. I don't have any products to peddle, nor packaging projects to plug. I am trying to keep the conversation constructive and focused on specific content changes that respect the constraints of an overview format: provide an approachable summary of practical techniques in common use.

The core conceit of the article is that "Packaging Python" is much bigger than "Python's packaging". Going into great depth about packaging solutions produced by official Python sources, which really only cover a very small set of packaging use cases, would not be representative of the amount of Python software being shipped.

I'll see if I can rewrite again, charting a new path through from "copying a .py" to binary wheel. Even if it's not representative of what happens in the field, if it's more practical and better documented than the current state, sdists may well be reduced to an aside saying, "hey btw publishing an sdist is a best practice for people who want/need to build from source."

If you want to help, please provide me with useful links to materials that will help keep the new sections short.

mahmoud commented 6 years ago

Also as a side note, just to get some direct confirmation: Because distutils will go away at some point, does that mean that Python will simply no longer include a facility to both build and install a distributable library artifact? Or is there an ensurepip dance that means it contains about half of one?

pfmoore commented 6 years ago

Many popular libraries can be, and often are, simply included by directly copying them into a project.

Can you provide examples? I'm not aware of any such.

I can attest that this is no longer the case,

Cool. Sorry for the noise in that case. I assume therefore that packages are not being installed into the system Python (as manually doing so is known to cause issues with the system Python). Again, can you provide links to examples? My main concern here is that we regularly advise people not to install packages into their system Python using pip, and I don't want what we're advising to be in conflict with this document.

I just want to echo my previous suggestion for a linkable article briefly documenting these deprecated practices

I'm not aware of any. There's PEP 517, which describes the new interface between installers and build tools (not yet implemented, we're working on it right now in pip). That interface includes no way of installing a project direct from a source tree or sdist, and by implication therefore makes it clear that builds need to be done by the installer building a wheel (unless one is already available) and installing it. That's not what you're after, I know, but it's the nearest I know to a linkable document explaining the direction we're moving in.
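
As a rough sketch of the shape PEP 517 describes: a backend exposes `build_wheel` and `build_sdist` hooks, both of which return the basename of the file they create, and there is no install hook. The `ToyBackend` class and the `install_from_source` helper below are hypothetical, the files written are placeholders rather than valid archives, and real backends usually expose the hooks as module-level functions.

```python
import tempfile
from pathlib import Path


class ToyBackend:
    """Hypothetical backend exposing the two mandatory PEP 517 hooks."""

    def build_wheel(self, wheel_directory, config_settings=None,
                    metadata_directory=None):
        name = "demo_pkg-1.0-py3-none-any.whl"
        # Placeholder bytes; a real backend writes a proper wheel archive here.
        (Path(wheel_directory) / name).write_bytes(b"placeholder")
        return name  # hooks return the basename, not the full path

    def build_sdist(self, sdist_directory, config_settings=None):
        name = "demo_pkg-1.0.tar.gz"
        (Path(sdist_directory) / name).write_bytes(b"placeholder")
        return name


def install_from_source(backend, work_dir):
    """Hypothetical frontend step: with no install hook, build a wheel first."""
    if hasattr(backend, "install"):
        raise RuntimeError("PEP 517 defines no install hook")
    wheel_name = backend.build_wheel(str(work_dir))
    # A real installer would now unpack this wheel into the environment.
    return Path(work_dir) / wheel_name


with tempfile.TemporaryDirectory() as tmp:
    wheel_file = install_from_source(ToyBackend(), tmp)
    wheel_built = wheel_file.exists()
    wheel_name = wheel_file.name
```

The only route from a source tree to an installed package in this sketch goes through a wheel, which matches the direction described above.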

As you say, the history is huge, and web searches are pretty much guaranteed to find out of date stuff. That's why I'm keen on this document being a clear statement of the reality of current practice and recommendations, so that it can be the linkable reference for other articles. While I appreciate the benefits of linking to good detail discussions, I think that as the PyPA official documentation, the packaging guide is entitled to consider itself the authority in cases where links aren't available, and make assertions about best practice on that basis.

really don't understand why you're so keen on promoting sdists.

Not sure what to say to this.

I'm sorry, my statement was stronger (and more confrontational :disappointed:) than I intended it to be.

What I was trying to get at was that you've on a number of occasions stated that "Packaging Python is much bigger than Python's packaging", and that the PyPA solutions "really only cover a very small set of packaging use cases". You've seemed to imply (to me, at least) that sdists rather than wheels fit some of those use cases. If that's a misunderstanding on my part, I'm sorry.

One of the problems we (certainly I) have is that we do have a very limited view of what practices exist out there. But that's not because we're trying to avoid other use cases, it's because we don't know what they are! (OK, we might decide not to cover them once we do know what they are, but let's take it a step at a time).

I personally know of no use case[1] where there is a justification for distributing a sdist and not distributing a wheel. In the area of libraries (as opposed to applications), I know of no use case for distributing a library in any form other than sdist/wheel. Note that I'm not talking about development workflows here, but distribution. Development's a whole other area, and one that I don't know much about (except for my limited personal experience) and don't intend to get into here. The one thing I will say is that if we do want to cover it, it should be a separate discussion from the deployment side of things (and may be better served by a document in the Guides section).

If you do have examples of write-ups of such use cases, which explain to the reader what's wrong/inappropriate with the sdist/wheel formats, I'd be very interested. But in the absence of details, I have to focus on what I know of as use cases[2]. And I acknowledge that's an issue - but no matter how many times this has come up in the past (quite a lot on the pip tracker, from my recollection), we've never really got clear descriptions of the sort of "other workflows" involved. My assumption has been that they are organisation-internal, closed-source, or otherwise not publicly available, so not accessible to the open source community.

I'll see if I can rewrite again, charting a new path through from "copying a .py" to binary wheel.

I'm still not sure why you see that path as the right way of organising "An Overview of Packaging for Python". An overview should be (in my opinion) a structured list of what's available/involved, not a process - the problem with organising it as a process is precisely the issue that @dstufft and I initially noted, that people tend to stop at the point where they hit something that "looks like what I'm doing" - and that's frequently at the point where you're talking about sdists.

Maybe if you want to stick to the "charting a path" approach, you could instead take it from the target's point of view, and chart the path from installing a bundled-up binary distribution (a wheel) through the stages of needing more control over the build (sdist) to complete control over the layout of the installed package (copying individual files by hand)? Or maybe the fact that there are two opposite paths means that it would be better just to list the approaches without structuring them as a journey from "easiest" to "hardest" (or some other ranking, I'm not sure easy/hard is the distinction you're trying to make here).

If you want to help, please provide me with useful links to materials that will help keep the new sections short.

As noted, I don't really have links. Apologies for that, and I understand that it means that the value judgements in this discussion end up being very subjective. But I would like to turn the tables a little, too, and ask you to provide some links to the various other practices that you're hoping to cover here. A list of references to the various ways that projects/organisations try to use and package Python in the "real world" would be an immensely valuable resource, even if ultimately it doesn't fit in this document.

Because distutils will go away at some point

I never said that distutils would go away. Backward compatibility may mean that ends up being an impossible goal (although I believe @ncoghlan may have ideas on how it could be possible). But just because it has to remain in the stdlib for backward compatibility reasons doesn't mean that packaging documentation written now has to discuss it - just like the egg format, it's still around, some people still use it, but we don't support it any longer and we'd like them to stop please :wink:

One final note: The discussion here has moved quite a long way from the original comment. I hope it's still useful, but if it's too far off topic, I'm OK with dropping it. Really the only point I do want to focus on is ensuring that the document cannot be read as suggesting that "for a pure Python package, you can publish just a sdist, you don't need to offer a wheel" is PyPA recommended advice. It's not - we recommend publishing both, unless you know what you're doing and why, and have made a conscious decision not to follow that recommendation. And anecdotally, the current wording does give the wrong impression - at least to @dstufft and me. (A valid criticism here is that we're both too close to the subject to appreciate how the target audience would read it - maybe we can get someone not so close to the packaging community to read it and comment? I have a colleague I'll ask, but all of his Python coding is for his own use, so he may reply "none of this makes any sense to me" - I'll report back though).


[1] Other than non-technical policies like "My organisation/project will not allow any form of 'built artifact' to be installed on our systems, everything has to be built from source code".

[2] I am aware of the internal processes in my own workplace - which is not a heavy Python user - and there, I can confirm that wheels/sdists are a reasonable fit.

theacodes commented 5 years ago

Hey folks, this issue has been open for a long time and has a lot of discussion. Have we determined anything actionable for this project?

mahmoud commented 5 years ago

Yes, I have on my task list to make some revisions, including some updated graphics. :) I got a bit caught up on making a survey of open-source Python applications https://github.com/mahmoud/awesome-python-applications (in large part to collect packaging practices), so the inevitable return is just taking a bit longer. :)

theacodes commented 5 years ago

No worries, thanks for the update!

dimaqq commented 5 years ago

My 2c as a user of pypi (99.9% read, <.1% write):

The overview has too much text:

For example, the following:

As a general-purpose programming language, Python is designed to be used in many ways. You can build web sites or industrial robots or a game for your friends to play, and much more, all using the same core technology.

Python’s flexibility is why the first step in every Python project must be to think about the project’s audience and the corresponding environment where the project will run. It might seem strange to think about packaging before writing code, but this process does wonders for avoiding future headaches.

This overview provides a general-purpose decision tree for reasoning about Python’s plethora of packaging options. Read on to choose the best technology for your next project.

Should, IMO, be replaced with:

Overview

(yes, a single word.)

Thus, on the subject of sdist, I wish that packaging.python.org advised/advertised wheels first. Other distributables could be in the "when wheels are not enough" section, right after the comparison of any-arch vs. arch-specific wheels, which appears to be missing.
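
For readers unfamiliar with the any-arch vs. arch-specific distinction, it is visible in the wheel filename itself, whose last three dash-separated fields are the Python, ABI, and platform tags. The sketch below is a simplified parser with hypothetical helper names and made-up filenames; a real tool would use a maintained parser rather than string splitting.

```python
def parse_wheel_tags(filename):
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when the optional build tag is present
        raise ValueError("unexpected wheel filename: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def is_any_arch(filename):
    """True when the wheel carries no ABI or platform restriction."""
    tags = parse_wheel_tags(filename)
    return tags["abi"] == "none" and tags["platform"] == "any"


# Illustrative filenames for a made-up project.
pure = is_any_arch("demo_pkg-1.0-py3-none-any.whl")
compiled = is_any_arch("demo_pkg-1.0-cp311-cp311-win_amd64.whl")
```

A pure Python project typically needs only the first kind of file, while a project with compiled code needs one wheel per supported interpreter and platform combination.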

mahmoud commented 5 years ago

@dimaqq Thanks for the feedback! I'll look at tightening up the language on the next round of edits.

Based on your feedback, I'll have to address the potential miscommunication that this document is meant as an introduction to PyPI, which it is not. Most Python software is neither on PyPI nor appropriate for distribution through it.

This overview is meant to provide an integrated perspective for developers thinking beyond a developer audience. I'll consider adding a .. note:: to this effect. Thanks again!