pypa / setuptools

Official project repository for the Setuptools build system
https://pypi.org/project/setuptools/
MIT License

PyPI page (and simple index) for setuptools is cluttered with useless data #265

Closed ghost closed 8 years ago

ghost commented 9 years ago

Originally reported by: pmoore (Bitbucket: pmoore, GitHub: pmoore)


Please can setuptools remove some of the ancient history from the long_description (and PyPI page)? Possibly even replace the huge changelog with a link to the changelog file, rather than having it all inline? That would make the PyPI page more readable and usable, and as a fortuitous side-effect, remove all the useless links from the pypi simple page which currently impose a serious performance impact on "pip install setuptools".


ghost commented 9 years ago

Original comment by JoostMolenaar (Bitbucket: JoostMolenaar, GitHub: JoostMolenaar):


I'd like to report that 'pip list --outdated' now is really fast on a Raspberry Pi, thanks a lot!

ghost commented 9 years ago

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


I've also now updated the Setuptools metadata so it no longer publishes the changelog, but only links to the changelog as published in the documentation. This required adding a Sphinx plugin to do the links, but that turned out to be pretty straightforward (5d46fb01e3a8).
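
For illustration only, here is a minimal sketch of the "link instead of inline" idea. It is not the actual change in 5d46fb01e3a8. The function name `build_long_description` and the URL are hypothetical placeholders, and the sketch assumes the long description is reStructuredText.

```python
# Hypothetical placeholder; the real documentation URL is not given in this thread.
CHANGELOG_URL = "https://example.invalid/setuptools/history.html"


def build_long_description(readme_text, changelog_url=CHANGELOG_URL):
    """Return the README followed by a short reST section that links to
    the published changelog instead of embedding the changelog itself."""
    footer = (
        "Changes\n"
        "-------\n"
        "\n"
        f"See the `change log <{changelog_url}>`_ for the release history.\n"
    )
    # Normalise trailing newlines so exactly one blank line separates
    # the README body from the added section.
    return readme_text.rstrip("\n") + "\n\n" + footer


description = build_long_description("Setuptools\n==========\n\nInstallation notes.\n")
print(description)
```

The resulting text stays the same size no matter how long the changelog grows, which is the point of the change described above.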

ghost commented 9 years ago

Original comment by pmoore (Bitbucket: pmoore, GitHub: pmoore):


I've just checked the simple index page, and it only contains actual file links now, which is perfect. Thanks for doing this, and thanks @wichert for pointing out this fix!

ghost commented 9 years ago

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


I've updated the setting in PyPI and removed the Additional URLs from previous scrapes of the long description. Please test and report any issues.

ghost commented 9 years ago

Original comment by pmoore (Bitbucket: pmoore, GitHub: pmoore):


OK, I didn't get that. If that works, I'm +1 on it. Can this be done? AIUI, that would need a project owner to make the change.

I personally don't like the over-long PyPI page, but not enough to want to put a lot of effort into changing it, particularly as the extensive information is clearly a deliberate project choice.

@jaraco I'm assigning the issue back to you, as I can't make the hosting mode change. I hope that's OK.

ghost commented 9 years ago

Original comment by wichert (Bitbucket: wichert, GitHub: wichert):


Just to be clear (though I think Jason already got this): my suggestion is to change a single flag (the hosting mode) for the setuptools distribution through the PyPI web interface. If you make that change, no changes to the release procedure are necessary, and there is no need to remove any links or changelog entries from the long description.

ghost commented 9 years ago

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


On the other hand, I can't personally make changes that affect the release process itself, as that is something Jason R. Coombs manages. So suggestions like reporting the high points would have to be something he agrees to and implements.

The release process is fully mechanized through code in the repo and documented. As such, a Pull Request can actually alter the release process. Furthermore, I'm looking to have backup release managers, as I need to limit the work I spend on this project.

Wichert Akkerman, I assume your suggestion is to change the setuptools hosting mode to pypi-explicit (as per PEP 438)? That again would be a project management change, so it would have to be something Jason R. Coombs does.

Actually, there are other project owners on PyPI and I'm willing to add other members of the PyPA as well. Whatever is best for the project and the community is acceptable.

It seems to me that there are two main options that I could create PRs for: (1) Omit the change log from long_description.

There's more than simply publishing the changelog in the long_description. The references in the changelog are hyperlinked so that a user can readily see what things have changed and quickly navigate to the detail for that activity. As a result, Setuptools has one of the most visible release processes in the index. A change that removes this functionality would be a regression and an undoing of substantial work that went into making this information available.

(2) Move change log entries older than the last major release to CHANGES_OLD.txt.

I'm fine with this idea, but not as stated. Setuptools follows semver, which means that a major release is not indicative of the magnitude of a release and thus does not serve as a good indicator of how much has changed or how old the changes are. Indeed, there's nothing about semver that allows the version number to reflect those types of attributes, so a semver-managed project cannot rely on the version numbers to delineate a history horizon.

I won't be accepting a personal burden to manually truncate the log. I'll accept pull requests to mechanically truncate the log during releases (or incidentally). I'll accept pull requests to update the release documentation on when or how the log should be truncated/archived. I'll also accept pull requests to manually truncate/archive the log according to accepted guidance.

I'm not picky on what the process should be, but it should be advertised in the project source or documentation so that it's obvious for any maintainer how it should be handled.

ghost commented 9 years ago

Original comment by cjerdonek (Bitbucket: cjerdonek, GitHub: cjerdonek):


Personally, I find having the changelog in the PyPI page makes it very annoying to read, but maybe that's just me.

+1. My preference would be for the changes only in the most recent release or since the last major release to be listed (and link to a list of changes older than that). If that's too hard, then including a link to the changelog is also fine. Maybe the changelog can be hosted here if it's problematic to link to the source file? https://pythonhosted.org/setuptools/index.html

ghost commented 9 years ago

Original comment by pmoore (Bitbucket: pmoore, GitHub: pmoore):


The simple index issue is in the process of being addressed by PyPI changes (PEP 470). So that is a short-term problem. Personally, I find having the changelog in the PyPI page makes it very annoying to read, but maybe that's just me. If others are happy with leaving the PyPI page as it is, I'm not going to insist.

On the other hand, I can't personally make changes that affect the release process itself, as that is something @jaraco manages. So suggestions like reporting the high points would have to be something he agrees to and implements.

@wichert, I assume your suggestion is to change the setuptools hosting mode to pypi-explicit (as per PEP 438)? That again would be a project management change, so it would have to be something @jaraco does.

It seems to me that there are two main options that I could create PRs for:

  1. Omit the change log entirely from the PyPI page.
  2. Move changelog entries older than the last major release (6.0) to CHANGES_OLD.txt. @jaraco would need to do this in future whenever a new major release occurred (or at whatever frequency suited him).

Note that as well as affecting the PyPI page, the length of the long_description also affects the size of the PyPI metadata, which in turn affects the performance of the XMLRPC and JSON APIs.

ghost commented 9 years ago

Original comment by fdrake (Bitbucket: fdrake, GitHub: fdrake):


Excellent point, Wichert! Also avoids having to touch anything else.

+1

ghost commented 9 years ago

Original comment by wichert (Bitbucket: wichert, GitHub: wichert):


I suspect the issue at hand here is not really the content of the setuptools PyPI page itself (although including the entire release history is a bit much), but rather that all the links on that page are added to the simple index page, which makes that a very large (1.3 MB) download.

Since setuptools is hosted on PyPI itself, isn't the simplest solution here to change the URL hosting mode for setuptools to "Do not extract URLs from the long description field - only use URLs explicitly specified below and files uploaded to PyPI (this is preferred)"?
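
To make the size problem concrete, here is a rough sketch that separates file links from other links on a simple index page. The sample HTML and the `example.invalid` URLs are made up. Treating a link as a "file link" when its path ends in an archive suffix is an assumption for illustration, not how PyPI or pip classifies links.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: a link is a "file link" if its path ends in one of these suffixes.
ARCHIVE_SUFFIXES = (".tar.gz", ".zip", ".whl", ".egg")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_links(page_html):
    """Return (file_links, other_links) found in the given HTML."""
    collector = LinkCollector()
    collector.feed(page_html)
    files, others = [], []
    for href in collector.hrefs:
        # urlparse separates the "#md5=..." fragment from the path.
        path = urlparse(href).path
        (files if path.endswith(ARCHIVE_SUFFIXES) else others).append(href)
    return files, others


# Made-up sample page: one uploaded file plus two links scraped from a changelog.
SAMPLE_PAGE = """
<html><body>
<a href="../../packages/source/s/setuptools/setuptools-6.0.tar.gz#md5=abc">setuptools-6.0.tar.gz</a>
<a href="https://example.invalid/issue/1">Issue 1</a>
<a href="https://example.invalid/issue/2">Issue 2</a>
</body></html>
"""

files, others = split_links(SAMPLE_PAGE)
print(len(files), "file links;", len(others), "other links")
```

With the hosting mode changed as suggested, only links of the first kind would remain on the page.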

ghost commented 9 years ago

Original comment by fdrake (Bitbucket: fdrake, GitHub: fdrake):


Many projects include the changelog in the long_description, too; there are many common patterns, so it's not clear there's one fit for all.

For long-lived projects, keeping it all in long_description is clearly problematic. Moving a complete changelog to a file in the release and providing the high points for recent releases in the long_description may be a good tradeoff. I know I appreciate being able to find information on the most important changes to a package quickly (by browsing PyPI); I would be surprised if I were the only one.

ghost commented 9 years ago

Original comment by Piotr_Dobrogost (Bitbucket: Piotr_Dobrogost, GitHub: Unknown):


Link to the original post on distutils-sig mailing list where this was raised – https://mail.python.org/pipermail/distutils-sig/2014-October/025021.html

ghost commented 9 years ago

Original comment by pmoore (Bitbucket: pmoore, GitHub: pmoore):


Actually, why not simply stop adding the changelog to long_description at all? If people want to find it, it's in the sources, and that's what most other projects do.

That's an easy PR (unless you want the resulting unneeded code like _linkify removing as well, which I'm willing to do but less sure I'll get right :-)) so if you're happy with that, I'll submit it.

ghost commented 9 years ago

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


I've been thinking about doing something like this for some time. The question is - how to do it? Where to draw the line? How to implement a link to a changelog which is current for that release?

Paul, if you take a look at the before_upload hook in the release.py module, you'll see where _linkify is called to add links to the changelog. It would be possible to customize that behavior to trim the changelog to a reasonable length.

Alternatively, you suggest having a link to the changelog. That could be generated in setup.py. The problem becomes twofold - retaining hyperlinks to relevant issues and only displaying changes for the indicated version. The latter is less important than the former. Perhaps you might consider writing a Heroku app that renders the changelog for a particular version. Or create a publisher routine that renders the changelog and uploads it to a known location. Or maybe there's a way to have the changelog published as part of the documentation.

Of course, we don't have to implement this in a way that's sustainable. We can instead just truncate the changelog, such that older entries are suppressed unless you do some archeology in the repo, and then revisit this issue periodically.
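
The truncation option could be sketched roughly as follows. This is not the code in release.py. The function `truncate_changelog`, its parameters, and the heading format (a bare version number followed by a dashed underline) are all assumptions made for illustration, and the real CHANGES.txt layout may differ.

```python
import re

# Assumption: a release heading is a bare version number on its own line,
# immediately followed by an underline of at least three dashes.
VERSION_LINE = re.compile(r"^\d+(\.\d+)*$")
UNDERLINE = re.compile(r"^-{3,}$")


def truncate_changelog(text, keep=3,
                       pointer="Older entries: see the full changelog in the repository."):
    """Keep only the newest `keep` release sections and append a pointer line."""
    lines = text.splitlines()
    # Index of the heading line for every release section, newest first.
    starts = [
        i for i in range(len(lines) - 1)
        if VERSION_LINE.match(lines[i].strip()) and UNDERLINE.match(lines[i + 1].strip())
    ]
    if len(starts) <= keep:
        return text  # nothing to trim
    kept = lines[:starts[keep]]
    # Drop trailing blank lines left over from the cut.
    while kept and not kept[-1].strip():
        kept.pop()
    return "\n".join(kept + ["", pointer, ""])


SAMPLE = """\
6.0
---

* Change A

5.8
---

* Change B

5.7
---

* Change C
"""

trimmed = truncate_changelog(SAMPLE, keep=2)
print(trimmed)
```

Counting sections rather than cutting at a major version sidesteps the semver objection raised elsewhere in this thread, though a date-based horizon would be an equally plausible choice.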

I'm open to suggestions and glad to help review changes.