hugovk / pypi-tools

Command-line Python scripts to do things with PyPI
https://hugovk.github.io/pypi-tools

Investigation into "canonical" link for a PyPI repo link #11

Open hugovk opened 4 years ago

hugovk commented 4 years ago

Summary: use Source


In addition to url (alias homepage), packages on PyPI can have this metadata:

project_urls

An arbitrary map of URL names to hyperlinks, allowing more extensible documentation of where various resources can be found than the simple url and download_url options provide.

The url homepage is added to project_urls as Homepage. For example, Pillow doesn't define any project_urls but does have url="http://python-pillow.org", and https://pypi.org/pypi/Pillow/json includes:

"home_page": "http://python-pillow.org",
"project_urls": {
    "Homepage": "http://python-pillow.org"
},
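
As a rough sketch of how this metadata can be read, the snippet below pulls project_urls out of a PyPI JSON API response. The function names fetch_metadata and get_project_urls are hypothetical, and this is not necessarily how the scripts in this repo do it. In the JSON API both fields sit under the top-level "info" object, and project_urls can be null, so the helper falls back to an empty dict.

```python
import json
from urllib.request import urlopen


def fetch_metadata(package: str) -> dict:
    """Download a package's metadata from the PyPI JSON API (needs network)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
        return json.load(response)


def get_project_urls(metadata: dict) -> dict:
    """Return the project_urls map, or an empty dict if the project has none."""
    return metadata.get("info", {}).get("project_urls") or {}


# Offline sample mirroring the Pillow excerpt above, so no network is needed.
sample = {
    "info": {
        "home_page": "http://python-pillow.org",
        "project_urls": {"Homepage": "http://python-pillow.org"},
    }
}
print(get_project_urls(sample))
```

With network access, get_project_urls(fetch_metadata("Pillow")) would do the same against live data.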

Many projects have a link to their GitHub (or GitLab or Bitbucket etc.) repos as the homepage. For those that include an arbitrary link to a source repo, what is the most common one, when not the homepage?

Checking the current top 5,000 packages, here is the project_urls key under which a source repo was found (defined as a URL containing one of github.com, gitlab.com, bitbucket.org or bitbucket.com):

Counter({'Homepage': 3711,
         None: 1047,
         'Source': 95,
         'Download': 63,
         'Source Code': 38,
         'Code': 14,
         'Issue Tracker': 5,
         'Repository': 5,
         'GitHub: issues': 4,
         'Github': 3,
         'Bug Tracker': 3,
         'Bug Reports': 2,
         'Issue tracker': 2,
         'Source code': 2,
         'Twine source': 1,
         'Issues': 1,
         'Github repo': 1,
         'Change log': 1,
         'Changelog': 1,
         'GitHub': 1})

Some of these are specific things, like links to tarball downloads, or issue trackers. But the most common ones for a repo homepage are Source, Source Code and Code.
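
The check described above can be sketched roughly as follows. This is a minimal illustration and not the actual script: find_repo_key and REPO_HOSTS are hypothetical names, and it simply returns the first key (in dict order) whose URL contains one of the four hosts, which may differ from the order the real script checks keys in.

```python
from collections import Counter

# Hosts that count as a source repo, per the definition above.
REPO_HOSTS = ("github.com", "gitlab.com", "bitbucket.org", "bitbucket.com")


def find_repo_key(project_urls):
    """Return the first key whose URL mentions a known host, else None."""
    for key, url in (project_urls or {}).items():
        if url and any(host in url for host in REPO_HOSTS):
            return key
    return None


# A few project_urls maps for illustration.
samples = [
    {"Homepage": "https://github.com/benjaminp/six"},
    {"Homepage": "https://urllib3.readthedocs.io/"},
    {
        "Documentation": "https://setuptools.readthedocs.io/",
        "Homepage": "https://github.com/pypa/setuptools",
    },
]
counts = Counter(find_repo_key(urls) for urls in samples)
print(counts)
```

A plain substring test also matches issue-tracker and release-download URLs on those hosts, which is consistent with keys such as Download and Issue Tracker showing up in the counts.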

hugovk commented 4 years ago

SCM links in project_urls, preview:

Load data/top-repos.json...
Load data/top-pypi-packages.json...
Already done: 0
Find new repos...
Homepage    https://github.com/benjaminp/six
Homepage    https://github.com/boto/botocore
Homepage    https://github.com/boto/s3transfer
Homepage    https://github.com/kjd/idna
Homepage    https://github.com/chardet/chardet
Homepage    https://github.com/etingof/pyasn1
Homepage    https://github.com/yaml/pyyaml
Homepage    https://github.com/jmespath/jmespath.py
Homepage    https://github.com/pypa/setuptools
Homepage    https://github.com/agronholm/pythonfutures
Homepage    https://github.com/tartley/colorama
Homepage    https://github.com/boto/boto3
Homepage    https://github.com/simplejson/simplejson
Source Code https://github.com/numpy/numpy
Homepage    https://github.com/pypa/wheel
Download    https://github.com/protocolbuffers/protobuf/releases
...
Homepage    https://github.com/broadinstitute/keras-resnet
Homepage    https://github.com/CyberZHG/keras-position-wise-feed-forward
Homepage    https://github.com/makinacorpus/django-admin-watchdog
Old repos: 0
New repos: 3953
Not found: 1047
Counter({'Homepage': 3711,
         None: 1047,
         'Source': 95,
         'Download': 63,
         'Source Code': 38,
         'Code': 14,
         'Issue Tracker': 5,
         'Repository': 5,
         'GitHub: issues': 4,
         'Github': 3,
         'Bug Tracker': 3,
         'Bug Reports': 2,
         'Issue tracker': 2,
         'Source code': 2,
         'Twine source': 1,
         'Issues': 1,
         'Github repo': 1,
         'Change log': 1,
         'Changelog': 1,
         'GitHub': 1})

Full list:

hugovk commented 4 years ago

The project_urls for each of the top 5,000, preview:

{'Homepage': 'https://urllib3.readthedocs.io/'}
{'Homepage': 'https://github.com/benjaminp/six'}
{'Homepage': 'https://github.com/boto/botocore'}
{'Homepage': 'http://python-requests.org'}
{'Homepage': 'https://dateutil.readthedocs.io'}
{'Homepage': 'https://pip.pypa.io/'}
{'Homepage': 'https://github.com/boto/s3transfer'}
{'Homepage': 'https://certifi.io/'}
{'Homepage': 'https://github.com/kjd/idna'}
{'Homepage': 'http://docutils.sourceforge.net/'}
{'Homepage': 'https://github.com/chardet/chardet'}
{'Homepage': 'https://github.com/etingof/pyasn1'}
{'Download': 'https://pypi.org/project/PyYAML/', 'Homepage': 'https://github.com/yaml/pyyaml'}
{'Homepage': 'https://stuvel.eu/rsa'}
{'Homepage': 'https://github.com/jmespath/jmespath.py'}
{'Documentation': 'https://setuptools.readthedocs.io/', 'Homepage': 'https://github.com/pypa/setuptools'}
{'Download': 'https://pypi.org/project/pytz/', 'Homepage': 'http://pythonhosted.org/pytz'}
{'Homepage': 'https://github.com/agronholm/pythonfutures'}
{'Homepage': 'https://github.com/tartley/colorama'}
{'Homepage': 'http://aws.amazon.com/cli/'}
...

Full list:

hugovk commented 4 years ago

Multipart zip of /Users/hugo/Library/Caches/source-finder/ containing the top 5,000 (plus 5) JSON metadata files, created with zip source-finder.zip --out cachefiles.zip -s 10m

Rename the .z0X.zip files to .z0X before uncompressing.
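
If there are many parts, the rename could be scripted. The sketch below is only an illustration: restore_part_names is a hypothetical helper, and it assumes the parts were uploaded with names ending in a two-digit part suffix plus an extra .zip (for example cachefiles.z01.zip).

```python
import re
from pathlib import Path

# Matches names such as "cachefiles.z01.zip"; the ".zNN.zip" pattern is assumed.
PART_PATTERN = re.compile(r"^(?P<stem>.+\.z\d{2})\.zip$")


def restore_part_names(directory):
    """Strip the extra .zip suffix from split-archive parts; return new names."""
    renamed = []
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(Path(directory).iterdir()):
        match = PART_PATTERN.match(path.name)
        if match:
            target = path.with_name(match.group("stem"))
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

Calling restore_part_names(".") in the download directory leaves the final cachefiles.zip untouched, because its name has no part suffix.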

hugovk commented 4 years ago

And a count of all the project_urls keys:

Counter({'Homepage': 4881,
         'Download': 1164,
         'Documentation': 238,
         'Issue tracker': 116,
         'Source': 95,
         'Tracker': 41,
         'Source Code': 38,
         'Bug Tracker': 36,
         'Repository': 31,
         'Changelog': 28,
         'Bug Reports': 25,
         'Funding': 18,
         'Issues': 15,
         'Issue Tracker': 14,
         'Code': 14,
         'CI: Travis': 9,
         'GitHub: issues': 7,
         'GitHub: repo': 7,
         'Source code': 7,
         'CI: AppVeyor': 5,
         'Docs: RTD': 5,
         'Docs': 4,
         'CI: Circle': 4,
         'Donation': 4,
         'GitHub': 4,
         'Chat: Gitter': 3,
         'Coverage: codecov': 3,
         'Tidelift': 3,
         'Github': 3,
         'Travis CI': 3,
         'Say Thanks!': 3,
         'CI: Shippable': 2,
         'Website': 2,
         'Code of Conduct': 2,
         'Mailing lists': 2,
         'Change log': 2,
         'Release Management': 2,
         'Webpage': 2,
         'CI': 2,
         'PyPI': 1,
         'Test Coverage': 1,
         'Tests': 1,
         'Packaging tutorial': 1,
         'Twine documentation': 1,
         'Twine source': 1,
         'CI: CircleCI': 1,
         'Support': 1,
         'Benchmarks': 1,
         'Wiki': 1,
         'Github repo': 1,
         'Wikipedia': 1,
         'Blog': 1,
         'Donate': 1,
         'Tidelift Subscription': 1,
         'Dev Docs': 1,
         'Discord': 1,
         'Forum': 1,
         'Code Coverage': 1,
         'Continuous Integration': 1,
         'Mailing List': 1,
         'Chat': 1,
         'Community': 1,
         'Gitter': 1,
         'bugs': 1,
         'repository': 1,
         'Issue Tracking': 1,
         'Discord server': 1})
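
A tally like the one above can be built with collections.Counter. The following is a minimal sketch under the assumption that the project_urls maps have already been loaded into a list; count_keys is a hypothetical name and not the script's actual function.

```python
from collections import Counter


def count_keys(all_project_urls):
    """Tally every project_urls key and count projects that define any."""
    key_counts = Counter()
    with_urls = 0
    for project_urls in all_project_urls:
        if project_urls:  # skip projects where project_urls is None or empty
            with_urls += 1
            key_counts.update(project_urls.keys())
    return key_counts, with_urls


# Sample maps for illustration; None stands for a project without project_urls.
samples = [
    {"Homepage": "https://urllib3.readthedocs.io/"},
    {
        "Download": "https://pypi.org/project/PyYAML/",
        "Homepage": "https://github.com/yaml/pyyaml",
    },
    {
        "Documentation": "https://setuptools.readthedocs.io/",
        "Homepage": "https://github.com/pypa/setuptools",
    },
    None,
]
key_counts, with_urls = count_keys(samples)
print(key_counts.most_common())
print(f"Projects with project_urls: {with_urls}/{len(samples)}")
```

Keys are counted exactly as written, so Source code and Source Code stay separate entries, as they do in the counts above.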

jayvdb commented 4 years ago

@hugovk, I think https://github.com/jayvdb/pypidb will be helpful. Note the repos are still getting set up, and there is currently a dependency on https://github.com/jayvdb/https-everywhere-py master, which I will fix by getting a new release out within a day or two.

hugovk commented 4 years ago

Looks good! Thanks!

hugovk commented 3 years ago

August 2020

Updated list of most popular project_urls keys in the top 4,000 downloaded packages (via https://github.com/hugovk/pypi-tools/pull/20#issue-493725680):

$ python3 project_urls.py -n 4000
Load data/top-pypi-packages.json...
Find project_urls...
100%|████████████████████████████████| 4000/4000 [00:07<00:00, 524.71project/s]
Counter({'Homepage': 3916,
         'Download': 778,
         'Documentation': 240,
         'Source': 152,
         'Changelog': 70,
         'Repository': 63,
         'Bug Tracker': 62,
         'Source Code': 60,
         'Tracker': 55,
         'Issue tracker': 39,
         'Issue Tracker': 30,
         'GitHub': 28,
         'Code': 26,
         'Issues': 21,
         'Funding': 20,
         'Bug Reports': 17,
         'Bug-Tracker': 8,
         'Twitter': 8,
         'CI: Travis': 7,
         'Source-Code': 7,
         'Docs': 6,
         'GitHub: issues': 6,
         'GitHub: repo': 6,
         'Github': 6,
         'Source code': 6,
         'bugs': 6,
         'repository': 6,
         'Docs: RTD': 5,
         'Donation': 5,
         'CI: AppVeyor': 3,
         'CI: Circle': 3,
         'Chat: Gitter': 3,
         'Code of Conduct': 3,
         'Coverage: codecov': 3,
         'Donate': 3,
         'Mailing List': 3,
         'Say Thanks!': 3,
         'Tidelift': 3,
         'Travis CI': 3,
         'CI': 2,
         'CI: GitHub': 2,
         'CI: Shippable': 2,
         'Change log': 2,
         'Chat': 2,
         'Download RPMs': 2,
         'Forum': 2,
         'Mailing lists': 2,
         'Release Management': 2,
         'Release notes': 2,
         'Tidelift: funding': 2,
         'Website': 2,
         'Benchmarks': 1,
         'Blog': 1,
         'Bug tracker': 1,
         'Bugs': 1,
         'CI: Azure Pipelines': 1,
         'CI: CircleCI': 1,
         'CI: GitHub Workflows': 1,
         'CI: Zuul': 1,
         'Code Coverage': 1,
         'Commercial License': 1,
         'Community': 1,
         'Conda-Forge': 1,
         'Continuous Integration': 1,
         'Coverage': 1,
         'Dev Docs': 1,
         'Discord': 1,
         'Discussions': 1,
         'Downloads': 1,
         'Examples': 1,
         'Feedstock': 1,
         'Further Documentation': 1,
         'Github repo': 1,
         'Help/Questions': 1,
         'History': 1,
         'License': 1,
         'Online Demo': 1,
         'Packaging tutorial': 1,
         'PyPI': 1,
         'Read the Docs': 1,
         'Release Notes': 1,
         'Releases': 1,
         'Support': 1,
         'Test Coverage': 1,
         'Tests': 1,
         'Tutorials': 1,
         'Twine documentation': 1,
         'Twine source': 1,
         "What's New": 1,
         'Wiki': 1,
         'Wikipedia': 1,
         'conda': 1})

Number with project_urls: 3925/4000

hugovk commented 1 year ago

June 2022

Updated list of most popular project_urls keys in the top 5,000 downloaded packages:

python3 pypi_fields.py --number 5000 --format markdown

Top 10

| project_urls | Count |
|:-------------|------:|
| Homepage | 4845 |
| Download | 738 |
| Documentation | 711 |
| Source | 400 |
| Bug Tracker | 240 |
| Source Code | 237 |
| Repository | 233 |
| Changelog | 159 |
| Tracker | 150 |
| Issue tracker | 131 |
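
For reference, a Top 10 table like this can be rendered from a Counter in a few lines. This is a hedged sketch and not the implementation behind --format markdown; to_markdown is a hypothetical helper, and it does not escape pipe characters inside keys.

```python
from collections import Counter


def to_markdown(counts, limit=None):
    """Render the most common keys as a two-column Markdown table."""
    rows = counts.most_common(limit)
    # Pad the first column to the longest key (or the header, if longer).
    width = max([len("project_urls")] + [len(key) for key, _ in rows])
    lines = [
        f"| {'project_urls':<{width}} | Count |",
        f"|:{'-' * (width + 1)}|------:|",
    ]
    for key, count in rows:
        lines.append(f"| {key:<{width}} | {count:>5} |")
    return "\n".join(lines)


counts = Counter({"Homepage": 4845, "Download": 738, "Documentation": 711})
print(to_markdown(counts, limit=2))
```

Passing limit=10 gives a Top 10, and limit=None renders every key.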

Full list

| project_urls | Count |
|:---------------------------------|------:|
| Homepage | 4845 |
| Download | 738 |
| Documentation | 711 |
| Source | 400 |
| Bug Tracker | 240 |
| Source Code | 237 |
| Repository | 233 |
| Changelog | 159 |
| Tracker | 150 |
| Issue tracker | 131 |
| Twitter | 83 |
| Issue Tracker | 79 |
| Changes | 67 |
| Chat | 62 |
| Funding | 59 |
| GitHub | 56 |
| Issues | 55 |
| YouTube | 44 |
| Slack Chat | 43 |
| Bug Reports | 42 |
| Code | 29 |
| CI | 24 |
| Source code | 15 |
| User Support | 13 |
| Discussions | 12 |
| Donate | 12 |
| GitHub: issues | 12 |
| GitHub: repo | 12 |
| Github | 11 |
| Release Management | 10 |
| homepage | 10 |
| Bug-Tracker | 9 |
| Docs: RTD | 9 |
| Release notes | 9 |
| documentation | 9 |
| Chat: Gitter | 8 |
| Code of Conduct | 8 |
| Docs | 8 |
| Release Notes | 8 |
| Source-Code | 8 |
| Tidelift | 8 |
| repository | 8 |
| Coverage: codecov | 7 |
| Donation | 7 |
| Say Thanks! | 7 |
| CI: GitHub | 6 |
| Coverage | 6 |
| Gitter | 6 |
| Mailing lists | 6 |
| Blog | 5 |
| CI: GitHub Actions | 5 |
| Change Log | 5 |
| Home | 5 |
| Ko-fi | 5 |
| Mailing List | 5 |
| changelog | 5 |
| Discord | 4 |
| PyPI | 4 |
| Releases | 4 |
| Wiki | 4 |
| Bug tracker | 3 |
| CI: Travis | 3 |
| Examples | 3 |
| Forum | 3 |
| History | 3 |
| Slack | 3 |
| Author | 2 |
| CI: Github Actions | 2 |
| Community | 2 |
| Continuous Integration | 2 |
| Docs: Changelog | 2 |
| Download RPMs | 2 |
| Downloads | 2 |
| GitHub Project | 2 |
| GitHub: discussions | 2 |
| Home Page | 2 |
| Home-page | 2 |
| News | 2 |
| Red Team Report | 2 |
| Sources | 2 |
| Telegram Channel | 2 |
| Telegram Chat | 2 |
| Test Coverage | 2 |
| Tests | 2 |
| Tidelift: funding | 2 |
| Website | 2 |
| .git | 1 |
| Benchmarks | 1 |
| Browse Source | 1 |
| Bug Reporting | 1 |
| Bug_Tracker | 1 |
| Bugs | 1 |
| CI/CD | 1 |
| CI: AppVeyor | 1 |
| CI: Azure Pipelines | 1 |
| CI: Circle | 1 |
| CI: CircleCI | 1 |
| CI: GA | 1 |
| CI: Shippable | 1 |
| Censys Homepage | 1 |
| Censys Search | 1 |
| Change log | 1 |
| CircleCI | 1 |
| Citation | 1 |
| Code Coverage | 1 |
| Codecov | 1 |
| Commercial License | 1 |
| Company | 1 |
| Conda-Forge | 1 |
| Contact | 1 |
| Container Image: DockerHub | 1 |
| Contribute! | 1 |
| Coverage: Codecov | 1 |
| Discord Server | 1 |
| Discord server | 1 |
| Discussion forum | 1 |
| Distribution | 1 |
| Docs: Contributing | 1 |
| Docs: Dev | 1 |
| Docs: Intro | 1 |
| Docs: Technical Reference | 1 |
| Docs: User Guide | 1 |
| Documentation-latest | 1 |
| Documentation-stable | 1 |
| End-User License Agreement | 1 |
| Enterprise Support | 1 |
| Example Report | 1 |
| Feedstock | 1 |
| Further Documentation | 1 |
| Git Clone URL | 1 |
| GitHub repository | 1 |
| Github repo | 1 |
| Help/Questions | 1 |
| Installation | 1 |
| License Texts | 1 |
| Live demo | 1 |
| Mailing list | 1 |
| Maillist | 1 |
| Matrix Profile Foundation | 1 |
| Notebook Examples | 1 |
| Online Demo | 1 |
| Packaging tutorial | 1 |
| Panel Examples | 1 |
| Parent Project | 1 |
| PyPi | 1 |
| Q & A | 1 |
| RDKit | 1 |
| RDKit on Github | 1 |
| Read the Docs | 1 |
| Reference | 1 |
| Released Versions | 1 |
| Report Issues | 1 |
| Reviews | 1 |
| Samples | 1 |
| SonarCloud | 1 |
| Sponsor | 1 |
| Style guide | 1 |
| Support | 1 |
| Travis CI | 1 |
| Tutorials | 1 |
| Webpage | 1 |
| What's New | 1 |
| Wikipedia | 1 |
| Youtube | 1 |
| all files | 1 |
| blog | 1 |
| bugs | 1 |
| conda | 1 |
| conda-forge | 1 |
| download | 1 |
| funding | 1 |
| github | 1 |
| github wiki(under development) | 1 |
| gitlab | 1 |
| help | 1 |
| just a chat to talk about python | 1 |
| made possible by | 1 |
| os_sys homepage | 1 |
| os_sys online | 1 |
| read the docs | 1 |
| server documentation | 1 |
| source | 1 |
| startpage | 1 |
| tracker | 1 |
| want to help | 1 |

Projects with project_urls: 4902/5000

Groups

And grouping some variants, we can see some popular choices:

Homepage

| project_urls | Count |
|:-------------|------:|
| Homepage | 4845 |
| homepage | 10 |
| Home | 5 |
| Home Page | 2 |
| Home-page | 2 |
| Website | 2 |
| Censys Homepage | 1 |
| os_sys homepage | 1 |
| startpage | 1 |
| Webpage | 1 |

Download

| project_urls | Count |
|:-------------|------:|
| Download | 738 |
| Download RPMs | 2 |
| Downloads | 2 |
| download | 1 |

Documentation

| project_urls | Count |
|:-------------|------:|
| Documentation | 711 |
| Docs: RTD | 9 |
| documentation | 9 |
| Docs | 8 |
| Docs: Contributing | 1 |
| Docs: Dev | 1 |
| Docs: Intro | 1 |
| Docs: Technical Reference | 1 |
| Docs: User Guide | 1 |
| Documentation-latest | 1 |
| Documentation-stable | 1 |
| Further Documentation | 1 |
| Read the Docs | 1 |
| read the docs | 1 |
| server documentation | 1 |

Source

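| project_urls | Count |
|:-------------|------:|
| Source | 400 |
| Source Code | 237 |
| Repository | 233 |
| GitHub | 56 |
| Code | 29 |
| Source code | 15 |
| GitHub: repo | 12 |
| Github | 11 |
| Source-Code | 8 |
| repository | 8 |
| Sources | 2 |
| Browse Source | 1 |
| source | 1 |
| .git | 1 |
| github | 1 |
| github wiki(under development) | 1 |
| gitlab | 1 |
| Git Clone URL | 1 |
| GitHub repository | 1 |
| Github repo | 1 |
| RDKit on Github | 1 |
| all files | 1 |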

Bug Tracker

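| project_urls | Count |
|:-------------|------:|
| Bug Tracker | 240 |
| Tracker | 150 |
| Issue tracker | 131 |
| Issue Tracker | 79 |
| Issues | 55 |
| Bug Reports | 42 |
| User Support | 13 |
| GitHub: issues | 12 |
| Bug-Tracker | 9 |
| Bug tracker | 3 |
| Bug Reporting | 1 |
| Bug_Tracker | 1 |
| tracker | 1 |
| Bugs | 1 |
| bugs | 1 |
| help | 1 |
| Report Issues | 1 |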

Changelog

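| project_urls | Count |
|:-------------|------:|
| Changelog | 159 |
| Changes | 67 |
| Release Management | 10 |
| Release notes | 9 |
| Release Notes | 8 |
| changelog | 5 |
| Change Log | 5 |
| Releases | 4 |
| History | 3 |
| Docs: Changelog | 2 |
| Change log | 1 |
| Released Versions | 1 |
| What's New | 1 |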

Chat

| project_urls | Count |
|:-------------|------:|
| Chat | 62 |
| Slack Chat | 43 |
| Discussions | 12 |
| Chat: Gitter | 8 |
| Gitter | 6 |
| Discord | 4 |
| Forum | 3 |
| Slack | 3 |
| GitHub: discussions | 2 |
| Community | 2 |
| Telegram Channel | 2 |
| Telegram Chat | 2 |
| Discord Server | 1 |
| Discord server | 1 |
| Discussion forum | 1 |
| just a chat to talk about python | 1 |

Funding

| project_urls | Count |
|:-------------|------:|
| Funding | 59 |
| Donate | 12 |
| Tidelift | 8 |
| Donation | 7 |
| Ko-fi | 5 |
| Tidelift: funding | 2 |
| funding | 1 |
| Sponsor | 1 |

CI

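| project_urls | Count |
|:-------------|------:|
| CI | 24 |
| CI: GitHub | 6 |
| CI: GitHub Actions | 5 |
| CI: Travis | 3 |
| CI: Github Actions | 2 |
| Continuous Integration | 2 |
| CI/CD | 1 |
| CI: AppVeyor | 1 |
| CI: Azure Pipelines | 1 |
| CI: Circle | 1 |
| CI: CircleCI | 1 |
| CI: GA | 1 |
| CI: Shippable | 1 |
| CircleCI | 1 |
| Travis CI | 1 |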
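
Part of the grouping above can be automated by folding spelling variants together. The sketch below is one possible approach, not how the groups in this comment were produced: normalise and merge_variants are hypothetical helpers that lower-case each key and treat spaces, hyphens and underscores as equivalent. Semantic groupings, such as counting Repository or GitHub under Source, would still need a hand-written mapping.

```python
import re
from collections import Counter, defaultdict


def normalise(key):
    """Lower-case a key and collapse spaces, hyphens and underscores."""
    return re.sub(r"[\s_-]+", " ", key.strip().lower())


def merge_variants(counts):
    """Sum counts of keys that normalise to the same label."""
    merged = Counter()
    variants = defaultdict(list)
    for key, count in counts.items():
        label = normalise(key)
        merged[label] += count
        variants[label].append(key)
    return merged, dict(variants)


# Counts taken from the June 2022 list above.
counts = Counter({
    "Source Code": 237, "Source code": 15, "Source-Code": 8,
    "Bug Tracker": 240, "Bug-Tracker": 9, "Bug_Tracker": 1, "Bug tracker": 3,
})
merged, variants = merge_variants(counts)
print(merged.most_common())
print(variants["source code"])
```

On this sample the three Source Code spellings merge into one label and the four Bug Tracker spellings into another.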