Open hugovk opened 4 years ago
SCM links in `project_urls`, preview:
Load data/top-repos.json...
Load data/top-pypi-packages.json...
Already done: 0
Find new repos...
Homepage https://github.com/benjaminp/six
Homepage https://github.com/boto/botocore
Homepage https://github.com/boto/s3transfer
Homepage https://github.com/kjd/idna
Homepage https://github.com/chardet/chardet
Homepage https://github.com/etingof/pyasn1
Homepage https://github.com/yaml/pyyaml
Homepage https://github.com/jmespath/jmespath.py
Homepage https://github.com/pypa/setuptools
Homepage https://github.com/agronholm/pythonfutures
Homepage https://github.com/tartley/colorama
Homepage https://github.com/boto/boto3
Homepage https://github.com/simplejson/simplejson
Source Code https://github.com/numpy/numpy
Homepage https://github.com/pypa/wheel
Download https://github.com/protocolbuffers/protobuf/releases
...
Homepage https://github.com/broadinstitute/keras-resnet
Homepage https://github.com/CyberZHG/keras-position-wise-feed-forward
Homepage https://github.com/makinacorpus/django-admin-watchdog
Old repos: 0
New repos: 3953
Not found: 1047
Counter({'Homepage': 3711,
None: 1047,
'Source': 95,
'Download': 63,
'Source Code': 38,
'Code': 14,
'Issue Tracker': 5,
'Repository': 5,
'GitHub: issues': 4,
'Github': 3,
'Bug Tracker': 3,
'Bug Reports': 2,
'Issue tracker': 2,
'Source code': 2,
'Twine source': 1,
'Issues': 1,
'Github repo': 1,
'Change log': 1,
'Changelog': 1,
'GitHub': 1})
Full list:
The `project_urls` for each of the top 5,000, preview:
{'Homepage': 'https://urllib3.readthedocs.io/'}
{'Homepage': 'https://github.com/benjaminp/six'}
{'Homepage': 'https://github.com/boto/botocore'}
{'Homepage': 'http://python-requests.org'}
{'Homepage': 'https://dateutil.readthedocs.io'}
{'Homepage': 'https://pip.pypa.io/'}
{'Homepage': 'https://github.com/boto/s3transfer'}
{'Homepage': 'https://certifi.io/'}
{'Homepage': 'https://github.com/kjd/idna'}
{'Homepage': 'http://docutils.sourceforge.net/'}
{'Homepage': 'https://github.com/chardet/chardet'}
{'Homepage': 'https://github.com/etingof/pyasn1'}
{'Download': 'https://pypi.org/project/PyYAML/', 'Homepage': 'https://github.com/yaml/pyyaml'}
{'Homepage': 'https://stuvel.eu/rsa'}
{'Homepage': 'https://github.com/jmespath/jmespath.py'}
{'Documentation': 'https://setuptools.readthedocs.io/', 'Homepage': 'https://github.com/pypa/setuptools'}
{'Download': 'https://pypi.org/project/pytz/', 'Homepage': 'http://pythonhosted.org/pytz'}
{'Homepage': 'https://github.com/agronholm/pythonfutures'}
{'Homepage': 'https://github.com/tartley/colorama'}
{'Homepage': 'http://aws.amazon.com/cli/'}
...
Full list:
Multipart zip of /Users/hugo/Library/Caches/source-finder/ containing the top 5,000 (plus 5) JSON metadata, created with `zip source-finder.zip --out cachefiles.zip -s 10m`.
Rename the `.z0X.zip` to `.z0X` before uncompressing.
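The rename step could be scripted roughly as follows. This is only a sketch: the part names (`cachefiles.z01.zip` and so on) are assumed from the `--out cachefiles.zip` argument above, and `rename_zip_parts` is a hypothetical helper, not something from the repo.

```python
import re
import tempfile
from pathlib import Path

# Matches split-archive parts that were uploaded with an extra ".zip" suffix,
# e.g. "cachefiles.z01.zip" (file names here are assumptions for illustration).
PART_RE = re.compile(r"\.z\d+\.zip$")


def rename_zip_parts(directory):
    """Strip the trailing ".zip" from every ".zNN.zip" part in directory."""
    renamed = []
    for path in sorted(Path(directory).iterdir()):
        if PART_RE.search(path.name):
            target = path.with_suffix("")  # drops only the final ".zip"
            path.rename(target)
            renamed.append(target.name)
    return renamed


# Demonstration on empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("cachefiles.z01.zip", "cachefiles.z02.zip", "cachefiles.zip"):
        (Path(tmp) / name).touch()
    renamed = rename_zip_parts(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(renamed)
print(remaining)
```

The final `cachefiles.zip` is left alone because it does not match the part pattern.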
And a count of all the `project_urls` keys:
Counter({'Homepage': 4881,
'Download': 1164,
'Documentation': 238,
'Issue tracker': 116,
'Source': 95,
'Tracker': 41,
'Source Code': 38,
'Bug Tracker': 36,
'Repository': 31,
'Changelog': 28,
'Bug Reports': 25,
'Funding': 18,
'Issues': 15,
'Issue Tracker': 14,
'Code': 14,
'CI: Travis': 9,
'GitHub: issues': 7,
'GitHub: repo': 7,
'Source code': 7,
'CI: AppVeyor': 5,
'Docs: RTD': 5,
'Docs': 4,
'CI: Circle': 4,
'Donation': 4,
'GitHub': 4,
'Chat: Gitter': 3,
'Coverage: codecov': 3,
'Tidelift': 3,
'Github': 3,
'Travis CI': 3,
'Say Thanks!': 3,
'CI: Shippable': 2,
'Website': 2,
'Code of Conduct': 2,
'Mailing lists': 2,
'Change log': 2,
'Release Management': 2,
'Webpage': 2,
'CI': 2,
'PyPI': 1,
'Test Coverage': 1,
'Tests': 1,
'Packaging tutorial': 1,
'Twine documentation': 1,
'Twine source': 1,
'CI: CircleCI': 1,
'Support': 1,
'Benchmarks': 1,
'Wiki': 1,
'Github repo': 1,
'Wikipedia': 1,
'Blog': 1,
'Donate': 1,
'Tidelift Subscription': 1,
'Dev Docs': 1,
'Discord': 1,
'Forum': 1,
'Code Coverage': 1,
'Continuous Integration': 1,
'Mailing List': 1,
'Chat': 1,
'Community': 1,
'Gitter': 1,
'bugs': 1,
'repository': 1,
'Issue Tracking': 1,
'Discord server': 1})
@hugovk, I think https://github.com/jayvdb/pypidb will be helpful. Note the repos are still getting set up, and there is currently a dependency on https://github.com/jayvdb/https-everywhere-py master, which I will fix by getting a new release out within a day or two.
Looks good! Thanks!
Updated list of most popular `project_urls` keys in the top 4,000 downloaded packages (via https://github.com/hugovk/pypi-tools/pull/20#issue-493725680):
$ python3 project_urls.py -n 4000
Load data/top-pypi-packages.json...
Find project_urls...
100%|████████████████████████████████| 4000/4000 [00:07<00:00, 524.71project/s]
Counter({'Homepage': 3916,
'Download': 778,
'Documentation': 240,
'Source': 152,
'Changelog': 70,
'Repository': 63,
'Bug Tracker': 62,
'Source Code': 60,
'Tracker': 55,
'Issue tracker': 39,
'Issue Tracker': 30,
'GitHub': 28,
'Code': 26,
'Issues': 21,
'Funding': 20,
'Bug Reports': 17,
'Bug-Tracker': 8,
'Twitter': 8,
'CI: Travis': 7,
'Source-Code': 7,
'Docs': 6,
'GitHub: issues': 6,
'GitHub: repo': 6,
'Github': 6,
'Source code': 6,
'bugs': 6,
'repository': 6,
'Docs: RTD': 5,
'Donation': 5,
'CI: AppVeyor': 3,
'CI: Circle': 3,
'Chat: Gitter': 3,
'Code of Conduct': 3,
'Coverage: codecov': 3,
'Donate': 3,
'Mailing List': 3,
'Say Thanks!': 3,
'Tidelift': 3,
'Travis CI': 3,
'CI': 2,
'CI: GitHub': 2,
'CI: Shippable': 2,
'Change log': 2,
'Chat': 2,
'Download RPMs': 2,
'Forum': 2,
'Mailing lists': 2,
'Release Management': 2,
'Release notes': 2,
'Tidelift: funding': 2,
'Website': 2,
'Benchmarks': 1,
'Blog': 1,
'Bug tracker': 1,
'Bugs': 1,
'CI: Azure Pipelines': 1,
'CI: CircleCI': 1,
'CI: GitHub Workflows': 1,
'CI: Zuul': 1,
'Code Coverage': 1,
'Commercial License': 1,
'Community': 1,
'Conda-Forge': 1,
'Continuous Integration': 1,
'Coverage': 1,
'Dev Docs': 1,
'Discord': 1,
'Discussions': 1,
'Downloads': 1,
'Examples': 1,
'Feedstock': 1,
'Further Documentation': 1,
'Github repo': 1,
'Help/Questions': 1,
'History': 1,
'License': 1,
'Online Demo': 1,
'Packaging tutorial': 1,
'PyPI': 1,
'Read the Docs': 1,
'Release Notes': 1,
'Releases': 1,
'Support': 1,
'Test Coverage': 1,
'Tests': 1,
'Tutorials': 1,
'Twine documentation': 1,
'Twine source': 1,
"What's New": 1,
'Wiki': 1,
'Wikipedia': 1,
'conda': 1})
Number with project_urls: 3925/4000
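For readers who want to reproduce a count like this, here is a minimal sketch of the tallying step. It is not the actual `project_urls.py`; it assumes each package's metadata has already been loaded as a dict in the shape returned by PyPI's JSON API (`info` → `project_urls`), and `count_project_url_keys` is a hypothetical name. The sample entries reuse URLs from the preview earlier in this thread.

```python
from collections import Counter


def count_project_url_keys(metadata_list):
    """Tally project_urls keys across a list of PyPI JSON metadata dicts."""
    counter = Counter()
    with_urls = 0
    for metadata in metadata_list:
        # project_urls can be missing or null in the JSON, so fall back to {}
        project_urls = metadata.get("info", {}).get("project_urls") or {}
        if project_urls:
            with_urls += 1
        counter.update(project_urls.keys())
    return counter, with_urls


# Small hand-made sample; the third entry stands in for a package with no URLs
sample = [
    {"info": {"project_urls": {"Homepage": "https://github.com/benjaminp/six"}}},
    {
        "info": {
            "project_urls": {
                "Download": "https://pypi.org/project/PyYAML/",
                "Homepage": "https://github.com/yaml/pyyaml",
            }
        }
    },
    {"info": {"project_urls": None}},
]

counter, with_urls = count_project_url_keys(sample)
print(counter.most_common())
print(f"Number with project_urls: {with_urls}/{len(sample)}")
```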
Updated list of most popular `project_urls` keys in the top 5,000 downloaded packages:
python3 pypi_fields.py --number 5000 --format markdown
project_urls | Count |
---|---|
Homepage | 4845 |
Download | 738 |
Documentation | 711 |
Source | 400 |
Bug Tracker | 240 |
Source Code | 237 |
Repository | 233 |
Changelog | 159 |
Tracker | 150 |
Issue tracker | 131 |
Projects with project_urls: 4902/5000
And grouping some variants, we can see some popular choices:
project_urls | Count |
---|---|
Homepage | 4845 |
homepage | 10 |
Home | 5 |
Home Page | 2 |
Home-page | 2 |
Website | 2 |
Censys Homepage | 1 |
os_sys homepage | 1 |
startpage | 1 |
Webpage | 1 |
project_urls | Count |
---|---|
Download | 738 |
Download RPMs | 2 |
Downloads | 2 |
download | 1 |
project_urls | Count |
---|---|
Documentation | 711 |
Docs: RTD | 9 |
documentation | 9 |
Docs | 8 |
Docs: Contributing | 1 |
Docs: Dev | 1 |
Docs: Intro | 1 |
Docs: Technical Reference | 1 |
Docs: User Guide | 1 |
Documentation-latest | 1 |
Documentation-stable | 1 |
Further Documentation | 1 |
Read the Docs | 1 |
read the docs | 1 |
server documentation | 1 |
project_urls | Count |
---|---|
Source | 400 |
Source Code | 237 |
Repository | 233 |
GitHub | 56 |
Code | 29 |
Source code | 15 |
GitHub: repo | 12 |
Github | 11 |
Source-Code | 8 |
repository | 8 |
Sources | 2 |
Browse Source | 1 |
source | 1 |
.git | 1 |
github | 1 |
github wiki(under development) | 1 |
gitlab | 1 |
Git Clone URL | 1 |
GitHub repository | 1 |
Github repo | 1 |
RDKit on Github | 1 |
all files | 1 |
project_urls | Count |
---|---|
Bug Tracker | 240 |
Tracker | 150 |
Issue tracker | 131 |
Issue Tracker | 79 |
Issues | 55 |
Bug Reports | 42 |
User Support | 13 |
GitHub: issues | 12 |
Bug-Tracker | 9 |
Bug Reporting | 1 |
Bug_Tracker | 1 |
Bug tracker | 3 |
tracker | 1 |
Bugs | 1 |
bugs | 1 |
help | 1 |
Report Issues | 1 |
project_urls | Count |
---|---|
Changelog | 159 |
Changes | 67 |
Release Management | 10 |
Release notes | 9 |
Release Notes | 8 |
changelog | 5 |
Change Log | 5 |
Releases | 4 |
History | 3 |
Docs: Changelog | 2 |
Change log | 1 |
Released Versions | 1 |
What's New | 1 |
project_urls | Count |
---|---|
Chat | 62 |
Slack Chat | 43 |
Discussions | 12 |
Gitter | 6 |
Chat: Gitter | 8 |
Discord | 4 |
Forum | 3 |
Slack | 3 |
GitHub: discussions | 2 |
Community | 2 |
Telegram Channel | 2 |
Telegram Chat | 2 |
Discord Server | 1 |
Discord server | 1 |
Discussion forum | 1 |
just a chat to talk about python | 1 |
project_urls | Count |
---|---|
Funding | 59 |
Donate | 12 |
Tidelift | 8 |
Donation | 7 |
Ko-fi | 5 |
Tidelift: funding | 2 |
funding | 1 |
Sponsor | 1 |
project_urls | Count |
---|---|
CI | 24 |
CI: GitHub | 6 |
CI: GitHub Actions | 5 |
CI: Github Actions | 2 |
Continuous Integration | 2 |
CI: Travis | 3 |
CI/CD | 1 |
CI: AppVeyor | 1 |
CI: Azure Pipelines | 1 |
CI: Circle | 1 |
CI: CircleCI | 1 |
CI: GA | 1 |
CI: Shippable | 1 |
CircleCI | 1 |
Travis CI | 1 |
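The groupings above look hand-curated. Part of that work can be automated by folding case and punctuation, as in the sketch below. It is an assumption about how one might do it, not how these tables were produced, and it only merges spelling variants such as `Bug-Tracker` and `Bug_Tracker`; it will not decide that `Issues` and `Bug Tracker` mean the same thing. The counts are copied from the tracker table above.

```python
from collections import Counter, defaultdict


def normalise(key):
    """Fold case and treat '-' and '_' as spaces."""
    return key.lower().replace("-", " ").replace("_", " ").strip()


def group_variants(counts):
    """Map each normalised key to a Counter of its original spellings."""
    grouped = defaultdict(Counter)
    for key, count in counts.items():
        grouped[normalise(key)][key] += count
    return grouped


counts = Counter(
    {
        "Bug Tracker": 240,
        "Issue tracker": 131,
        "Issue Tracker": 79,
        "Bug-Tracker": 9,
        "Bug tracker": 3,
        "Bug_Tracker": 1,
    }
)

grouped = group_variants(counts)
totals = {name: sum(variants.values()) for name, variants in grouped.items()}
for name, total in sorted(totals.items(), key=lambda item: -item[1]):
    print(name, total, dict(grouped[name]))
```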
Summary: use `Source`

In addition to `url` (alias `homepage`), packages on PyPI can have this metadata:

The `url` homepage is added into `project_urls` as `Homepage`. For example, Pillow doesn't define any `project_urls` but does have `url="http://python-pillow.org"`, and https://pypi.org/pypi/Pillow/json includes:

Many projects have a link to their GitHub (or GitLab or Bitbucket etc.) repos as the homepage. For those that include an arbitrary link to a source repo, what is the most common one, when not the `homepage`?

Checking the current top 5,000 packages, here is the `project_urls` key where a source repo was found (defined as a URL containing one of github.com, gitlab.com, bitbucket.org or bitbucket.com):

Some of these are specific things, like links to tarball downloads, or issue trackers. But the most common ones for a repo homepage are `Source`, `Source Code` and `Code`.

Use `Source` for adding new ones.
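The "source repo found" check described above can be sketched as a plain substring test. This is a rough illustration rather than the script actually used: `find_repo_key` and `SCM_HOSTS` are hypothetical names, and returning the first matching key in dict order is an assumption. The example URLs come from the previews earlier in this thread.

```python
# Hosts named in the definition above: a URL "containing" one of these counts
SCM_HOSTS = ("github.com", "gitlab.com", "bitbucket.org", "bitbucket.com")


def find_repo_key(project_urls):
    """Return (key, url) for the first URL containing an SCM host, else (None, None)."""
    for key, url in (project_urls or {}).items():
        if any(host in url for host in SCM_HOSTS):
            return key, url
    return None, None


examples = [
    {"Homepage": "https://github.com/benjaminp/six"},
    {"Homepage": "https://urllib3.readthedocs.io/"},
    {
        "Download": "https://pypi.org/project/PyYAML/",
        "Homepage": "https://github.com/yaml/pyyaml",
    },
    None,  # a package with no project_urls at all
]

results = [find_repo_key(urls) for urls in examples]
for key, url in results:
    print(key, url)
```

A substring test matches the definition given here but also matches links that are not the repo root, such as release or issue pages, which is why keys like `Download` and `Issue Tracker` show up in the counts.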