librariesio / libraries.io

:books: The Open Source Discovery Service
https://libraries.io
GNU Affero General Public License v3.0

Update Libraries Pypi parser to handle different "homepage" variations #3174

Closed djpowers closed 1 year ago

djpowers commented 1 year ago

Libraries displays a "Homepage" link for each package where we have that data available.

Libraries example: https://libraries.io/pypi/PyICU
PyPI source: https://pypi.org/project/PyICU/ (note the "Homepage" link under "Project links")

In the JSON API response we look for home_page or project_urls => Homepage to populate this field in Libraries.
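For illustration, here is a minimal sketch of that lookup in Python (not the Libraries code itself). It assumes the payload has the shape of PyPI's JSON API response, with an `info` object holding `home_page` and `project_urls`. The function name `extract_homepage` is hypothetical.

```python
from typing import Any, Optional


def extract_homepage(payload: dict[str, Any]) -> Optional[str]:
    """Return the homepage URL from a PyPI-style JSON payload, or None."""
    info = payload.get("info") or {}
    # project_urls may be null in the response, so fall back to an empty dict.
    project_urls = info.get("project_urls") or {}
    # Prefer home_page, then the exact "Homepage" key under project_urls.
    return info.get("home_page") or project_urls.get("Homepage") or None
```

Because the second lookup is an exact key match, a project that labels the link with any other spelling or casing yields no homepage under this sketch.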

However, PyPI is not consistent about this field, and different projects label it in different ways. I've also observed:

Examples from PyPI where we are missing homepage data on Libraries:

This change ensures we also parse the few additional fields that are used on PyPI for the homepage. Checking these in addition to the existing ones should allow us to fill in more homepage data.
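One way to sketch a more tolerant lookup, in Python and purely as an illustration: compare `project_urls` labels after normalizing case and punctuation, and accept a caller-supplied list of extra labels. This is an assumed approach, not necessarily what the change itself does. The names `find_homepage` and `extra_labels` are hypothetical, and the "Website" label in the usage note is a placeholder rather than a field taken from this issue.

```python
import re
from typing import Any, Iterable, Optional


def _normalize(label: str) -> str:
    """Lowercase a label and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def find_homepage(
    payload: dict[str, Any], extra_labels: Iterable[str] = ()
) -> Optional[str]:
    """Return a homepage URL from a PyPI-style payload, tolerating label variants."""
    info = payload.get("info") or {}
    if info.get("home_page"):
        return info["home_page"]
    # "Homepage" is always accepted; extra_labels are caller-supplied (hypothetical).
    wanted = {_normalize("Homepage")} | {_normalize(l) for l in extra_labels}
    for label, url in (info.get("project_urls") or {}).items():
        if url and _normalize(label) in wanted:
            return url
    return None
```

With this sketch, a key such as "home-page" matches without being listed, while an unrelated label such as "Website" matches only if it is passed in, for example `find_homepage(payload, extra_labels=["Website"])`.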