librariesio / libraries.io

:books: The Open Source Discovery Service
https://libraries.io
GNU Affero General Public License v3.0
1.1k stars 206 forks

Clean up Project Provider Map system #3317

Closed johnbintz-tidelift closed 4 months ago

johnbintz-tidelift commented 4 months ago

The Provider Maps are used to match Version#repository_sources to PackageManager classes for packages in ecosystems where multiple package providers exist. The original system was very lax in selecting the most appropriate provider, especially when multiple were involved.
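As a rough illustration of the matching described above, here is a minimal sketch of a provider map that picks one provider from a version's repository sources. All names here (`Provider`, `ProviderMapSketch`, `best_for`, and the source strings `"Maven"` and `"Google"`) are hypothetical and chosen for illustration; they are not taken from the Libraries.io codebase.

```ruby
# Hypothetical sketch: these names are illustrative, not the Libraries.io API.
Provider = Struct.new(:name, :repository_source, :default, keyword_init: true)

class ProviderMapSketch
  def initialize(providers)
    @providers = providers
    # Exactly which provider is the fallback is an assumption of this sketch.
    @default = providers.find(&:default) or raise ArgumentError, "a default provider is required"
  end

  # Returns the first known provider (in list order) whose repository source
  # appears in the given sources. Unknown or missing sources fall back to the
  # default provider instead of a provider that is no longer supported.
  def best_for(repository_sources)
    sources = Array(repository_sources).compact
    @providers.find { |provider| sources.include?(provider.repository_source) } || @default
  end
end

# Assumed source strings; the real values stored in repository_sources may differ.
MAVEN = ProviderMapSketch.new([
  Provider.new(name: "MavenCentral", repository_source: "Maven", default: true),
  Provider.new(name: "GoogleMaven", repository_source: "Google", default: false),
])
```

In this sketch, a version that lists only an unsupported source (for example `MAVEN.best_for(["Hortonworks"])`) resolves to the default provider, so a removed provider is never consulted.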

Theoretically, a package should only be found at one provider, but in the case of Maven packages, this turned out not to be the case -- providers like Hortonworks and Atlassian provided their own versions of packages that also existed in Maven Central. We stopped examining Maven repos other than Google Maven and Maven Central, but unfortunately, some packages still present as belonging to these other less-supported repos.

The true fix for this is to remove any references to the old Maven repos in repository_sources, but that would still leave in place a Provider Map system that would break again whenever a provider was added, then removed. For example, a package could be marked as removed because an old provider was checked and the URL 404'd. This new code: