librariesio / libraries.io

:books: The Open Source Discovery Service
https://libraries.io
GNU Affero General Public License v3.0

Allow the handling of non-ASCII nuget package names. #3346

Closed tiegz closed 3 months ago

tiegz commented 3 months ago

NuGet package names can contain non-ASCII characters, but the NuGet ingestor does not currently escape them when fetching packages. As a result, we see the following error and those packages are not ingested:

URI must be ascii only "https://nuget.org/packages/...."

This PR escapes the name in the proper places, including while unzipping the package's .nuspec file and checking each filename. (Note that the NuGet source actually lists the escaped version as the canonical name in the HTML, e.g. here.)
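For illustration, here is a minimal sketch of the escaping idea. It is not the code from this PR: `nuget_package_url` is a hypothetical helper, and using `ERB::Util.url_encode` is an assumption about one reasonable way to percent-encode a path segment. Ruby's `URI` parser rejects raw non-ASCII input, so the name is escaped before the URL is built.

```ruby
require "uri"
require "erb"

# Hypothetical helper, not taken from the libraries.io codebase.
# Percent-encodes the package name so the resulting URL is ASCII-only
# and can be parsed by Ruby's URI library.
def nuget_package_url(name)
  escaped = ERB::Util.url_encode(name) # leaves only A-Z, a-z, 0-9, "_", ".", "-", "~" unescaped
  URI("https://nuget.org/packages/#{escaped}")
end

puts nuget_package_url("Newtonsoft.Json")
puts nuget_package_url("Пакет")
```

ASCII-only names pass through unchanged, so the helper can be applied to every package name rather than only to the ones known to contain non-ASCII characters.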