repology / repology-updater

Repology backend service to update repository and package data
https://repology.org
GNU General Public License v3.0

WinGet Integration #1392

Open denelon opened 7 months ago

denelon commented 7 months ago

One of our community members has started digging into the Repology tooling. Before we add a tool to help us with mappings between WinGet package identifiers and Repology "projects" and "packages", I wanted to reach out to see if there is a better way we should reason about this kind of mapping.

I've read a few of the older Issues with "WinGet" mentioned here, and wanted to get a better understanding of how we could help improve each other's projects. I'm a huge fan of the work to pull this project together. We recently added the Repology badge to the main README.md over at winget-pkgs.

There have been several interesting discussions about how we might handle WinGet manifests for versions of packages with potential vulnerabilities.

I'd also like to better understand the logical distinction made here between projects and packages. I saw a comment about avoiding "windows-only" projects. I'm not trying to push any kind of an agenda. I'm just seeking to understand.

Trenly commented 7 months ago

I'd also like to understand whether there is a better way to handle how package information is pulled from WinGet. Several WinGet packages are listed as outdated, even though the philosophy at the winget-pkgs repo is to keep as many versions as possible as legacy versions: with Windows packages, there is a greater chance that an organization needs a specific version of a given package.

AMDmi3 commented 7 months ago

> One of our community members has started digging into the Repology tooling. Before we add a tool to help us with mappings between WinGet package identifiers and Repology "projects" and "packages", I wanted to reach out to see if there is a better way we should reason about this kind of mapping.

There is https://repology.org/tools/project-by, which does exactly that: it maps package names to Repology projects.
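As a rough illustration of how a tool might build a lookup link for the project-by page, here is a minimal sketch. The query parameter names (`repo`, `name_type`, `target_page`, `name`) and the example package name `Mozilla.Firefox` are assumptions for illustration and should be checked against the tool page itself before use.

```python
from urllib.parse import urlencode

PROJECT_BY_URL = "https://repology.org/tools/project-by"


def project_by_url(repo, name, name_type="srcname", target_page="project_versions"):
    """Build a project-by lookup URL.

    The parameter names used here are assumptions; verify them on the tool page.
    """
    query = urlencode(
        {
            "repo": repo,
            "name_type": name_type,
            "target_page": target_page,
            "name": name,
        }
    )
    return f"{PROJECT_BY_URL}?{query}"


# "Mozilla.Firefox" is a hypothetical name; the name Repology stores for a
# winget package may have a different shape.
print(project_by_url("winget", "Mozilla.Firefox"))
```

The function only builds a URL and makes no request, so it can be used to generate links in reports without touching the Repology servers.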

https://github.com/microsoft/winget-pkgs/pull/146673

You could've asked to add a vulnerable flag to the API. Webpage scraping is not tolerated; also, the client must set a distinctive user-agent and maintain a rate limit of 1 RPS (see the API TOS). If it would make a lot of requests, it may be better to set up a bulk export instead.
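A client that follows these two rules could look roughly like the sketch below. The user-agent string and contact address are hypothetical placeholders, and the limiter is only one simple way to stay at or under one request per second.

```python
import time
import urllib.request

# Hypothetical identifier; a real tool should name itself and give a contact.
USER_AGENT = "winget-repology-sync/0.1 (contact: maintainers@example.invalid)"
MIN_INTERVAL = 1.0  # seconds between requests, i.e. at most 1 RPS


class RateLimiter:
    """Blocks so that consecutive calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval=MIN_INTERVAL, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url):
    """Create a request object that carries the distinctive user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

A caller would invoke `limiter.wait()` before each `urllib.request.urlopen(build_request(url))`. The clock and sleep functions are injectable only so the limiter can be tested without real delays.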

> I'd also like to better understand the logical distinction made here between projects and packages.

A project roughly corresponds to an upstream project (Firefox). A package is a single package of it in some repository (Mozilla Firefox 124.0.1 in winget).

> I saw a comment (https://github.com/repology/repology-updater/issues/658#issuecomment-405199141) about avoiding "windows-only" projects.

Repology is targeted at the F/OSS and cross-platform software ecosystem. Projects present only in Windows and macOS repositories are hidden from the search, though otherwise tracked.

AMDmi3 commented 7 months ago

> I'd also like to understand whether there is a better way to handle how package information is pulled from WinGet. Several WinGet packages are listed as outdated, even though the philosophy at the winget-pkgs repo is to keep as many versions as possible as legacy versions

There should be no problem with legacy versions: as long as a newer version is available, older versions are marked as legacy instead of outdated. There are some exceptions to this logic, though, so it would make sense to look at specific examples.
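To make the rule concrete, here is a deliberately simplified sketch, not Repology's actual implementation. It assumes purely numeric dotted versions and ignores the exceptions mentioned above; `version_key` and `classify` are hypothetical helper names.

```python
def version_key(version):
    """Naive ordering for purely numeric dotted versions such as '124.0.1'."""
    return tuple(int(part) for part in version.split("."))


def classify(repo_versions, newest_known):
    """Label each version that one repository carries for a single project.

    Only the highest version in the repository is compared with the newest
    known version; every lower version in the same repository is 'legacy'.
    """
    best = max(repo_versions, key=version_key)
    statuses = {}
    for version in repo_versions:
        if version != best:
            statuses[version] = "legacy"
        elif version_key(version) >= version_key(newest_known):
            statuses[version] = "newest"
        else:
            statuses[version] = "outdated"
    return statuses


# The repository carries the newest version, so older ones are only 'legacy'.
print(classify(["124.0.1", "123.0", "115.9.0"], "124.0.1"))
# The repository lags behind: only its highest version is 'outdated'.
print(classify(["123.0", "115.9.0"], "124.0.1"))
```

Under this simplification, keeping many old versions never produces more than one outdated entry per project in a repository.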

Trenly commented 7 months ago

> microsoft/winget-pkgs#146673
>
> You could've asked to add a vulnerable flag to the API. Webpage scraping is not tolerated; also, the client must set a distinctive user-agent and maintain a rate limit of 1 RPS (see the API TOS). If it would make a lot of requests, it may be better to set up a bulk export instead.

I've closed the PR. Where should I make the request to add it to the API?

AMDmi3 commented 7 months ago

> I've closed the PR. Where should I make the request to add it to the API?

It's still not clear to me whether you need an API change adding a vulnerable flag (though it makes sense to add it regardless) or a bulk export. How many requests would the tool generate, and how often?

Trenly commented 7 months ago

> > I've closed the PR. Where should I make the request to add it to the API?
>
> It's still not clear to me whether you need an API change adding a vulnerable flag (though it makes sense to add it regardless) or a bulk export. How many requests would the tool generate, and how often?

My thoughts are that the tool would request the specific projects that are listed as vulnerable at winget, along with the specific versions that are vulnerable.

Looking at the documentation, this could probably be a single call to the Filtered Packages endpoint with the query string ?inrepo=winget&vulnerable=1, so long as the Package dictionary also had an indication of whether or not each package was vulnerable (perhaps a list of the CVEs?).

One or two calls daily, or potentially even fewer, would likely be enough for some basic tooling to be built on WinGet's side.

AMDmi3 commented 7 months ago

Well yes, this looks like it could be done with a single API request. It's rather heavy, as it returns a lot of packages, but I guess that's OK if it's not too frequent. I've added a vulnerable flag to the API projects output. You can use srcname to map Repology data back to winget packages. What's left is that not all packages are returned by the API, because some are Windows-only; I might reconsider that.
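A sketch of how tooling on the WinGet side might consume such a response follows. It assumes the response is a JSON object mapping project names to lists of package dictionaries with `repo`, `srcname`, `version` and `vulnerable` keys; the query string is the one proposed earlier in this thread, and the sample data is invented for illustration.

```python
from urllib.parse import urlencode

API_BASE = "https://repology.org/api/v1/projects/"


def vulnerable_query_url(repo="winget"):
    """URL for the single request discussed above (query string taken from this thread)."""
    return f"{API_BASE}?{urlencode({'inrepo': repo, 'vulnerable': 1})}"


def vulnerable_packages(projects, repo="winget"):
    """Return (project, srcname, version) for packages of `repo` flagged as vulnerable.

    `projects` is the decoded JSON response; its shape is an assumption here.
    """
    found = []
    for project, packages in projects.items():
        for pkg in packages:
            if pkg.get("repo") == repo and pkg.get("vulnerable"):
                found.append((project, pkg.get("srcname"), pkg.get("version")))
    return found


# Invented sample data; real srcname values may look different.
sample = {
    "exampleproject": [
        {"repo": "winget", "srcname": "Example.App", "version": "1.2.3", "vulnerable": True},
        {"repo": "winget", "srcname": "Example.App", "version": "1.3.0", "vulnerable": False},
        {"repo": "debian_12", "srcname": "example-app", "version": "1.2.3", "vulnerable": True},
    ]
}
print(vulnerable_query_url())
print(vulnerable_packages(sample))
```

Filtering on `repo` matters because a project entry lists packages from every repository, not only winget.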

niStee commented 5 months ago

Just noticed that Repology no longer tracks the winget repository: https://repology.org/repository/winget

Last time Repology has processed this repository was 2024-05-31 18:20 (a day ago).

AMDmi3 commented 5 months ago

It stays disabled until there's proper QA that prevents invalid YAML.

denelon commented 3 months ago

@AMDmi3 I believe we've addressed the offending date format we were made aware of in our latest release. Are there any other areas of concern? I'm happy to make sure we have tests so we're not injecting any bad data.

AMDmi3 commented 3 months ago

The parser currently fails here due to the lack of an expected DocumentUrl. Is it, or should it be, treated as valid?

You could also add a check for floating point (unquoted) versions. This is not a blocker for re-enabling winget in Repology, but floating point versions may be misinterpreted, e.g. 1.10 would be parsed as 1.1, which is a different version:

2024-08-07 00:08:33   manifests/p/PolarGoose/Handle2/1.0: WARNING: PackageVersion "1.0" is a floating point, should be quoted in YAML
2024-08-07 00:08:33   manifests/p/PolarGoose/BluetoothDevicePairing/11.0: WARNING: PackageVersion "11.0" is a floating point, should be quoted in YAML
2024-08-07 00:08:33   manifests/p/PolarGoose/ShowWhatProcessLocksFile/6.0: WARNING: PackageVersion "6.0" is a floating point, should be quoted in YAML
2024-08-07 00:08:45   manifests/i/IrfanSkiljan/IrfanView/PlugIns/4.67: WARNING: PackageVersion "4.67" is a floating point, should be quoted in YAML
2024-08-07 00:08:49   manifests/c/Cuyler/ACNESCreator/1.5: WARNING: PackageVersion "1.5" is a floating point, should be quoted in YAML
2024-08-07 00:08:51   manifests/c/calendulish/SteamToolsNG/3.2: WARNING: PackageVersion "3.2" is a floating point, should be quoted in YAML
2024-08-07 00:09:00   manifests/g/GeorgieLabs/SoundWireServer/2.5: WARNING: PackageVersion "2.5" is a floating point, should be quoted in YAML
2024-08-07 00:09:43   manifests/s/SURF/LetsConnectClient/4.0: WARNING: PackageVersion "4.0" is a floating point, should be quoted in YAML
2024-08-07 00:09:44   manifests/s/SURF/eduVPNClient/4.0: WARNING: PackageVersion "4.0" is a floating point, should be quoted in YAML
2024-08-07 00:09:47   manifests/s/squidowl/halloy/2024.9: WARNING: PackageVersion "2024.9" is a floating point, should be quoted in YAML
2024-08-07 00:09:47   manifests/s/squidowl/halloy/2024.10: WARNING: PackageVersion "2024.1" is a floating point, should be quoted in YAML
2024-08-07 00:09:58   manifests/t/Tyrrrz/DiscordChatExporter/GUI/2.43: WARNING: PackageVersion "2.43" is a floating point, should be quoted in YAML
2024-08-07 00:09:58   manifests/t/Tyrrrz/DiscordChatExporter/CLI/2.43: WARNING: PackageVersion "2.43" is a floating point, should be quoted in YAML
2024-08-07 00:10:11   manifests/t/TorProject/TorBrowser/13.5: WARNING: PackageVersion "13.5" is a floating point, should be quoted in YAML
2024-08-07 00:10:28   manifests/j/Japplis/Decoration/5.4: WARNING: PackageVersion "5.4" is a floating point, should be quoted in YAML
2024-08-07 00:10:28   manifests/j/Japplis/DirectoryCrop/1.2: WARNING: PackageVersion "1.2" is a floating point, should be quoted in YAML
2024-08-07 00:10:28   manifests/j/Japplis/Treecell/1.3: WARNING: PackageVersion "1.3" is a floating point, should be quoted in YAML
2024-08-07 00:10:28   manifests/j/Japplis/AppletRunnerPro/2.12: WARNING: PackageVersion "2.12" is a floating point, should be quoted in YAML
2024-08-07 00:10:28   manifests/j/Japplis/ScreenshotCrop/1.2: WARNING: PackageVersion "1.2" is a floating point, should be quoted in YAML
2024-08-07 00:10:28   manifests/j/Japplis/VRPhotoConverter/2.2: WARNING: PackageVersion "2.2" is a floating point, should be quoted in YAML
2024-08-07 00:10:28   manifests/j/Japplis/PosterFont/1.2: WARNING: PackageVersion "1.2" is a floating point, should be quoted in YAML
2024-08-07 00:10:28   manifests/j/Japplis/AntCommander/Personal/4.13: WARNING: PackageVersion "4.13" is a floating point, should be quoted in YAML
2024-08-07 00:10:28   manifests/j/Japplis/AntCommander/Pro/4.13: WARNING: PackageVersion "4.13" is a floating point, should be quoted in YAML
2024-08-07 00:10:28   manifests/j/Japplis/Watch/Pro/1.9: WARNING: PackageVersion "1.9" is a floating point, should be quoted in YAML
2024-08-07 00:10:28   manifests/j/Japplis/Watch/1.9: WARNING: PackageVersion "1.9" is a floating point, should be quoted in YAML
2024-08-07 00:10:28   manifests/j/Japplis/SheetViewer/1.2: WARNING: PackageVersion "1.2" is a floating point, should be quoted in YAML
2024-08-07 00:11:34   manifests/a/AdrianThurston/Ragel/6.10: WARNING: PackageVersion "6.1" is a floating point, should be quoted in YAML
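The effect seen in the halloy/2024.10 and Ragel/6.10 lines above can be reproduced without any YAML library, because the loss happens as soon as the text is read as a float. The sketch below also shows one possible lint check; the regular expression is a simplification that only catches the plain `digits.digits` form, and the function names are hypothetical.

```python
import re

# Matches an unquoted 'PackageVersion: <digits>.<digits>' line, optionally
# followed by a comment. Quoted values and versions like 1.10.2 do not match.
_FLOAT_VERSION = re.compile(r"^\s*PackageVersion:\s*(\d+\.\d+)\s*(?:#.*)?$")


def unquoted_float_version(line):
    """Return the version text if the line holds an unquoted float-like version."""
    match = _FLOAT_VERSION.match(line)
    return match.group(1) if match else None


def as_float_would_read(text):
    """Show what remains of a version string once it has been read as a float."""
    return str(float(text))


for line in ('PackageVersion: 6.10', 'PackageVersion: "6.10"', 'PackageVersion: 1.10.2'):
    version = unquoted_float_version(line)
    if version is None:
        print(f"{line!r}: ok")
    else:
        print(f"{line!r}: read as {as_float_would_read(version)}, should be quoted")
```

Quoting the value in the manifest keeps it a string, so the trailing zero survives.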

SpecterShell commented 1 month ago

> The parser currently fails here due to the lack of an expected DocumentUrl. Is it, or should it be, treated as valid?

This has been fixed.

dreua commented 1 week ago

I think this issue can/should be closed, thanks everyone!

SpecterShell commented 1 week ago

BTW, rather than parsing the YAML files, you can also work with the index generated from them: https://cdn.winget.microsoft.com/cache/source2.msix
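Since MSIX packages are ZIP-based containers, a first step for anyone exploring this route could be simply listing what the downloaded file contains. The sketch below makes no claim about the actual layout of source2.msix; the stand-in archive and its member name are invented so the example runs offline.

```python
import io
import zipfile


def list_members(msix_bytes):
    """Return the member names of an MSIX (ZIP-based) archive given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(msix_bytes)) as archive:
        return archive.namelist()


# Stand-in archive built in memory; the member name is hypothetical and only
# demonstrates the call. Real code would pass the downloaded file's bytes.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Public/example-index.db", b"placeholder")

print(list_members(buffer.getvalue()))
```

Inspecting the member list first avoids hard-coding a path inside the archive that may change between index versions.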