repology / repology-updater

Repology backend service to update repository and package data
https://repology.org
GNU General Public License v3.0

Nixos: Ignore packages explicitly marked as broken #1340

Open nomeata opened 1 year ago

nomeata commented 1 year ago

nixpkgs has huge coverage of the available software, and that’s partly because of auto-generated package files, e.g. for Haskell packages. But often these packages cannot actually be built, so from a user’s point of view it’s almost as if they were not packaged. It’s a bit like a Debian package where there is a source tarball in the archive, but no binary packages could be built (and thus no binary packages would be reported to repology).

Luckily, they are usually marked as “broken”, and this information is available in the packages.json that is fetched by the repo updater, see the fourth line of this snippet:

    "adtool": {
      "meta": {
        "available": false,
        "broken": true,
        "description": "Active Directory administration utility for Unix",
        "homepage": "https://gp2x.org/adtool",
        "insecure": false,
        "license": {
          "deprecated": true,
          "free": true,
          "fullName": "GNU General Public License v2.0",
          "redistributable": true,
          "shortName": "gpl2",
          "spdxId": "GPL-2.0",
          "url": "https://spdx.org/licenses/GPL-2.0.html"
        },
        "maintainers": [
          {
            "email": "peter@hoeg.com",
            "github": "peterhoeg",
            "githubId": 722550,
            "matrix": "@peter:hoeg.com",
            "name": "Peter Hoeg"
          }
        ],
        "name": "adtool-1.3.3",
        "outputsToInstall": [
          "out"
        ],
        "position": "pkgs/tools/admin/adtool/default.nix:40",
        "unfree": false,
        "unsupported": false
      },
      "name": "adtool-1.3.3",
      "outputName": "out",
      "outputs": {
        "out": null
      },
      "pname": "adtool",
      "system": "x86_64-linux",
      "version": "1.3.3"
    },

I suggest ignoring any package marked as broken, probably somewhere around here:

https://github.com/repology/repology-updater/blob/72800a42ce9bdd912bdd7b0f91c953761db8acba/repology/parsers/parsers/nix.py#L119-L137
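A minimal sketch of the suggested filtering, not the actual parser code: it assumes the entries have already been loaded into a mapping from attribute name to an entry shaped like the snippet above. The helper name `iter_not_broken` and the second sample entry (`hello`) are hypothetical and exist only for illustration.

```python
from typing import Any, Iterator


def iter_not_broken(
    packages: dict[str, dict[str, Any]],
) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (attribute name, entry) pairs, skipping entries marked broken.

    Assumption: each entry may carry a "meta" dict with a boolean "broken"
    field, as in the adtool snippet. A missing field counts as not broken.
    """
    for key, entry in packages.items():
        meta = entry.get("meta", {})
        if meta.get("broken", False):
            continue  # explicitly marked as broken: do not report it
        yield key, entry


# Trimmed sample data: adtool mirrors the snippet above, hello is made up.
packages = {
    "adtool": {
        "meta": {"available": False, "broken": True},
        "pname": "adtool",
        "version": "1.3.3",
    },
    "hello": {
        "meta": {"available": True, "broken": False},
        "pname": "hello",
        "version": "2.12",
    },
}

kept = [key for key, _ in iter_not_broken(packages)]
print(kept)
```

Treating a missing `broken` field as "not broken" keeps the filter conservative: only packages that are explicitly marked get dropped.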

nomeata commented 1 year ago

This might also alleviate the concerns in https://github.com/repology/repology-updater/issues/638 a bit.

AMDmi3 commented 1 year ago

I don't think we need this, or rather it should be handled differently and on a global level. Broken packages don't really change a thing for repology - unless the version is incorrect, these are still valid sources of even the latest versions, and for older versions ignores don't do anything anyway. Regarding the 'the repository does not in fact provide this package for users' case, we don't handle this properly anyway - repology mostly tracks source packages and does not even consider binary packages if source package info is available, so except for the few cases where we track binary repositories, we can't even tell that the package is not available for e.g. non-x86, or is available at all.

nomeata commented 1 year ago

At least in the case of Haskell packages, the “sources” are auto-generated from Hackage, so there are 10000 packages reported as present that nobody ever built, let alone manually touched. For a user of repology, seeing them reported as part of nixpkgs is confusing and misleading in a way.

The same problem exists on hackage (which has its own distro-coverage-report), and we are fixing it there as well: https://github.com/NixOS/nixpkgs/pull/243601

nomeata commented 1 year ago

> rather it should be handled differently and on a global level.

That sounds good as well, of course. What do you have in mind?

AMDmi3 commented 1 year ago

Not much in particular. We could add a dedicated flag for known broken packages. The webapp would mark packages differently based on it. Regarding arches, we could introduce a dedicated type of matrix badge and/or a project page for the case. But filling it properly would require adding binary packages for repos which support it, and that would blow the database up significantly. For that we'll need a massive hosting upgrade and an update process rework in order not to drop the update rate.
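The flag idea could be sketched roughly as follows. This is an illustration under stated assumptions, not repology's actual data model: `SketchFlags`, `SketchPackage` and `to_package` are hypothetical names, and the input entry is assumed to be shaped like the nixpkgs snippet earlier in the thread. The package is kept (so its version still counts) and only tagged, leaving it to a consumer such as the webapp to render it differently.

```python
from dataclasses import dataclass
from enum import IntFlag
from typing import Any


class SketchFlags(IntFlag):
    """Hypothetical flag set; not repology's real flags."""
    NONE = 0
    KNOWN_BROKEN = 1


@dataclass
class SketchPackage:
    """Hypothetical, heavily reduced package record."""
    name: str
    version: str
    flags: SketchFlags = SketchFlags.NONE


def to_package(entry: dict[str, Any]) -> SketchPackage:
    """Convert one entry, tagging it instead of dropping it when broken."""
    flags = SketchFlags.NONE
    if entry.get("meta", {}).get("broken", False):
        flags |= SketchFlags.KNOWN_BROKEN
    return SketchPackage(entry["pname"], entry["version"], flags)


adtool = to_package(
    {"meta": {"broken": True}, "pname": "adtool", "version": "1.3.3"}
)

# A consumer can branch on the flag without losing the version information.
if adtool.flags & SketchFlags.KNOWN_BROKEN:
    print(f"{adtool.name} {adtool.version} (known broken)")
```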