repology / repology-updater

Repology backend service to update repository and package data
https://repology.org
GNU General Public License v3.0

Drop fallback maintainers for repos without maintainers #824

Closed AMDmi3 closed 5 years ago

AMDmi3 commented 5 years ago

Fallback maintainers were introduced as a way to single out packages which have no maintainer in repos where maintainer information is otherwise available, but they have no use for repos in which no maintainer information is available at all. It makes sense to detect these repos and not generate fallbacks for them, to reduce data noise.

Explicitly listing supported fields for each repository could help with this, and would also provide a way to autogenerate the table we currently maintain by hand in README.md.

AMDmi3 commented 5 years ago

To save manual work, we can gather statistics on used fields in PackageMaker, save them in the db, and use them on the next run to decide whether we need a default maintainer. The lag of one full update seems to be the only downside.
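
The idea above could be sketched roughly as follows. This is only an illustration, not the actual PackageMaker code: `FieldStatistics`, `needs_default_maintainer`, and the dict-shaped packages are hypothetical stand-ins, and persisting the result in the db between runs is left out.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FieldStatistics:
    """Counts how often each package field is filled, per repository (hypothetical helper)."""
    counts: dict[str, Counter] = field(default_factory=dict)

    def record(self, repo: str, package: dict) -> None:
        counter = self.counts.setdefault(repo, Counter())
        for name, value in package.items():
            if value:  # skip None, empty strings and empty lists
                counter[name] += 1

    def used_fields(self, repo: str) -> set[str]:
        # Fields that were filled for at least one package of the repo
        return set(self.counts.get(repo, ()))


def needs_default_maintainer(previous_run_fields: set[str]) -> bool:
    # A fallback is only meaningful if the repo supplied real maintainers
    # for at least some packages during the previous update
    return 'maintainers' in previous_run_fields
```

Since the decision uses statistics from the previous run, a repository that starts or stops providing maintainers is only handled correctly one full update later, which is the lag mentioned above.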

AMDmi3 commented 5 years ago

The suitable solution is as follows:

AMDmi3 commented 5 years ago

This query is useful to list repositories where the only maintainer present is the autogenerated one.

SELECT 
    repo
FROM (
    SELECT 
        repo,
        unnest(maintainers) AS maintainer 
    FROM packages
) AS tmp 
GROUP BY repo
HAVING
    count(DISTINCT maintainer) = 1 AND
    count(DISTINCT maintainer) FILTER (WHERE maintainer LIKE 'fallback-mnt-%') = 1
ORDER BY repo
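
For readers who prefer to follow the logic outside of SQL, the same check can be sketched in Python. This is a minimal illustration that assumes packages arrive as `(repo, maintainers)` pairs; the function name and the `@repology` suffix in the sample data are assumptions, only the `fallback-mnt-` prefix comes from the query.

```python
FALLBACK_PREFIX = 'fallback-mnt-'


def repos_with_only_fallback(packages):
    """Return sorted names of repos whose single distinct maintainer is a fallback.

    packages: iterable of (repo, maintainers) pairs, maintainers being a list of strings.
    """
    seen = {}
    for repo, maintainers in packages:
        # Equivalent of unnest() + count(DISTINCT ...): collect distinct maintainers per repo
        seen.setdefault(repo, set()).update(maintainers)
    return sorted(
        repo for repo, names in seen.items()
        if len(names) == 1 and all(n.startswith(FALLBACK_PREFIX) for n in names)
    )


sample = [
    ('repo_a', ['fallback-mnt-repo_a@repology']),
    ('repo_a', ['fallback-mnt-repo_a@repology']),
    ('repo_b', ['fallback-mnt-repo_b@repology', 'dev@example.org']),
]
print(repos_with_only_fallback(sample))
```

As in the SQL version, a repository whose packages all have empty maintainer lists is not reported, because it has no distinct maintainers at all.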

AMDmi3 commented 5 years ago

Bits about statistics of used package fields moved to #840.

What's left here is to add default_maintainer: no to more repositories.

AMDmi3 commented 5 years ago

Results of the query above, with the total length of their maintainer strings.

           repo            |  size   
---------------------------+---------
 antergos_staging          |     456
 fdroid                    |     588
 parabola_testing          |    1444
 arch_testing              |    2550
 rpmfusion_el_6            |    2592
 rpmfusion_el_7            |    4824
 reactos                   |    5771
 unitedrpms_29             |    5775
 unitedrpms_30             |    5880
 unitedrpms_28             |    6055
 antergos_main             |    7035
 distrowatch               |    7458
 libregamewiki             |   10675
 buckaroo                  |   11010
 homebrew_tap_brewsci_bio  |   11132
 rpmfusion_fedora_29       |   15457
 rpmfusion_fedora_rawhide  |   17020
 rpmfusion_fedora_26       |   19762
 rpmfusion_fedora_28       |   20131
 rpmfusion_fedora_27       |   20623
 salix_14_2                |   23168
 vcpkg                     |   25488
 scoop                     |   30294
 opensuse_games_tumbleweed |   33934
 crux_35                   |   39440
 crux_34                   |   39585
 crux_33                   |   39846
 crux_32                   |   41354
 freshcode                 |   48081
 yacp                      |   56862
 hpux_11_31                |   60448
 centos_6                  |   67350
 stackage_lts              |   79526
 openpkg_current           |   80549
 stackage_nighly           |   86247
 openindiana               |   86592
 blackarch                 |   95325
 haikuports_master         |  102531
 gobolinux                 |  111817
 homebrew                  |  141270
 linuxbrew                 |  148428
 epel_7                    |  193256
 chocolatey                |  203296
 aix_osp                   |  220371
 gnuguix                   |  270164
 hyperbola                 |  272614
 arch                      |  277940
 pld                       |  298150
 parabola                  |  336420
 cran                      |  369174
 manjaro_stable            |  408744
 manjaro_testing           |  420024
 manjaro_unstable          |  430958
 opensuse_leap_15_1        |  441080
 mageia_6                  |  472200
 opensuse_tumbleweed       |  478142
 opensuse_leap_42_3        |  496480
 opensuse_leap_15_0        |  526520
 mageia_cauldron           |  546342
 rosa_2016_1               |  722832
 crates_io                 |  740404
 wikidata                  |  753360
 rosa_2014_1               |  921030
 rubygems                  | 4455330
(64 rows)

AMDmi3 commented 5 years ago

I've decided to instead explicitly mark repositories which DO need default maintainers. Actually, these are a minority. Most repositories which currently have both fallback and real maintainers do so because not all maintainers can be parsed, and there's no reason to group the latter.

A case which needs special attention is repositories which may have both absent maintainers and obfuscated maintainers which cannot be parsed. An example is hackage. Since we only use HackageParser for it, we can set the default maintainer in the parser, though this is quite ugly.
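
The opt-in behaviour described in this comment might look roughly like the sketch below. It assumes each repository's configuration carries an optional default maintainer value (absent for most repos); `effective_maintainers` and its parameters are hypothetical names, not the updater's actual API.

```python
from typing import Optional


def effective_maintainers(maintainers: list[str], default_maintainer: Optional[str]) -> list[str]:
    """Apply a default maintainer only for repositories explicitly marked as needing one."""
    if maintainers:
        # Real maintainers were parsed, nothing to substitute
        return maintainers
    if default_maintainer is None:
        # Repo is not marked: leave the package without maintainers
        return []
    return [default_maintainer]
```

With this shape, repositories where maintainers are merely unparseable for some packages stay unmarked and produce no fallback entries, while a repository with genuinely orphaned packages opts in by setting a value.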

AMDmi3 commented 5 years ago

@blshkv, does Pentoo have any default value for absent maintainer? Similar to Gentoo's maintainer-needed@gentoo.org.

AMDmi3 commented 5 years ago

@palica hey, does Funtoo have any default value for absent maintainer? Similar to Gentoo's maintainer-needed@gentoo.org.

AMDmi3 commented 5 years ago

@Cogitri hey, you've done some Exherbo related contributions, so you may know - does Exherbo have any default value for absent maintainer? Similar to Gentoo's maintainer-needed@gentoo.org.

Cogitri commented 5 years ago

Hum, we do orphan packages, but I'm actually not sure if we have a fallback email. I'll ask in our dev channel, thanks for pinging! :)

Cogitri commented 5 years ago

Alright, Exherbo doesn't have a fallback maintainer

AMDmi3 commented 5 years ago

> Alright, Exherbo doesn't have a fallback maintainer

Got it, thanks! Will use default one then.

blshkv commented 5 years ago

We don't have it either. Moreover, this field is often inaccurate.

AMDmi3 commented 5 years ago

For completeness, another query to get a sample of packages with real and fallback maintainers for each repo which has any fallback maintainers.

WITH repos_with_fallback AS (
        SELECT DISTINCT repo FROM (
                SELECT repo, unnest(maintainers) AS maintainer
                FROM packages
        ) AS tmp
        WHERE maintainer LIKE 'fallback-mnt-%'
)
SELECT repo, effname, maintainer FROM
(
        SELECT repo, effname, maintainer, row_number() OVER (PARTITION BY repo, maintainer LIKE 'fallback-mnt-%') AS rn
        FROM (
                SELECT repo, effname, unnest(maintainers) AS maintainer
                FROM packages
                WHERE repo IN (SELECT repo FROM repos_with_fallback)
        ) AS tmp
) AS tmp1
WHERE rn < 10
ORDER BY repo, maintainer LIKE 'fallback-mnt-%', effname, maintainer;

palica commented 5 years ago

Funtoo: we don't have a 'maintainer' concept.