AMDmi3 closed this issue 5 years ago.
To save manual work, we can gather statistics on which fields are used in PackageMaker, save them in the database, and use them on the next run to decide whether a default maintainer is needed. The lag of one full update seems to be the only downside.
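Here is a minimal sketch of the statistics idea. It assumes PackageMaker can report which fields each package ended up with before any fallback maintainer is added. `FieldStats` and `needs_default_maintainer` are hypothetical names, and saving the counts to the database is omitted. The one-update lag shows up because the decision reads the previous run's counts.

```python
from collections import Counter


class FieldStats:
    """Hypothetical per-repository counter of filled package fields."""

    def __init__(self):
        self.counts = Counter()

    def record(self, fields):
        # `fields`: names of the fields one package had set in this run,
        # recorded before any fallback maintainer is added
        self.counts.update(set(fields))

    def to_dict(self):
        # Plain dict, ready to be stored in the database (storage omitted)
        return dict(self.counts)


def needs_default_maintainer(previous_stats):
    """Decide, from the *previous* run's stats, whether to add fallbacks."""
    if previous_stats is None:
        # No stats yet (first run): keep generating fallback maintainers
        return True
    # Only repos which produced at least one real maintainer get fallbacks
    return previous_stats.get("maintainers", 0) > 0
```

For example, after a run where no package had a `maintainers` field, the saved counts make `needs_default_maintainer` return `False` for the following run.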
A suitable solution is `default_maintainer: no` in the repository config.
This query is useful for listing repositories where the only maintainer present is the autogenerated one:
```sql
SELECT
    repo
FROM (
    SELECT
        repo,
        unnest(maintainers) AS maintainer
    FROM packages
) AS tmp
GROUP BY repo
HAVING
    count(DISTINCT maintainer) = 1 AND
    count(DISTINCT maintainer) FILTER (WHERE maintainer LIKE 'fallback-mnt-%') = 1
ORDER BY repo;
```
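The same check can be restated in plain Python, which may help when reasoning about the `HAVING` clause. This sketch assumes packages are available as `(repo, maintainers)` pairs, and `repos_with_only_fallback` is a hypothetical helper name. As with `unnest()` in the query, a repository whose packages all have empty maintainer lists yields no maintainers and is not reported.

```python
from collections import defaultdict

FALLBACK_PREFIX = "fallback-mnt-"


def repos_with_only_fallback(packages):
    """packages: iterable of (repo, maintainers) pairs, mirroring the
    repo and maintainers columns of the packages table."""
    per_repo = defaultdict(set)
    for repo, maintainers in packages:
        per_repo[repo].update(maintainers)  # like unnest() + GROUP BY repo
    return sorted(
        repo
        for repo, mnts in per_repo.items()
        # count(DISTINCT maintainer) = 1 and that one is a fallback
        if len(mnts) == 1 and next(iter(mnts)).startswith(FALLBACK_PREFIX)
    )
```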
The part about statistics of used package fields has been moved to #840.
What's left here is to add `default_maintainer: no` to more repositories.
Results of the query above, along with the total length of each repository's maintainer strings (`size`):
```
           repo            |  size
---------------------------+---------
 antergos_staging          |     456
 fdroid                    |     588
 parabola_testing          |    1444
 arch_testing              |    2550
 rpmfusion_el_6            |    2592
 rpmfusion_el_7            |    4824
 reactos                   |    5771
 unitedrpms_29             |    5775
 unitedrpms_30             |    5880
 unitedrpms_28             |    6055
 antergos_main             |    7035
 distrowatch               |    7458
 libregamewiki             |   10675
 buckaroo                  |   11010
 homebrew_tap_brewsci_bio  |   11132
 rpmfusion_fedora_29       |   15457
 rpmfusion_fedora_rawhide  |   17020
 rpmfusion_fedora_26       |   19762
 rpmfusion_fedora_28       |   20131
 rpmfusion_fedora_27       |   20623
 salix_14_2                |   23168
 vcpkg                     |   25488
 scoop                     |   30294
 opensuse_games_tumbleweed |   33934
 crux_35                   |   39440
 crux_34                   |   39585
 crux_33                   |   39846
 crux_32                   |   41354
 freshcode                 |   48081
 yacp                      |   56862
 hpux_11_31                |   60448
 centos_6                  |   67350
 stackage_lts              |   79526
 openpkg_current           |   80549
 stackage_nighly           |   86247
 openindiana               |   86592
 blackarch                 |   95325
 haikuports_master         |  102531
 gobolinux                 |  111817
 homebrew                  |  141270
 linuxbrew                 |  148428
 epel_7                    |  193256
 chocolatey                |  203296
 aix_osp                   |  220371
 gnuguix                   |  270164
 hyperbola                 |  272614
 arch                      |  277940
 pld                       |  298150
 parabola                  |  336420
 cran                      |  369174
 manjaro_stable            |  408744
 manjaro_testing           |  420024
 manjaro_unstable          |  430958
 opensuse_leap_15_1        |  441080
 mageia_6                  |  472200
 opensuse_tumbleweed       |  478142
 opensuse_leap_42_3        |  496480
 opensuse_leap_15_0        |  526520
 mageia_cauldron           |  546342
 rosa_2016_1               |  722832
 crates_io                 |  740404
 wikidata                  |  753360
 rosa_2014_1               |  921030
 rubygems                  | 4455330
(64 rows)
```
I've decided instead to explicitly mark the repositories which DO need default maintainers, since these are actually a minority. Most repositories which currently have both fallback and real maintainers have them because not all maintainers can be parsed, and there's no reason to group packages with unparsable maintainers under a fallback one.
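Here is a minimal sketch of the opt-in behaviour. It assumes the repository config is available as a dict and reuses the `default_maintainer` key as the opt-in flag, which is an assumption. `assign_maintainers` and the `fallback-mnt-<repo>@repology` format are hypothetical; only the `fallback-mnt-` prefix is taken from the queries above.

```python
def assign_maintainers(parsed_maintainers, repo_config, repo_name):
    """Return the maintainers to store for one package (hypothetical helper).

    repo_config is the repository's config entry as a dict; the
    'default_maintainer' key is treated here as an opt-in flag (assumption).
    """
    if parsed_maintainers:
        # Real maintainers were parsed: never mix in a fallback
        return list(parsed_maintainers)
    if repo_config.get("default_maintainer"):
        # Only explicitly marked repos get a fallback for maintainerless packages
        return [f"fallback-mnt-{repo_name}@repology"]
    # Unmarked repos: leave the package without maintainers
    return []
```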
A case which needs special attention is repositories which may have both absent maintainers and obfuscated maintainers which cannot be parsed. Hackage is an example. Since HackageParser is used only for it, we can set the default maintainer in the parser, though this is quite ugly.
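Here is a sketch of setting the default maintainer inside the parser. It assumes the parser sees the raw `maintainer` field of a package's .cabal description as a string. The fallback value `fallback-mnt-hackage@repology` and the email regex are assumptions; only the `fallback-mnt-` prefix comes from the queries in this thread. The point is to tell the two cases apart: an absent field gets the fallback, but a present, unparsable one does not.

```python
import re

# Hypothetical fallback value; only the "fallback-mnt-" prefix is
# taken from the queries in this thread.
HACKAGE_FALLBACK = "fallback-mnt-hackage@repology"
# Rough address matcher (assumption): no spaces, brackets, commas or parens
EMAIL_RE = re.compile(r"[^\s<>@,()]+@[^\s<>@,()]+\.[^\s<>@,()]+")


def hackage_maintainers(cabal_fields):
    """cabal_fields: dict of fields parsed from one package description."""
    raw = cabal_fields.get("maintainer")
    if raw is None or not raw.strip():
        # Field absent or empty: the package really has no maintainer
        return [HACKAGE_FALLBACK]
    # Field present: keep only recognisable addresses. Obfuscated ones such
    # as "joe at example dot org" yield nothing and are NOT replaced with
    # the fallback, since the package does have a maintainer.
    return EMAIL_RE.findall(raw)
```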
@blshkv, does Pentoo have any default value for an absent maintainer, similar to Gentoo's `maintainer-needed@gentoo.org`?
@palica hey, does Funtoo have any default value for an absent maintainer, similar to Gentoo's `maintainer-needed@gentoo.org`?
@Cogitri hey, you've made some Exherbo-related contributions, so you may know: does Exherbo have any default value for an absent maintainer, similar to Gentoo's `maintainer-needed@gentoo.org`?
Hum, we do orphan packages, but I'm actually not sure if we have a fallback email. I'll ask in our dev channel, thanks for pinging! :)
Alright, Exherbo doesn't have a fallback maintainer
Got it, thanks! Will use default one then.
We don't have one either. Moreover, this field is often inaccurate.
For completeness, here is another query that gets a sample of packages with real and fallback maintainers for each repo which has any fallback maintainers:
```sql
WITH repos_with_fallback AS (
    SELECT DISTINCT repo FROM (
        SELECT repo, unnest(maintainers) AS maintainer
        FROM packages
    ) AS tmp
    WHERE maintainer LIKE 'fallback-mnt-%'
)
SELECT repo, effname, maintainer FROM (
    SELECT
        repo,
        effname,
        maintainer,
        row_number() OVER (PARTITION BY repo, maintainer LIKE 'fallback-mnt-%') AS rn
    FROM (
        SELECT repo, effname, unnest(maintainers) AS maintainer
        FROM packages
        WHERE repo IN (SELECT repo FROM repos_with_fallback)
    ) AS tmp
) AS tmp1
WHERE rn < 10
ORDER BY repo, maintainer LIKE 'fallback-mnt-%', effname, maintainer;
```
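The sampling logic of this query can be sketched in Python as well. This assumes rows shaped like `(repo, effname, maintainers)`, and `sample_maintainers` is a hypothetical name. One difference: `row_number()` without an `ORDER BY` picks unspecified rows within each partition, while this sketch keeps the first ones it sees. Note that `rn < 10` keeps at most 9 rows per partition.

```python
from collections import defaultdict

FALLBACK_PREFIX = "fallback-mnt-"


def is_fallback(maintainer):
    return maintainer.startswith(FALLBACK_PREFIX)


def sample_maintainers(rows, per_group=9):
    """rows: (repo, effname, maintainers) tuples, like the packages table.

    Keeps only repos with at least one fallback maintainer, and at most
    `per_group` (repo, effname, maintainer) rows per (repo, is-fallback)
    partition, with real maintainers listed before fallback ones.
    """
    rows = list(rows)
    # Equivalent of the repos_with_fallback CTE
    repos_with_fallback = {
        repo for repo, _, mnts in rows if any(is_fallback(m) for m in mnts)
    }
    taken = defaultdict(int)
    sample = []
    for repo, effname, maintainers in rows:
        if repo not in repos_with_fallback:
            continue
        for mnt in maintainers:  # unnest(maintainers)
            key = (repo, is_fallback(mnt))  # PARTITION BY repo, LIKE ...
            if taken[key] < per_group:
                taken[key] += 1
                sample.append((repo, effname, mnt))
    # ORDER BY repo, maintainer LIKE 'fallback-mnt-%', effname, maintainer
    sample.sort(key=lambda r: (r[0], is_fallback(r[2]), r[1], r[2]))
    return sample
```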
Funtoo: we don't have a 'maintainer' concept.
Fallback maintainers were introduced as a way to separate out packages which have no maintainer in repos where maintainer information is otherwise available, but they are of no use for repos which provide no maintainer information at all. It makes sense to detect such repos and not generate fallbacks for them, to reduce data noise.
Explicitly listing the supported fields for each repository could help with this, and would also provide a way to autogenerate the table we currently maintain in README.md by hand.
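Here is a sketch of that idea. It assumes each repository declares its supported fields in a simple mapping. `SUPPORTED_FIELDS`, `wants_fallback`, `readme_table` and the placeholder repositories `repo_a`/`repo_b` are hypothetical, and the real README table's columns are not reproduced. The same declaration answers two questions: whether fallback maintainers make sense for a repository, and what goes into the table.

```python
# Hypothetical per-repository declaration of the fields each parser fills;
# the repository names and field sets are placeholders, not real data.
SUPPORTED_FIELDS = {
    "repo_a": {"version", "maintainers", "homepage"},
    "repo_b": {"version", "homepage"},
}


def wants_fallback(repo):
    # Fallback maintainers only make sense where maintainers exist at all
    return "maintainers" in SUPPORTED_FIELDS.get(repo, set())


def readme_table(supported, fields):
    """Render a Markdown table of supported fields, one row per repository."""
    lines = [
        "| Repository | " + " | ".join(fields) + " |",
        "|" + "---|" * (len(fields) + 1),
    ]
    for repo in sorted(supported):
        marks = ["yes" if f in supported[repo] else "" for f in fields]
        lines.append("| " + repo + " | " + " | ".join(marks) + " |")
    return "\n".join(lines)
```

For instance, `readme_table(SUPPORTED_FIELDS, ["version", "maintainers"])` yields a header row, a separator row, then `| repo_a | yes | yes |` and `| repo_b | yes |  |`.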