zopsicle / crai

Cryogenic Raku Archive Index
https://crai.foldr.nl

Log of disappeared modules (github) #17

Status: open. AlexDaniel opened this issue 4 years ago.

AlexDaniel commented 4 years ago

Sometimes people decide to remove their GitHub repos or delete their modules. That is fine, but it becomes a pain for everyone when something depends on the deleted code. In the past I restored some modules by re-creating them in https://github.com/raku-community-modules from the git repos that zef stores locally, but that only worked because I happened to have installed those modules before. Now that crai provides tarballs this is less of an issue, but it would be great to know when a module is deleted so that we can react more quickly. This should also make release management a little less painful.

I think it'd be nice to have a simple log with timestamps and names of modules that no longer have accessible git repos.

zopsicle commented 4 years ago

I am drafting the new data model, and this is the part that should be relevant to this issue:

https://github.com/chloekek/crai/blob/596b83f6cf74f168726d71f7faa0542c93768197/crai/lib/Crai/Database.rakumod#L122-L139

Every time the cron job runs (every hour), it will record which archives it found on CPAN and GitHub.

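Then we can use SQL to compute the difference between any two runs with a query like the following. It returns the distribution names found in run ?1 but not in run ?2, so passing an older run as ?1 and a newer run as ?2 lists what disappeared. If you want, we could even have it email you, post to IRC, or open a GitHub issue.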

```sql
-- Distribution names encountered in run ?1 ...
SELECT
    archives.meta_name
FROM
    encounters
    INNER JOIN archives
        ON archives.url = encounters.archive_url
WHERE
    encounters.run_when = ?1

EXCEPT

-- ... minus the names encountered in run ?2.
SELECT
    archives.meta_name
FROM
    encounters
    INNER JOIN archives
        ON archives.url = encounters.archive_url
WHERE
    encounters.run_when = ?2
```
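To try the query without the real database, here is a minimal Python sketch using the standard `sqlite3` module. It assumes a SQLite database and a hypothetical, stripped-down schema containing only the four columns the query references (`archives.url`, `archives.meta_name`, `encounters.run_when`, `encounters.archive_url`); the actual tables in `Database.rakumod` may look different, and the run timestamps and archive URLs below are made up.

```python
import sqlite3

# Hypothetical minimal schema (assumption): only the columns the query uses.
# The real crai schema may differ.
SCHEMA = """
CREATE TABLE archives (
    url       TEXT PRIMARY KEY,
    meta_name TEXT NOT NULL
);
CREATE TABLE encounters (
    run_when    TEXT NOT NULL,
    archive_url TEXT NOT NULL REFERENCES archives (url)
);
"""

# Names seen in run ?1 but not in run ?2 (the query from the comment above).
DISAPPEARED = """
SELECT archives.meta_name
FROM encounters
INNER JOIN archives ON archives.url = encounters.archive_url
WHERE encounters.run_when = ?1
EXCEPT
SELECT archives.meta_name
FROM encounters
INNER JOIN archives ON archives.url = encounters.archive_url
WHERE encounters.run_when = ?2
"""


def disappeared(conn, older_run, newer_run):
    """Return sorted names present in older_run but missing from newer_run."""
    rows = conn.execute(DISAPPEARED, (older_run, newer_run)).fetchall()
    return sorted(name for (name,) in rows)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO archives (url, meta_name) VALUES (?, ?)",
        [
            ("https://example.org/Foo-1.0.tar.gz", "Foo"),
            ("https://example.org/Bar-0.1.tar.gz", "Bar"),
        ],
    )
    conn.executemany(
        "INSERT INTO encounters (run_when, archive_url) VALUES (?, ?)",
        [
            ("2020-06-01T10:00", "https://example.org/Foo-1.0.tar.gz"),
            ("2020-06-01T10:00", "https://example.org/Bar-0.1.tar.gz"),
            # The next run no longer finds Bar.
            ("2020-06-01T11:00", "https://example.org/Foo-1.0.tar.gz"),
        ],
    )
    return disappeared(conn, "2020-06-01T10:00", "2020-06-01T11:00")


print(demo())  # ['Bar']
```

Swapping the two run arguments gives the opposite diff, i.e. distributions that newly appeared. Because `EXCEPT` compares names rather than URLs, a module that merely moved to a different archive URL is not reported as disappeared.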

zopsicle commented 4 years ago

You can now see how many archives it found on each run: https://crai.foldr.nl/runs.

This could easily be extended to list which distributions each run found and to compare them against other runs.
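A per-run count plus a comparison between consecutive runs could look roughly like the sketch below. It makes the same assumptions as before (SQLite, a hypothetical minimal schema, made-up data) and additionally assumes `run_when` is stored as ISO 8601 text, so that `ORDER BY run_when` is chronological. Unlike the `EXCEPT` query above, it diffs the name sets in Python, which makes it easy to report additions and removals together.

```python
import sqlite3

# Hypothetical minimal schema (assumption), not crai's actual tables.
SCHEMA = """
CREATE TABLE archives (url TEXT PRIMARY KEY, meta_name TEXT NOT NULL);
CREATE TABLE encounters (run_when TEXT NOT NULL, archive_url TEXT NOT NULL);
"""


def run_counts(conn):
    """Number of archives encountered per run, oldest run first."""
    return conn.execute(
        "SELECT run_when, COUNT(*) FROM encounters"
        " GROUP BY run_when ORDER BY run_when"
    ).fetchall()


def names_in_run(conn, run_when):
    """Set of distribution names encountered in a single run."""
    rows = conn.execute(
        "SELECT DISTINCT archives.meta_name FROM encounters"
        " INNER JOIN archives ON archives.url = encounters.archive_url"
        " WHERE encounters.run_when = ?",
        (run_when,),
    )
    return {name for (name,) in rows}


def compare_consecutive(conn):
    """For each pair of adjacent runs, yield (older, newer, added, removed)."""
    runs = [run for run, _ in run_counts(conn)]
    for older, newer in zip(runs, runs[1:]):
        before, after = names_in_run(conn, older), names_in_run(conn, newer)
        yield older, newer, sorted(after - before), sorted(before - after)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO archives VALUES (?, ?)",
        [("u/Foo", "Foo"), ("u/Bar", "Bar"), ("u/Baz", "Baz")],
    )
    conn.executemany(
        "INSERT INTO encounters VALUES (?, ?)",
        [
            ("2020-06-01T10:00", "u/Foo"), ("2020-06-01T10:00", "u/Bar"),
            ("2020-06-01T11:00", "u/Foo"), ("2020-06-01T11:00", "u/Baz"),
        ],
    )
    return run_counts(conn), list(compare_consecutive(conn))


counts, diffs = demo()
print(counts)  # [('2020-06-01T10:00', 2), ('2020-06-01T11:00', 2)]
print(diffs)   # [('2020-06-01T10:00', '2020-06-01T11:00', ['Baz'], ['Bar'])]
```

Note that the two runs have the same count even though one distribution disappeared and another appeared, which is why comparing names and not just counts matters for this issue.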

zopsicle commented 4 years ago

We will need to ignore runs whose number of encountered archives is an outlier. A sudden drop is far more likely to mean that CPAN or GitHub was down or that the cron job crashed than that hundreds of packages were deleted at once.

I don’t know much about statistics, so I will have to learn that first, which is fun!
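One simple way to ignore such runs, sketched here in Python as an assumption rather than what crai actually does, is a rolling-median filter: compare each run's archive count with the median of the preceding runs and flag it if it deviates by more than a fixed fraction. The `window` (24 runs, about a day at one run per hour) and `tolerance` (5%) values are hypothetical and would need tuning against the real numbers on https://crai.foldr.nl/runs.

```python
from statistics import median


def flag_outlier_runs(counts, window=24, tolerance=0.05):
    """Return the indices of runs whose archive count deviates from the
    median of the preceding `window` runs by more than `tolerance`
    (as a fraction of that median). The first run has no history and is
    never flagged."""
    outliers = []
    for i, count in enumerate(counts):
        previous = counts[max(0, i - window):i]
        if not previous:
            continue  # nothing to compare the first run against
        baseline = median(previous)
        if baseline > 0 and abs(count - baseline) / baseline > tolerance:
            outliers.append(i)
    return outliers


# Hypothetical archive counts per run; run 3 looks like an outage.
counts = [1500, 1502, 1501, 300, 1503, 1504]
print(flag_outlier_runs(counts))  # [3]
```

A median rather than a mean is used because a few bad runs inside the window barely shift it, while a genuine, lasting change in the number of archives becomes the new baseline once it fills more than half the window. Flagged runs would then simply be skipped when picking the two runs to diff.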