Possibility to auto-disable outdated mirrors


Issue migrated (2015-06-05) from old issue tracker http://mirrorbrain.org/issues/issue150

Title:       Possibility to auto-disable outdated mirrors
Priority:    feature
Status:      chatting
Superseder:
Nosy List:   poeml, rhertzog
Assigned To: poeml
Keywords:

msg543 (view) Author: rhertzog Date: 2014-02-17.15:14:50

MirrorBrain regularly checks that mirrors are online and working, but it doesn't detect mirrors that are stale. It would be really useful if we could teach MirrorBrain how to detect outdated mirrors so that it could disable them automatically.

The simplest answer would be a configuration parameter that points to a script which tests the mirror and tells MirrorBrain whether it is up-to-date (exit code 0), outdated (exit code 1), or whether an error occurred (any other exit code). The information about the mirror to check would be passed either via environment variables or via command-line parameters. That way we can implement any policy, but it requires scripting skills.
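A minimal sketch of how such a hook could be called, assuming the exit-code convention described above. The variable names MB_MIRROR_ID and MB_MIRROR_BASEURL and the function names are hypothetical; nothing like this exists in MirrorBrain yet.

```python
import os
import subprocess
import sys

# Exit-code convention proposed in this issue.
UP_TO_DATE, OUTDATED = 0, 1


def classify(returncode):
    """Map a check script's exit code to a status string."""
    if returncode == UP_TO_DATE:
        return "up-to-date"
    if returncode == OUTDATED:
        return "outdated"
    return "error"  # any other exit code, including death by signal


def run_check(command, identifier, baseurl, timeout=60):
    """Run the admin-supplied check command for one mirror.

    The mirror details are passed in environment variables whose names
    (MB_MIRROR_ID, MB_MIRROR_BASEURL) are assumptions made for this sketch.
    """
    env = dict(os.environ, MB_MIRROR_ID=identifier, MB_MIRROR_BASEURL=baseurl)
    try:
        proc = subprocess.run(command, env=env, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # Script missing, not executable, or hanging: treat as an error,
        # not as an outdated mirror.
        return "error"
    return classify(proc.returncode)
```

Treating a missing or hanging script as "error" rather than "outdated" keeps a broken check from disabling every mirror at once.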

Another approach could be to define a path whose file on the mirror must match the master copy (same size and same SHA1 checksum) for the mirror to be considered up-to-date. But since synchronization takes time, we must be able to define a grace period before deciding to disable the mirror.
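The size-plus-SHA1 comparison with a grace period could look roughly like this. It is only a sketch: the function names, the "syncing" state, and the use of the master file's mtime as the start of the grace period are assumptions, and fetching the mirror's copy is left out.

```python
import hashlib
import time


def fingerprint(data):
    """Return (size, SHA1 hex digest) for the content of the reference file."""
    return len(data), hashlib.sha1(data).hexdigest()


def mirror_state(master, mirror, master_mtime, grace_seconds, now=None):
    """Compare two fingerprints and apply the grace period.

    master, mirror: (size, sha1) tuples as returned by fingerprint().
    master_mtime:   when the master copy last changed (Unix time).
    """
    now = time.time() if now is None else now
    if mirror == master:
        return "up-to-date"
    if now - master_mtime <= grace_seconds:
        # The master changed recently; give the mirror time to sync.
        return "syncing"
    return "outdated"
```

Only the "outdated" result would justify disabling the mirror; "syncing" leaves it alone until the grace period has passed.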

Or better, we could implement the first setting and provide a sample script that implements the second solution, reading the required parameters from mirrorbrain.conf.

msg547 (view) Author: poeml Date: 2014-02-20.01:23:05

Very good idea. This would make MirrorBrain useful in more scenarios. The current mirror checking is so minimal that it's amazing we got so far with it. Historically, checking mirror freshness was neglected because it isn't needed for file trees where files never change in place but get new names instead (at least an incremented counter). Files that change while keeping the same name have therefore always been a problem. At openSUSE, requests for some of those files were never redirected to mirrors for that reason. Of course, it may be complicated or impossible for admins to get rid of those files.

Fedora solved the same issue by having their redirector reply with a Metalink that uses a Metalink protocol extension listing several variants of a file (any of which might be encountered on a mirror). The redirector effectively tells the client: if the mirror has this file it's okay, and if it has a different file, it's also okay.

Scanning the mirrors more deeply (checking mtime and file size and calculating hashes) isn't really realistic in many cases, I think, although it might be in some. A compromise could be mtime and file size, the same as rsync does (unless forced to look into files with -c). But only rsync scanning would achieve this reliably. HTTP scanning is more fragile, and FTP scanning isn't perfect either (character set issues, time format not standardized).
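For the HTTP case, the mtime-and-size compromise could be sketched as below, assuming the mirror answers a HEAD request with Content-Length and Last-Modified headers. The function names and the one-second tolerance are assumptions; a mirror that omits or garbles either header simply fails the check, which illustrates why HTTP scanning is the more fragile option.

```python
import os
from email.utils import parsedate_to_datetime


def quick_check(local_size, local_mtime, headers, tolerance=1.0):
    """Compare size and mtime only, without reading file contents.

    headers: a mapping with the mirror's HTTP response headers.
    """
    try:
        remote_size = int(headers["Content-Length"])
        remote_mtime = parsedate_to_datetime(headers["Last-Modified"]).timestamp()
    except (KeyError, TypeError, ValueError):
        # Header missing or unparsable: we cannot confirm the file matches.
        return False
    return remote_size == local_size and abs(remote_mtime - local_mtime) <= tolerance


def quick_check_file(path, headers):
    """Same check, taking size and mtime from a file in the local tree."""
    st = os.stat(path)
    return quick_check(st.st_size, st.st_mtime, headers)
```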

This is just background. The idea of checking the sync status of the mirrors would be a big step forward.

I agree with making the check adaptable, and creating a useful default check. There's a script to create a small timestamp file, which could be used to detect the "sync age". Another check could be for a certain arbitrary file. It would be easy to say "mb, use only mirrors that have file foo" or "mb, use only mirrors where timestamp is not older than 12 hours" or "mb, use only mirrors where the content of file bar is identical to our local copy".
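The "timestamp is not older than 12 hours" rule could be sketched like this. The file format is an assumption made for the example (first line holds the time of the last sync as Unix epoch seconds), and the function names are hypothetical.

```python
import time


def sync_age(timestamp_text, now=None):
    """Return the mirror's sync age in seconds.

    Assumes the first line of the timestamp file is Unix epoch seconds.
    """
    now = time.time() if now is None else now
    first_line = timestamp_text.strip().splitlines()[0]
    return now - int(first_line)


def is_fresh(timestamp_text, max_age_hours=12, now=None):
    """True if the mirror's timestamp is not older than max_age_hours."""
    try:
        return sync_age(timestamp_text, now) <= max_age_hours * 3600
    except (ValueError, IndexError):
        # Empty or malformed timestamp file: do not trust the mirror.
        return False
```

The same sync_age() value could feed a mirmon-like status report.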

A mirmon-like status report could be generated at the same time.

I have wondered several times whether /etc/mirrorbrain.conf should contain a setting for the DocRoot of Apache (which is the root of the file tree). That would make it very easy to implement checks, create timestamps and do further things from an 'mb' command with little effort for the user. (The 'mb makehashes' call would also be less complicated and less error-prone.) I think this setting is needed.

Further notes:

I committed a small function in r8481 that finds a random file in a local file tree, which could be used for a fully automatic test (the admin doesn't even need to specify a file then). I wrote it recently, when I felt that mirror checking finally needed to be advanced...
There's 'mb test', which doesn't do much yet, but could be the container for the new functionality. (I also need to check what kind of functionality is in mb/mb/testmirror.py; maybe there's something useful already.)
Especially for a mirror that's newly added to the database, the first thing that one wants to know is if the mirror is working and if it was correctly configured (the mirror itself, but also its URLs in the mb database). It should be easy to run a test and see if everything is fine. Thinking of automatic plausibility tests...
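As an illustration of the random-file idea from the first note, here is one way to pick a file from a local tree in a single pass. This is a sketch and not the function committed in r8481; the name random_file and its return convention are assumptions.

```python
import os
import random


def random_file(root, rng=random):
    """Pick one file uniformly at random from the tree below root.

    Uses reservoir sampling, so the tree is walked once and no full file
    list is kept in memory. Returns a path relative to root (suitable for
    appending to a mirror's base URL), or None if there are no files.
    """
    chosen = None
    seen = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            seen += 1
            # Replace the current pick with probability 1/seen.
            if rng.randrange(seen) == 0:
                chosen = os.path.join(dirpath, name)
    if chosen is None:
        return None
    return os.path.relpath(chosen, root)
```

An automatic plausibility test could request the returned path from a newly added mirror and compare it with the local copy.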

History
         Date           User   Action            Args
2014-02-20 01:23:05 poeml    set    messages: + msg547
                                      status: unread -> chatting
2014-02-17 21:29:18 poeml    set    assignedto: poeml
                                      nosy: + poeml
2014-02-17 15:14:50 rhertzog create
