MirrorBrain regularly checks that mirrors are online and working, but it doesn't
detect mirrors that are stale or outdated. It would be really useful if we
could teach MirrorBrain to detect outdated mirrors so that it could disable
them automatically.
The simple answer would be a setting that points to a script which tests the
mirror and tells MirrorBrain whether it is up-to-date (exit code 0), outdated
(exit code 1), or whether there was an error (any other exit code).
The information about the mirror to check would be passed either via
environment variables or via command line parameters. That way we can implement
any policy... but it requires scripting skills.
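A minimal sketch of how such a hook could be called and its exit code
interpreted. The function name and the MB_MIRROR_ID / MB_MIRROR_BASEURL
variable names are made up for illustration; nothing like this exists in
MirrorBrain yet.

```python
import os
import subprocess

# Exit-code convention from the proposal above.
UP_TO_DATE = 0
OUTDATED = 1


def run_freshness_check(script_argv, mirror_id, base_url, timeout=60):
    """Run the admin's check script and map its exit code to a status string.

    script_argv is the command to run, as a list. The mirror details are
    passed through environment variables (hypothetical names).
    """
    env = dict(os.environ)
    env["MB_MIRROR_ID"] = str(mirror_id)
    env["MB_MIRROR_BASEURL"] = base_url
    try:
        result = subprocess.run(script_argv, env=env, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # Script missing, not executable, or hanging: treat as an error,
        # not as proof that the mirror is outdated.
        return "error"
    if result.returncode == UP_TO_DATE:
        return "up-to-date"
    if result.returncode == OUTDATED:
        return "outdated"
    return "error"
```

Keeping "error" separate from "outdated" would let a broken check script be
reported without disabling every mirror.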
Another approach could be to define a path that must be in sync between the
mirror and the master copy (same size and same SHA1 checksum) for the mirror
to be considered up-to-date. But since synchronization takes time, we must
be able to define a grace period before deciding to disable the mirror.
Or better, we could implement the first setting and provide a sample script that
implements the second solution, reading the required parameters from
mirrorbrain.conf.
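The size/SHA1 comparison with a grace period could look roughly like the
following sketch. It assumes the mirror's copy of the reference file has
already been downloaded to a local temporary file; the function names and the
"syncing" state are illustrative, not part of MirrorBrain.

```python
import hashlib
import time


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size in bytes, SHA1 hex digest) of a local file."""
    sha1 = hashlib.sha1()
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so large files don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
            size += len(chunk)
    return size, sha1.hexdigest()


def mirror_status(master_fp, mirror_fp, master_changed_at, grace_seconds,
                  now=None):
    """Compare fingerprints, tolerating a mismatch during the grace period.

    master_changed_at is the Unix time at which the master copy last changed.
    """
    if now is None:
        now = time.time()
    if mirror_fp == master_fp:
        return "up-to-date"
    if now - master_changed_at <= grace_seconds:
        # The mirror may simply not have synced yet.
        return "syncing"
    return "outdated"
```

A sample check script would exit with 0 for "up-to-date" and "syncing", and
with 1 for "outdated".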
Very good idea. This would make MirrorBrain useful in more scenarios.
The current mirror checking is so minimal that it's amazing we got
so far with it. Historically, checking mirror freshness was neglected,
because that is fine for file trees where files never change in place
but get new names instead (at least an incremented counter). Files
that change but keep the same name were thus always a problem. At
openSUSE, requests for some of those files were therefore never
redirected to mirrors. Of course, it may be complicated or impossible
for admins to get rid of those files.
Fedora solved the same issue by having their redirector reply with a
Metalink that uses a Metalink protocol extension listing several
variants of a file (which might be encountered on a mirror). The
redirector effectively tells the client: if the mirror has this file, it's
okay, and if it has a different file, it's also okay.
Scanning the mirrors more deeply, including mtime, file size and
calculated hashes, isn't really realistic in many cases, I think (it
might be in some, of course). A compromise could be mtime and file size,
the same as rsync does (unless forced to look into files with -c).
But only rsync scanning would achieve this reliably. HTTP scanning is
more fragile, and FTP scanning isn't perfect either (character set
issues, time format not standardized).
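To illustrate the mtime-and-size compromise: given two listings (path mapped
to size and mtime), a quick check in the style of rsync's default comparison
could be sketched as below. How the listings are obtained (rsync, HTTP or FTP
scan) is left out, and the names used here are made up for the example.

```python
from typing import NamedTuple


class FileInfo(NamedTuple):
    size: int   # bytes
    mtime: int  # Unix timestamp


def differing_paths(master, mirror, mtime_tolerance=0):
    """Return sorted paths that are missing on the mirror or differ from
    the master in size or mtime. File contents are never read."""
    stale = []
    for path, want in master.items():
        have = mirror.get(path)
        if (have is None
                or have.size != want.size
                or abs(have.mtime - want.mtime) > mtime_tolerance):
            stale.append(path)
    return sorted(stale)
```

The mtime_tolerance parameter plays a role similar to rsync's
--modify-window and may help with scan methods that report coarse timestamps.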
This is just background. The idea of checking the sync status of the
mirrors would be a big step forward.
I agree with making the check adaptable and creating a useful default
check. There's a script to create a small timestamp file, which could be
used to detect the "sync age". Another check could be for a certain
arbitrary file. It would be easy to say "mb, use only mirrors that have
file foo" or "mb, use only mirrors where the timestamp is not older than
12 hours" or "mb, use only mirrors where the content of file bar is
identical to our local copy".
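The "timestamp not older than 12 hours" rule could be sketched like this. It
assumes that the timestamp file fetched from the mirror starts with a Unix
epoch value; the actual format written by the timestamp script may differ,
and the function names are illustrative.

```python
import time


def sync_age_seconds(timestamp_text, now=None):
    """Return the age in seconds of the epoch value that starts the text."""
    if now is None:
        now = time.time()
    stamp = int(timestamp_text.split()[0])
    return now - stamp


def is_fresh(timestamp_text, max_age_hours=12, now=None):
    """True if the mirror's timestamp is at most max_age_hours old.

    An empty or unparsable timestamp file counts as not fresh.
    """
    try:
        age = sync_age_seconds(timestamp_text, now)
    except (ValueError, IndexError):
        return False
    return age <= max_age_hours * 3600
```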
A mirmon-like status report could be generated at the same time.
I have wondered several times whether /etc/mirrorbrain.conf should contain a
setting for the DocRoot of Apache (which is the root of the file tree).
That would be very handy for implementing checks, creating timestamps and
doing further things from an 'mb' command with little effort for the user.
(The 'mb makehashes' call would also be less complicated and less
error-prone.) I think this setting is needed.
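As a rough illustration of such a setting, the sketch below reads a
hypothetical "docroot" key from an INI-style configuration. Both the key name
and the sample contents are assumptions made for the example; no such option
exists in mirrorbrain.conf today.

```python
import configparser

# Sample configuration text; "docroot" is the hypothetical new setting.
SAMPLE_CONF = """\
[general]
instances = main

[main]
dbname = mirrorbrain
docroot = /srv/mirrors/main
"""


def read_docroot(conf_text, instance):
    """Return the docroot configured for an instance, or None if unset."""
    # Interpolation is disabled so that '%' in other values is harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.get(instance, "docroot", fallback=None)
```

With this, a command like 'mb makehashes' could fall back to the configured
directory when no path is given on the command line.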
Further notes:
I committed a small function in r8481 that finds a random
file in a local file tree, which could be used for a fully
automatic test (the admin wouldn't even need to specify a file then).
I wrote it recently, when I felt that mirror checking finally needed
to be advanced...
There's 'mb test', which doesn't do much yet, but could be the
container for the new functionality. (I also need to check what kind
of functionality is in mb/mb/testmirror.py, maybe there's something
useful already.)
Especially for a mirror that's newly added to the database, the first
thing that one wants to know is if the mirror is working and if it was
correctly configured (the mirror itself, but also its URLs in the mb
database). It should be easy to run a test and see if everything is
fine. Thinking of automatic plausibility tests...
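For the random-file idea in the notes above, one possible approach is sketched
below. This is not the code committed in r8481, just an illustration using
os.walk and reservoir sampling, so that the tree is walked once and no full
file list has to be kept in memory.

```python
import os
import random


def random_file(top, rng=random):
    """Pick one file below `top` uniformly at random.

    Returns the path relative to `top` (so that it can be appended to a
    mirror's base URL), or None if the tree contains no files.
    """
    chosen = None
    seen = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                # Skip broken symlinks and special files.
                continue
            seen += 1
            # Reservoir sampling: keep the new file with probability 1/seen.
            if rng.randrange(seen) == 0:
                chosen = os.path.relpath(path, top)
    return chosen
```

The returned path could then be requested from the mirror and compared with
the local copy.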
History
Date User Action Args
2014-02-20 01:23:05 poeml set messages: + msg547
status: unread -> chatting
2014-02-17 21:29:18 poeml set assignedto: poeml
nosy: + poeml
2014-02-17 15:14:50 rhertzog create
Issue migrated (2015-06-05) from old issue tracker http://mirrorbrain.org/issues/issue150
msg543 Author: rhertzog Date: 2014-02-17.15:14:50
msg547 Author: poeml Date: 2014-02-20.01:23:05
(end of migrated issue)