kelson42 opened 1 year ago
This is becoming more important now that one of our mirrors (https://mirror.accum.se/mirror/kiwix.org/) hosts our files on multiple servers (it is itself a mirror frontend) and relies on redirections, which MirrorBrain does not support.
I don't know whether MirrorCache supports that, but I know mb is not worth investing in.
In the meantime, I've duplicated the mirror entry so that we point independently to the two offloaders I've seen files on. This wastes a lot of requests in the scanMirror step, but at least we can use the mirror…
See https://github.com/etix/mirrorbits as well
I tested MirrorCache a long time ago and, although I don't remember the details, it was really short on features.
Mirrorbits seems more mature and probably deserves a try.
Here are the features we like or rely on:
@benoit74 @rgaudin Do you see other features which are important to us?
Now we need to decide when and how to move forward with this Mirrorbits POC.
I have very little experience on this part of the stack.
One thing we currently struggle with is scanning mirrors to refresh the status of individual assets. This process has to run one mirror at a time; we could not make it run in parallel (at least, our attempts failed). The new solution must be able to run these scans in parallel, otherwise it will not scale: as the number of mirrors grows, the time to scan all of them grows too, and our refresh period keeps getting longer.
The refresh period is currently quite long, at least 2 hours: https://kiwixorg.grafana.net/d/bb0f0990-04c5-4314-8afc-6185ac49c668/mirrorbrain?orgId=1&from=1695625815425&to=1696230615425
We've decided that @benoit74 will assess mirrorbits in regard to our needs. What we want to know is:
This is my comparison chart so far.
❌ Not Supported, bad thing ✅ Supported ❓Unknown (meaning probably not)
Feature | MirrorCache | MirrorBits |
---|---|---|
Metalink (Metalink headers) | ✅ | ❓JSON file mentioned, but not compatible with aria obviously |
Bittorrent files | ❓ | ❓ |
Magnet links | ❓ | ❓ |
Mirror mgmt via ftp/http/rsync | HTTP only❓ (no access to file) | FTP and RSYNC only |
Prioritization of mirrors | ❓ | ✅ |
Auto choice of mirrors based on client geo location | Geo only | Geo + AS number + custom rules |
Multiple hashes of files | ❓ | ❓ |
Easy update of mirrors file database (at file/directory level) | ✅ Mirrorcache has been designed to fix Mirrorbrain issues around parallel scans and scans taking ages to update the DB | ❓ |
Support of very large files >100GB | ✅ Probably | ✅ Probably |
IPV6 support | ✅ | ✅ |
Documentation | ❌ Too limited | ❌ Too limited |
Programming Language | Perl | GoLang |
Database | PostgreSQL | Redis (with persistence) |
Project liveness | Project updated regularly; multiple PRs closed on a regular basis, including in the last days / weeks | No update for at least one year, no code change since 2020, many very simple PRs pending without response, still based on Golang 1.13 (Sept 2019) |
Developers | One main dev (Andrii Nikitin), working at openSUSE (project supporter); another person helped a bit in the past | One single dev (etix), based in Paris, former VideoLAN ops + developer, no more activity on GitHub / personal blog / Twitter |
Usage | openSUSE only? | Many websites mentioned, including some which have stopped using it |
I'm really not convinced by those two solutions. I would probably prefer to stay with MirrorBrain for now until we find a better solution.
If we are forced to choose one now, I would try MirrorCache for:
However, given all the missing features, the effort to implement MirrorCache is probably significant (1 month?). My experience with BitTorrent / magnet links is too limited to say anything very pertinent on that point. And since it is written in Perl, we would probably need to hire an external developer for our changes.
Thank you; very useful 👍
In that case, we're probably better off keeping MirrorBrain until we're forced out. The main concern is obviously security. Our data is not completely safe, as we mount the downloads folder read-write in order to write the `mirrors.html` file in the `update-mb-db` job. We can/should find a way around that.
More concerning is the possibility of someone altering MirrorBrain's responses to inject redirections for our users.
Should we close this ticket for now?
A couple of notes:
I had not noticed the latest issue showing that jbkempf is keeping mirrorbits alive; that is indeed quite important information. Your other points are important as well. I'm really puzzled by all this.
We should gather the problems/challenges we have with mb to be able to complete the comparison.
❌ Not Supported, bad thing ✅ Supported ❓Unknown (meaning probably not)
Feature | MirrorCache | MirrorBits | MirrorBrain |
---|---|---|---|
HTML list of mirrors | ❓ | ❌ | ✅ |
Metalink (Metalink headers) | ✅ | ❓JSON file mentioned, but not compatible with aria obviously | ✅ |
Bittorrent files | ❓ | ❓ | ✅ (but only torrent creation, not announced to tracker to validate torrent file - working only thanks to our "custom" tracker) |
Magnet links | ❓ | ❓ | ❌ (supported but buggy) |
Mirror mgmt via ftp/http/rsync | HTTP only❓ (no access to file) | FTP and RSYNC only | FTP, RSYNC and HTTP |
Prioritization of mirrors | ❓ | ✅ | ✅ |
Auto choice of mirrors based on client geo location | Geo only | Geo + AS number + custom rules | Geo + AS number |
Multiple hashes of files | ❓ | ✅ (found in JSON file) | ✅ |
Easy update of mirrors file database (at file/directory level) | ✅ MirrorCache has been designed to fix MirrorBrain issues around parallel scans and scans taking ages to update the DB | ✅ mirrorbits supports parallel scans (only one scan per mirror at a time, obviously). Both rsync and FTP scans are efficient: rsync works from the file list returned by the rsync binary, and FTP recursively CWDs into and lists every folder. | ❌ (no parallel scan, lock issue) |
Support of very large files >100GB | ✅ Probably | ✅ Probably | ✅ |
IPV6 support | ✅ | ✅ | ❓ |
Documentation | ❌ Too limited | ❌ Too limited | ✅ |
Programming Language | Perl | GoLang | Python (admin/management) + C (runtime HTTP) |
Database | PostgreSQL | Redis (with persistence) | PostgreSQL |
Project liveness | Project updated regularly; multiple PRs closed on a regular basis, including in the last days / weeks | No update for at least one year, no code change since 2020, many very simple PRs pending without response, still based on Golang 1.13 (Sept 2019), but some oversight by jbkempf (VLC) + some potential contributions from the Jenkins team | Dead |
Developers | One main dev (Andrii Nikitin), working at openSUSE (project supporter); another person helped a bit in the past | One single dev (etix), based in Paris, former VideoLAN ops + developer, no more activity on GitHub / personal blog / Twitter | No more |
Usage | openSUSE only? | Many websites mentioned, including some which have stopped using it | ? |
Just updated with a MirrorBrain column + fixes to the Mirrorbits details + a new row regarding the HTML home page
@benoit74 Thank you very much for this analysis. Looking at the results, it tends to confirm my first opinion that the easiest path would be to continue (by fixing a few details) with Mirrorbrain (at least for the moment). @rgaudin What is your analysis and proposal?
As discussed with @benoit74, my opinion is to continue with MB until we're forced out. When that happens, if the environment is still the same, I support patching mirrorbits to add Metalink support and hashes on the same paths (both very easy). BT is relatively easy as well, but whether it would be integrated upstream is another question.
OK, then I guess this ticket is resolved (at least for the short term); we will need to fork MirrorBrain to fix the most urgent issues.
I think we can just patch a couple of things in our image without taking on the burden of a fork. This person's patch is a one-line change in a Perl script.
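For a sense of scale, the Metalink part of such a patch mostly means emitting extra HTTP headers next to the redirect. Below is a minimal Python sketch, assuming the Metalink/HTTP format of RFC 6249 as we read it: one `Link: <url>; rel=duplicate` header per mirror with `pri`/`geo` parameters, plus an RFC 3230 `Digest` header. As far as we know, aria2 understands this format. `metalink_headers` and its input shapes are hypothetical and are not mirrorbits' API. mirrorbits is written in Go, so a real patch would be written in Go.

```python
import base64
from typing import List, Tuple


def metalink_headers(
    mirrors: List[Tuple[str, int, str]],
    sha256_hex: str,
) -> List[Tuple[str, str]]:
    """Build Metalink/HTTP (RFC 6249) style headers for one file.

    `mirrors` holds hypothetical (url, priority, country_code) tuples and
    `sha256_hex` the file's stored SHA-256 as hex; both shapes are assumptions
    about what the mirror database would provide.
    """
    headers: List[Tuple[str, str]] = []
    # rel=duplicate marks each URL as an identical copy; a lower pri is preferred.
    for url, pri, country in sorted(mirrors, key=lambda m: m[1]):
        headers.append(
            ("Link", f"<{url}>; rel=duplicate; pri={pri}; geo={country.lower()}")
        )
    # Instance digest (RFC 3230): base64 of the raw hash bytes, not the hex text.
    digest_b64 = base64.b64encode(bytes.fromhex(sha256_hex)).decode("ascii")
    headers.append(("Digest", f"SHA-256={digest_b64}"))
    return headers
```

Serving the hash from stored metadata, rather than hashing the file on each request, matches the "hashes on the same paths" idea: the hash is computed once at scan or import time.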
@benoit74 FWIW and while https://github.com/etix/mirrorbits/issues/138 is in progress, we're using mirrorbits on Jenkins Infrastructure, with our own docker image and helm chart, that might interest you:
Thanks a lot @lemeurherve for the pointers
@kelson42 please take another look
And have a look especially at https://github.com/etix/mirrorbits/issues/138 and https://github.com/etix/mirrorbits/issues/179, which show that maintenance of mirrorbits is "getting better"
MirrorBrain is deprecated and has a replacement, MirrorCache (http://www.mirrorcache.org/). We should probably migrate our architecture.