kelson42 opened 5 years ago
Do you think the docker image should assume disk space in the TBs, to automatically seed everything, or something more conservative?
@nemobis The whole download.kiwix.org is around 10TB. It is difficult to assume that a seeder has so much space for this. What might be a solution to that problem is to be able to share (as a Docker environment variable) a list of path regular expressions to filter what they want to seed from download.kiwix.org.
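For instance, a minimal sketch of such a filter, assuming a hypothetical `SEED_FILTERS` environment variable holding a comma-separated list of path regular expressions:

```python
import os
import re

# Hypothetical variable name: a comma-separated list of path regular
# expressions; the default ".*" seeds everything.
patterns = [re.compile(p) for p in os.environ.get("SEED_FILTERS", ".*").split(",")]

def wanted(path: str) -> bool:
    """Keep only the paths matching at least one configured filter."""
    return any(p.search(path) for p in patterns)

# Example: SEED_FILTERS="zim/wikipedia/.*_en_.*" keeps only English Wikipedia ZIMs.
paths = ["zim/wikipedia/wikipedia_en_all_maxi.zim", "zim/ted/ted_en_science.zim"]
print([p for p in paths if wanted(p)])
```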
An old attempt can be found in this repo in the following files:
These files should be removed at the end of the implementation
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
Here is a base for the work: https://gitlab.com/adrienandrem/kiwix-torrent-watcher
I see this is very recent. What's the status of this? Is there any need beyond that?
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
I have been thinking about this ticket over the last weeks and I think I now know the best way to do it.
First of all, I plan to use a pre-existing Docker image https://github.com/linuxserver/docker-qbittorrent because:
Considering we reuse the linuxserver/qbittorrent Docker image, we still need a solution to synchronise (add/remove torrents) with https://download.kiwix.org (or maybe even better https://library.kiwix.org?). I plan to do so like this:
* Script will retrieve the list of ZIM to mirror in the superseeder based on the OPDS feed (parsed e.g. with gron)
* Script will then require, via its API, the qbittorrent client (through https://github.com/fedarovich/qbittorrent-cli/) to download the new content
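As an illustration, a sketch of that first step; the exact feed endpoint, the `application/x-zim` link type, and the convention that a `.torrent` file is published next to each ZIM are assumptions here:

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed endpoint; count=-1 is supposed to return the full catalogue.
FEED_URL = "https://library.kiwix.org/catalog/v2/entries?count=-1"

with urllib.request.urlopen(FEED_URL) as resp:
    root = ET.parse(resp).getroot()

torrent_urls = []
for link in root.iter(ATOM + "link"):
    # Assumption: acquisition links of type application/x-zim point at the
    # ZIM download (possibly as a .meta4 metalink); a .torrent lives next to it.
    if link.get("type") == "application/x-zim":
        href = link.get("href", "")
        base = href[: -len(".meta4")] if href.endswith(".meta4") else href
        torrent_urls.append(base + ".torrent")

print(f"{len(torrent_urls)} torrents to seed")
```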
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
I like the idea of using linuxserver/qbittorrent and its Docker image; the project is very active and mature; the web API is very useful.
I don't get if we want to:
1. download missing content locally to be able to serve it
2. take benefit of existing local content, meaning it has to be installed on a mirror

It looks like the initial idea was option 2, but we have switched to option 1, and I don't know why.
I see lots of advantages in option 2 because:
* we already have tooling to mirror files, I don't see the benefit of re-implementing it with bittorrent instead of rsync
* it might save storage space + bandwidth for those who already have a mirror (and this is our case)
* it will avoid complexities in our tooling (filtering what we want to download, deciding what has to be purged and after which delay, ...)
* it will avoid potential conflicts if installed in a place where a mirroring tool is already running (otherwise both the mirroring tool and the super-seeder will need write access to the same location)
* it will work even for hidden ZIM files / non-ZIM content if installed on download.kiwix.org
The drawbacks of option 2 are that:
* we need to detect which files are available locally to add them to qbittorrent (I've checked, it is capable of handling already existing files)
* we don't need to use the OPDS feed (I consider that qbittorrent will check the file hash in any case before seeding it)
* we need to detect which files have been removed to remove them from qbittorrent (but qbittorrent probably already handles this a bit: it is quite common that users move downloaded files once the download is finished, and usually - at least in Transmission - the client does not restart the download)
Looking at the existing proposal, I can't comment much on it: it's a shell script, and that's clearly not a language I can comment on much. The overall logic is simple, so it looks like it will work. I don't know how many subtleties we might discover once running in real conditions.
I wonder if we should instead write this additional tooling in Python, because:
* the qbittorrent CLI project is not maintained anymore, while the Python library (https://github.com/rmartin16/qbittorrent-api) is maintained and very active (already supporting Python 3.12 and the latest qbittorrent release)
* it is easier (for me at least) to develop / test / maintain
I think what this ticket misses is an (up-to-date) description of the problem it should solve, with example user scenarios. The discussion already highlights that storage/selection/cleanup is core. It's important to clarify how our needs and those of Kiwix enthusiasts align, for instance.
> I don't get if we want to:
> 1. download missing content locally to be able to serve it
> 2. take benefit of existing local content, meaning it has to be installed on a mirror
>
> It looks like the initial idea was option 2, but we have switched to option 1, and I don't know why.
We should be able to do both because:
> I see lots of advantages in option 2 because:
> * we already have tooling to mirror files, I don't see the benefit of re-implementing it with bittorrent instead of rsync
> * it might save storage space + bandwidth for those who already have a mirror (and this is our case)
> * it will avoid complexities in our tooling (filtering what we want to download, deciding what has to be purged and after which delay, ...)
> * it will avoid potential conflicts if installed in a place where a mirroring tool is already running (otherwise both the mirroring tool and the super-seeder will need write access to the same location)
> * it will work even for hidden ZIM files / non-ZIM content if installed on download.kiwix.org
Creating an HTTP mirror needs a lot more infrastructure effort than creating a BitTorrent super-seeder; this is why both solutions don't really compete in the same field. The most obvious requirements being big and stable bandwidth and a fixed IP.
> The drawbacks of option 2 are that:
> * we need to detect which files are available locally to add them to qbittorrent (I've checked, it is capable of handling already existing files)
Yes, this should be trivial. I'm ready to reconsider the requirements if not.
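To make this concrete, a sketch of that detection step using the qbittorrent-api library mentioned later in this thread; the mirror path, credentials, and the convention that a `.torrent` sits at the same path on download.kiwix.org are placeholders/assumptions:

```python
import pathlib
import qbittorrentapi

MIRROR_ROOT = pathlib.Path("/data/download.kiwix.org")  # hypothetical mount point
client = qbittorrentapi.Client(host="localhost", port=8080,
                               username="admin", password="adminadmin")

for zim in MIRROR_ROOT.rglob("*.zim"):
    # Assumption: a .torrent is published at the same path on download.kiwix.org.
    torrent_url = f"https://download.kiwix.org/{zim.relative_to(MIRROR_ROOT)}.torrent"
    # Pointing save_path at the existing file makes qBittorrent re-check the
    # data and seed it instead of downloading it again.
    client.torrents_add(urls=torrent_url, save_path=str(zim.parent))
```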
> * we don't need to use the OPDS feed (I consider that qbittorrent will check the file hash in any case before seeding it)
True, but that part is already implemented in the BitTorrent tracker; this is not really new work.
> * we need to detect which files have been removed to remove them from qbittorrent (but qbittorrent probably already handles this a bit: it is quite common that users move downloaded files once the download is finished, and usually - at least in Transmission - the client does not restart the download)
True, I wonder if this part is not also handled by the BitTorrent tracker!
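A sketch of how this removal side could look with the same library, assuming the client exposes `content_path` for each torrent (recent qBittorrent versions do):

```python
import os
import qbittorrentapi

client = qbittorrentapi.Client(host="localhost", port=8080,
                               username="admin", password="adminadmin")

for torrent in client.torrents_info():
    # content_path is where qBittorrent expects the downloaded data to live.
    if not os.path.exists(torrent.content_path):
        # The file was purged from the mirror: stop announcing it, but never
        # ask qBittorrent to delete data (it is already gone anyway).
        client.torrents_delete(delete_files=False, torrent_hashes=torrent.hash)
```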
> Looking at the existing proposal, I can't comment much on it: it's a shell script, and that's clearly not a language I can comment on much. The overall logic is simple, so it looks like it will work. I don't know how many subtleties we might discover once running in real conditions.
If I remember correctly, I was almost done with the work and I was just lacking time. I don't remember having faced big challenges linked to subtleties.
> I wonder if we should instead write this additional tooling in Python, because:
> * the qbittorrent CLI project is not maintained anymore, while the Python library (https://github.com/rmartin16/qbittorrent-api) is maintained and very active (already supporting Python 3.12 and the latest qbittorrent release)
> * it is easier (for me at least) to develop / test / maintain
Nothing against this, it should be fairly easy. I made it in Bourne shell because I didn't want to impose Perl, as I cannot write Python myself. Actually, this is probably even a good idea.
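For reference, connecting with qbittorrent-api only takes a few lines; the host, port, and credentials below are the linuxserver image defaults and would need adapting:

```python
import qbittorrentapi

client = qbittorrentapi.Client(
    host="localhost", port=8080,  # linuxserver image exposes the Web UI on 8080 by default
    username="admin", password="adminadmin",
)
try:
    client.auth_log_in()
except qbittorrentapi.LoginFailed as exc:
    raise SystemExit(f"Cannot authenticate against qBittorrent: {exc}")

print("Connected to qBittorrent", client.app.version)
```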
For the following reasons I believe the effort of completing this PR would be really helpful:
For all these reasons, I believe we should now secure the super-seeder: it guarantees that downloading via BitTorrent works as well as we could expect.
Nice to see some movement. I'm happy to help test this but I'll need some suggestions on what files to seed (the last times I tried to seed Kiwix torrents I failed to reach any meaningful ratio).
What I'd personally really like is a ruTorrent/transmission/other plugin to handle the addition and removal of torrents. That would be easy to install on top of any existing installation method, be it a web UI or a docker image.
> Nice to see some movement. I'm happy to help test this but I'll need some suggestions on what files to seed (the last times I tried to seed Kiwix torrents I failed to reach any meaningful ratio).
For around two years now, we have had our own BitTorrent tracker. Therefore, for any "famous" ZIM file, you will find peers to share bits with. Not even talking about the Web seeds.
This issue is only there to offer a guarantee to have - at least - one peer (to download from).
> What I'd personally really like is a ruTorrent/transmission/other plugin to handle the addition and removal of torrents. That would be easy to install on top of any existing installation method, be it a web UI or a docker image.
To me this belongs to another issue, which remains to be opened. I was not even aware it was possible to create a plugin for such a purpose for Transmission.
> Therefore, for any "famous" ZIM file, you will find peers to share bits with. Not even talking about the Web seeds.
Ok. I've never had trouble finding peers via DHT or the public trackers. I just never get anyone leeching these days, probably because the web seeds are so fast. So I have no idea what to seed.
> To me this belongs to another issue, which remains to be opened.
Maybe. OTOH it's compatible with the idea you wrote above:
> Script will retrieve the list of ZIM to mirror in the superseeder based on the OPDS feed
The "script" can be implemented as something that uses the transmission/rtorrent/other RPC.
It looks like not all BitTorrent clients can deal properly with our Web seeds yet. Having a complete and always-running super-seeder would help to solve that problem. We could run it on a mirror (files already there). Additionally, this Docker image might be interesting to a few Kiwix supporters, who would then have an easy way to support the project by sharing a bit of their bandwidth.
Using rsync (see https://download.kiwix.org/README) and rtorrent, that should not be too complicated.