kelson42 opened 5 years ago
Do you think the docker image should assume disk space in the TBs, to automatically seed everything, or something more conservative?
@nemobis The whole download.kiwix.org is around 10TB. It is difficult to assume that a seeder has so much space for this. What might be a solution to that problem is to be able to share (as a Docker environment variable) a list of path regular expressions to filter what they want to seed from download.kiwix.org.
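For instance, a minimal sketch of such a filter, assuming a hypothetical `SEED_FILTERS` environment variable holding a comma-separated list of path regular expressions:

```python
import os
import re

# Hypothetical variable name: a comma-separated list of path regular
# expressions; the default ".*" seeds everything.
patterns = [re.compile(p) for p in os.environ.get("SEED_FILTERS", ".*").split(",")]

def wanted(path: str) -> bool:
    """Keep only the paths matching at least one configured filter."""
    return any(p.search(path) for p in patterns)

# Example: SEED_FILTERS="zim/wikipedia/.*_en_.*" keeps only English Wikipedia ZIMs.
paths = ["zim/wikipedia/wikipedia_en_all_maxi.zim", "zim/ted/ted_en_science.zim"]
print([p for p in paths if wanted(p)])
```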
An old attempt can be found in this repo in the following files:
These files should be removed at the end of the implementation
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
Here is a base for the work: https://gitlab.com/adrienandrem/kiwix-torrent-watcher
I see this is very recent. What's the status of this? Is there any need beyond that?
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
I have been thinking about this ticket over the last weeks and I think I now know the best way to do it.
First of all, I plan to use a pre-existing Docker image https://github.com/linuxserver/docker-qbittorrent because:
Considering we reuse the linuxserver/qbittorrent Docker image, we still need a solution to synchronise (add/remove torrents) with https://download.kiwix.org (or maybe even better https://library.kiwix.org?). I plan to do so like this:
* Script will retrieve the list of ZIM to mirror in the superseeder based on the OPDS feed (parsed e.g. with gron)
* Script will then require, via its API, the qbittorrent client (through https://github.com/fedarovich/qbittorrent-cli/) to download the new content
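As an illustration, a sketch of that first step; the exact feed endpoint, the `application/x-zim` link type, and the convention that a `.torrent` file is published next to each ZIM are assumptions here:

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed endpoint; count=-1 is supposed to return the full catalogue.
FEED_URL = "https://library.kiwix.org/catalog/v2/entries?count=-1"

with urllib.request.urlopen(FEED_URL) as resp:
    root = ET.parse(resp).getroot()

torrent_urls = []
for link in root.iter(ATOM + "link"):
    # Assumption: acquisition links of type application/x-zim point at the
    # ZIM download (possibly as a .meta4 metalink); a .torrent lives next to it.
    if link.get("type") == "application/x-zim":
        href = link.get("href", "")
        base = href[: -len(".meta4")] if href.endswith(".meta4") else href
        torrent_urls.append(base + ".torrent")

print(f"{len(torrent_urls)} torrents to seed")
```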
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
I like the idea of using linuxserver/qbittorrent and its Docker image; the project is very active and mature; the web API is very useful.
I don't get if we want to:
1. download missing content locally to be able to serve it
2. take benefit of existing local content, meaning it has to be installed on a mirror

It looks like the initial idea was option 2, but we have switched to option 1, and I don't know why.
I see lots of advantages in option 2 because:
* we already have tooling to mirror files, I don't see the benefit of re-implementing it with bittorrent instead of rsync
* it might save storage space + bandwidth for those who already have a mirror (and this is our case)
* it will avoid complexities in our tooling (filtering what we want to download, deciding what has to be purged and after which delay, ...)
* it will avoid potential conflicts if installed in a place where a mirroring tool is already running (otherwise both the mirroring tool and the super-seeder will need write access to the same location)
* it will work even for hidden ZIM files / non-ZIM content if installed on download.kiwix.org
The drawbacks of option 2 are that:
* we need to detect which files are available locally to add them to qbittorrent (I've checked, it is capable of handling already existing files)
* we don't need to use the OPDS feed (I consider that qbittorrent will check the file hash in any case before seeding it)
* we need to detect which files have been removed to remove them from qbittorrent (but qbittorrent probably already handles this a bit: it is quite common that users move downloaded files once the download is finished, and usually - at least in Transmission - the client does not restart the download)
Looking at the existing proposal, I can't comment much on it: it's a shell script, and that's clearly not a language I can comment on much. The overall logic is simple, so it looks like it will work. I don't know how many subtleties we might discover once running in real conditions.
I wonder if we should instead write this additional tooling in Python, because:
* the qbittorrent CLI project is not maintained anymore, while the Python library (https://github.com/rmartin16/qbittorrent-api) is maintained and very active (already supporting Python 3.12 and the latest qbittorrent release)
* it is easier (for me at least) to develop / test / maintain
I think what this ticket misses is an (up-to-date) description of the problem it should solve, with example user scenarios. The discussion already highlights that storage/selection/cleanup is core. It's important to clarify how our needs and those of Kiwix enthusiasts align, for instance.
> I don't get if we want to:
> 1. download missing content locally to be able to serve it
> 2. take benefit of existing local content, meaning it has to be installed on a mirror
>
> It looks like the initial idea was option 2, but we have switched to option 1, and I don't know why.
We should be able to do both because:
> I see lots of advantages in option 2 because:
> * we already have tooling to mirror files, I don't see the benefit of re-implementing it with bittorrent instead of rsync
> * it might save storage space + bandwidth for those who already have a mirror (and this is our case)
> * it will avoid complexities in our tooling (filtering what we want to download, deciding what has to be purged and after which delay, ...)
> * it will avoid potential conflicts if installed in a place where a mirroring tool is already running (otherwise both the mirroring tool and the super-seeder will need write access to the same location)
> * it will work even for hidden ZIM files / non-ZIM content if installed on download.kiwix.org
Creating an HTTP mirror needs a lot more infrastructure effort than creating a BitTorrent super-seeder; this is why both solutions don't really compete in the same field. The most obvious requirements being big and stable bandwidth and a fixed IP.
> The drawbacks of option 2 are that:
> * we need to detect which files are available locally to add them to qbittorrent (I've checked, it is capable of handling already existing files)
Yes, this should be trivial. I'm ready to reconsider the requirements if not.
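To make this concrete, a sketch of that detection step using the qbittorrent-api library mentioned later in this thread; the mirror path, credentials, and the convention that a `.torrent` sits at the same path on download.kiwix.org are placeholders/assumptions:

```python
import pathlib
import qbittorrentapi

MIRROR_ROOT = pathlib.Path("/data/download.kiwix.org")  # hypothetical mount point
client = qbittorrentapi.Client(host="localhost", port=8080,
                               username="admin", password="adminadmin")

for zim in MIRROR_ROOT.rglob("*.zim"):
    # Assumption: a .torrent is published at the same path on download.kiwix.org.
    torrent_url = f"https://download.kiwix.org/{zim.relative_to(MIRROR_ROOT)}.torrent"
    # Pointing save_path at the existing file makes qBittorrent re-check the
    # data and seed it instead of downloading it again.
    client.torrents_add(urls=torrent_url, save_path=str(zim.parent))
```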
> * we don't need to use the OPDS feed (I consider that qbittorrent will check the file hash in any case before seeding it)
True, but that part is already implemented in the BitTorrent tracker; this is not really new work.
> * we need to detect which files have been removed to remove them from qbittorrent (but qbittorrent probably already handles this a bit: it is quite common that users move downloaded files once the download is finished, and usually - at least in Transmission - the client does not restart the download)
True, I wonder if this part is not also handled by the BitTorrent tracker!
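A sketch of how this removal side could look with the same library, assuming the client exposes `content_path` for each torrent (recent qBittorrent versions do):

```python
import os
import qbittorrentapi

client = qbittorrentapi.Client(host="localhost", port=8080,
                               username="admin", password="adminadmin")

for torrent in client.torrents_info():
    # content_path is where qBittorrent expects the downloaded data to live.
    if not os.path.exists(torrent.content_path):
        # The file was purged from the mirror: stop announcing it, but never
        # ask qBittorrent to delete data (it is already gone anyway).
        client.torrents_delete(delete_files=False, torrent_hashes=torrent.hash)
```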
> Looking at the existing proposal, I can't comment much on it: it's a shell script, and that's clearly not a language I can comment on much. The overall logic is simple, so it looks like it will work. I don't know how many subtleties we might discover once running in real conditions.
If I remember correctly, I was almost done with the work and I was just lacking time. I don't remember having faced big challenges linked to subtleties.
> I wonder if we should instead write this additional tooling in Python, because:
> * the qbittorrent CLI project is not maintained anymore, while the Python library (https://github.com/rmartin16/qbittorrent-api) is maintained and very active (already supporting Python 3.12 and the latest qbittorrent release)
> * it is easier (for me at least) to develop / test / maintain
Nothing against this, it should be fairly easy. I made it in Bourne shell because I didn't want to impose Perl, as I cannot write Python myself. Actually, this is probably even a good idea.
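For reference, connecting with qbittorrent-api only takes a few lines; the host, port, and credentials below are the linuxserver image defaults and would need adapting:

```python
import qbittorrentapi

client = qbittorrentapi.Client(
    host="localhost", port=8080,  # linuxserver image exposes the Web UI on 8080 by default
    username="admin", password="adminadmin",
)
try:
    client.auth_log_in()
except qbittorrentapi.LoginFailed as exc:
    raise SystemExit(f"Cannot authenticate against qBittorrent: {exc}")

print("Connected to qBittorrent", client.app.version)
```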
For the following reasons I believe the effort of completing this PR would be really helpful:
For all these reasons, I believe we should now secure the super-seeder: it guarantees that downloading via BitTorrent works as well as we could expect.
Nice to see some movement. I'm happy to help test this but I'll need some suggestions on what files to seed (the last times I tried to seed Kiwix torrents I failed to reach any meaningful ratio).
What I'd personally really like is a ruTorrent/transmission/other plugin to handle the addition and removal of torrents. That would be easy to install on top of any existing installation method, be it a web UI or a docker image.
> Nice to see some movement. I'm happy to help test this but I'll need some suggestions on what files to seed (the last times I tried to seed Kiwix torrents I failed to reach any meaningful ratio).
For around two years now, we have had our own BitTorrent tracker. Therefore, for any "famous" ZIM file, you will find peers to share bits with. Not even talking about the Web seeds.
This issue is only there to offer a guarantee to have - at least - one peer (to download from).
> What I'd personally really like is a ruTorrent/transmission/other plugin to handle the addition and removal of torrents. That would be easy to install on top of any existing installation method, be it a web UI or a docker image.
To me this belongs to another issue, which remains to be opened. I was not even aware it was possible to create a plugin for such a purpose for Transmission.
> Therefore, for any "famous" ZIM file, you will find peers to share bits with. Not even talking about the Web seeds.
Ok. I've never had trouble finding peers via DHT or the public trackers. I just never get anyone leeching these days, probably because the web seeds are so fast. So I have no idea what to seed.
> To me this belongs to another issue, which remains to be opened.
Maybe. OTOH it's compatible with the idea you wrote above:
> Script will retrieve the list of ZIM to mirror in the superseeder based on the OPDS feed
The "script" can be implemented as something that uses the transmission/rtorrent/other RPC.
It looks like not all BitTorrent clients can deal properly with our Web seeds yet. Having a complete and always-running super-seeder would help to solve that problem. We could run it on a mirror (files already there). Additionally, this Docker image might be interesting to a few Kiwix supporters, who would then have an easy way to support the project by sharing a bit of their bandwidth.
Using rsync (see https://download.kiwix.org/README) and rtorrent, that should not be too complicated.