rr- / szurubooru

Image board engine, Danbooru-style.
GNU General Public License v3.0

IPFS support for Szurubooru #200

Closed antonizoon closed 5 years ago

antonizoon commented 6 years ago

IPFS is a decentralized p2p protocol for load balancing, file deduplication, and federated serving of files and webpages. It can provide horizontal scaling across multiple home servers instead of requiring a few large servers with lots of bandwidth.

It has become quite mature, and there are multiple experimental booru engines that support it, so szurubooru could too. More info on IPFS here:

https://ipfs.io/ipfs/QmNhFJjGcMPqpuYfxL62VVB9528NXqDNMFXiqN5bgFYiZ1/its-time-for-the-permanent-web.html

Now that Cloudflare can act as a public IPFS gateway, it is significantly easier for ordinary end users to access IPFS-backed software.

Here are examples of experimental boorus; they are rudimentary but could provide guidelines on how to do it:

https://github.com/smugdev/smugboard https://github.com/Kycklingar/PBooru https://github.com/Kycklingar/ipfs-crdt

Although this may not necessarily be in the maintainer's interest, it is in ours, so we will work together towards achieving this feature.

rr- commented 6 years ago

Okay

What about Hydrus?

imtbl commented 6 years ago

While I really like the concept of IPFS, it doesn't sound like particularly mature software to me:

 -------------------------------------------------------
| Warning:                                              |
|   This is alpha software. Use at your own discretion! |
|   Much is missing or lacking polish. There are bugs.  |
|   Not yet secure. Read the security notes for more.   |
 -------------------------------------------------------

I'm also not sure this is a great fit for szurubooru. I feel like most people use it for private installations where sharing your (potentially private) data with the whole IPFS network and making it undeletable in the process isn't the greatest of ideas.

Of course, having it as an optional feature might be nice, but just from the concepts it sounds like quite a lot of work. There's a Python library, but judging by the installation instructions it looks like they don't even intend anyone to use it yet.

antonizoon commented 6 years ago

Although I certainly wouldn't run a critical government service on this (though I am interested in possibly integrating szurubooru into the Library of Congress Concordia crowdsourced handwriting transcription system), IPFS has served static files well enough for years.

The preferred method right now of interfacing with IPFS is to use the Python API wrapper for go-ipfs (the mainline engine): send a file over to the daemon to be pinned for seeding on your own node, then embed a link to the file's hash, now served through Cloudflare's IPFS gateway. https://github.com/ipfs/py-ipfs-api The py-ipfs reimplementation of the engine is of course still a long way off, but it isn't required to store data on the network.
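A rough sketch of that flow, using py-ipfs-api against a local go-ipfs daemon (the file path, daemon address, and gateway base URL below are placeholders, and the exact client API may differ between library versions):

```python
import ipfsapi  # pip install ipfsapi (the py-ipfs-api client)

# Talk to the local go-ipfs daemon; adjust host/port for your setup.
client = ipfsapi.connect('127.0.0.1', 5001)

# Add the file to our node; by default `add` also pins it,
# so our node keeps seeding it.
result = client.add('server/data/posts/1234.jpg')
cid = result['Hash']

# Public URL through Cloudflare's IPFS gateway,
# suitable for embedding in a post page.
url = 'https://cloudflare-ipfs.com/ipfs/' + cid
print(url)
```

The booru would then store the returned hash alongside the post record and build gateway URLs from it instead of serving the file over its own HTTP.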

As with any feature, how this is used depends on whether the instance is designed to be public or private, so it must of course be an optional feature, not something for private boorus.

In terms of public files, I generally would want them to be spread as far as possible. Of course we still want the ability to not serve certain problematic content: we just delete the file from our own node and report it to Cloudflare's abuse team to block it from their gateway. "Undeletable" on IPFS is relative: as with current boorus, we can ban hashes from appearing on our Cloudflare IPFS gateway and maintain a blocklist for others to use, like desuarchive and 4plebs do. The blocklist could even be customized per country. Of course you could still grab the file by hash from your own IPFS instance, but at least it would not appear to the public as served from our booru.
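To make the "ban by hash" idea concrete, here is a minimal sketch of the kind of check a booru could do before handing out gateway links; the file name, function names, and gateway URL are hypothetical, not anything szurubooru ships:

```python
# Hypothetical blocklist check; none of these names exist in szurubooru itself.
GATEWAY = 'https://cloudflare-ipfs.com/ipfs/'

def load_blocklist(path='ipfs_blocklist.txt'):
    """One banned CID per line; '#' lines are comments."""
    with open(path) as f:
        return {line.strip() for line in f
                if line.strip() and not line.startswith('#')}

def gateway_url(cid, blocklist):
    """Refuse to build a public link for content we have banned."""
    if cid in blocklist:
        raise PermissionError('refusing to serve blocklisted content: ' + cid)
    return GATEWAY + cid
```

Publishing that same text file is also how the blocklist could be shared with other boorus or gateways, in the same spirit as the hash bans the archive sites mentioned above maintain.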

As for the permanence of files, a better analogy is torrents: if no one continues to seed them, they will disappear, so there still needs to be at least one seed somewhere. It can just be a small group of home servers with significantly reduced bandwidth usage, as we see with torrent streaming today. IPFS is more resilient when there is a massive spike in traffic to a single image (or even a DDoS), because other nodes that fetch the file also cache and serve it. On the other hand, most public gateways only store files temporarily, so a few weeks after such a burst of interest most gateways will make room for other files, which is why we have to maintain our own seed (or, when Filecoin comes along in the future, pay hard-drive miners to store it).
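Since our own node is the one guaranteed seed, a periodic job could verify that everything we care about is still pinned. A sketch along those lines, assuming py-ipfs-api's pin_ls / pin_add behave like the underlying go-ipfs HTTP API (/api/v0/pin/ls, /api/v0/pin/add):

```python
import ipfsapi

client = ipfsapi.connect('127.0.0.1', 5001)

def ensure_pinned(cids):
    """Re-pin any of our content hashes that the local node has dropped."""
    pinned = set(client.pin_ls(type='recursive')['Keys'])
    for cid in cids:
        if cid not in pinned:
            client.pin_add(cid)
```

Run from cron (or the booru's task queue) over every hash stored with the posts, this would keep the seed of last resort alive even after public gateways evict the files.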

We aim to have a very large public booru (~10 TB of images or more), so we have to take drastic steps to reduce bandwidth usage. Obviously the vast majority of people, even with large public datasets, will probably be fine with classic HTTP through Cloudflare.

For our needs, though, we have to look to p2p scaling methods, which fortuitously can now integrate with Cloudflare's IPFS gateway and can be used right now.

rr- commented 6 years ago

10TB

Please keep in mind that szurubooru wasn't designed to handle such extreme data volumes (our current instance hosts 28 GB). It's likely you'll have to rework the search system, for example.

DonaldTsang commented 6 years ago

Using IPFS to off-load traffic would be very convenient, while the DB portions would require communication between Hydrus and Szuru. @rr- from a design standpoint Szurubooru would not be handling "10TB".

sgsunder commented 5 years ago

Closing. I think that this is a pretty major change and would add a lot of dependencies, so if there is a strong desire for this, it should probably be done in a forked repo.