ipfs / notes

IPFS Collaborative Notebook for Research
MIT License

Wikipedia Integrations #46

Open jbenet opened 9 years ago

jbenet commented 9 years ago

We've been planning to "put Wikipedia on IPFS" for a long, long time. This issue will track possible integration points and their progress. These may lead to independent repos, etc.

In short, the way i see it, we have multiple layers of "integration" with wikipedia. these are discussed below in more detail.

  1. Archive: archive all of wikipedia on IPFS -- as in https://github.com/ipfs/archives
  2. Media: assist wikipedia.org with serving wikipedia media via IPFS ("the big stuff")
  3. Rehost: serve all of wikipedia over IPFS (falling back to ipfs http gateway)
  4. Restructure: rethink wikipedia's datastructures as CRDTs (or even basic git commits), to create new wiki software that leverages IPFS.

(4) is the most exciting to me, but won't happen for a while. (1-3) we can already do. Let's start with (1) and (2).

1. Archive: archive all of wikipedia on IPFS -- as in https://github.com/ipfs/archives

This is a matter of regularly downloading data dumps and adding them. We need to construct "help archive X" pages to publish the newest heads and guide people through setting up an archive. (may need ipfs-cluster for this to work well.)

We can do this on our own and do not need to ask for permission, as everything is CC. (correct me if i'm wrong pls).
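A minimal sketch of the "download the dump and add it" loop, assuming the `ipfs` CLI is on the PATH and that the latest dump lives at the usual dumps.wikimedia.org location. The function names (`dump_url`, `ipfs_add_command`, `archive`) are made up for illustration, and a real archive job would also want checksums, resumable downloads and pinning across several nodes.

```python
import subprocess
import urllib.request
from typing import List

DUMP_BASE = "https://dumps.wikimedia.org"


def dump_url(wiki: str = "enwiki") -> str:
    """URL of the latest pages-articles dump for one wiki (e.g. enwiki, trwiki)."""
    return f"{DUMP_BASE}/{wiki}/latest/{wiki}-latest-pages-articles.xml.bz2"


def ipfs_add_command(path: str) -> List[str]:
    """Command line that adds a file to the local IPFS node.

    -Q (quieter) makes `ipfs add` print only the final root hash.
    """
    return ["ipfs", "add", "-Q", path]


def archive(wiki: str, dest: str) -> str:
    """Download the latest dump to `dest`, add it to IPFS, return the root hash."""
    urllib.request.urlretrieve(dump_url(wiki), dest)
    result = subprocess.run(
        ipfs_add_command(dest), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical usage: the printed hash is the "newest head" that a
    # "help archive X" page would publish.
    print(archive("trwiki", "trwiki-latest-pages-articles.xml.bz2"))
```

Running this on a schedule and publishing the returned hash is roughly what "regularly downloading data dumps and adding them" would amount to.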

Steps:

2. Media: assist wikipedia.org with serving wikipedia media via IPFS ("the big stuff")

This means hosting all of the big files that wikipedia has to serve. It's perhaps where we can contribute the most, but then again our poor gateway may not be able to deal with the massive bandwidth usage.

What we need, then, is

3. Rehost: serve all of wikipedia over IPFS (falling back to ipfs http gateway)

After 2 is done, we can proceed with a full mirror. (it may be easier to skip 2. and go straight to 3. This is to be discussed, but 2. seems harder given the difficulty on their end of integrating with their backend and so on.)
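A rough sketch of the "falling back to ipfs http gateway" idea from (3): try a local node's gateway first, then a public one. The path layout (`/ipfs/<root>/<title>`), the root hash `QmExampleRoot` and the function names are assumptions for illustration only. `127.0.0.1:8080` is the default gateway address of a local daemon.

```python
import urllib.parse
import urllib.request
from typing import Callable, List, Sequence

LOCAL_GATEWAY = "http://127.0.0.1:8080"  # default gateway of a local ipfs daemon
PUBLIC_GATEWAY = "https://ipfs.io"       # public http gateway used as fallback


def candidate_urls(root: str, title: str,
                   gateways: Sequence[str] = (LOCAL_GATEWAY, PUBLIC_GATEWAY)) -> List[str]:
    """URLs to try, in order, for one page under a mirror root (assumed layout)."""
    page = urllib.parse.quote(title)
    return [f"{gw}/ipfs/{root}/{page}" for gw in gateways]


def http_get(url: str) -> bytes:
    """Plain HTTP GET; network failures surface as OSError subclasses."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def fetch_page(root: str, title: str,
               get: Callable[[str], bytes] = http_get) -> bytes:
    """Return the page from the first gateway that answers."""
    last_error = None
    for url in candidate_urls(root, title):
        try:
            return get(url)
        except OSError as err:  # includes urllib.error.URLError and timeouts
            last_error = err
    raise RuntimeError(f"no gateway could serve {title!r}") from last_error


if __name__ == "__main__":
    # QmExampleRoot is a placeholder, not a real mirror root.
    print(len(fetch_page("QmExampleRoot", "IPFS")))
```

Passing `get` as a parameter keeps the fallback logic testable without a running daemon.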

4. Restructure: rethink wikipedia's datastructures

This means restructuring how wikipedia's internal datastructures work to provide an editing model based on either CRDTs or basic git commits. We could then put these directly on top of IPFS and allow people to edit + create "wikipedia commits" and "wikipedia PRs" all over IPFS.
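To make the "basic git commits" variant a bit more concrete, here is a toy sketch of page revisions as content-addressed objects that point at their parents. It is not how mediawiki or IPFS actually store anything: sha256 over canonical JSON stands in for a real IPFS hash, and `RevisionStore` and its fields are hypothetical names.

```python
import hashlib
import json
from typing import Dict, List, Sequence


def revision_id(revision: dict) -> str:
    """Content address of a revision: sha256 of its canonical JSON encoding."""
    data = json.dumps(revision, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


class RevisionStore:
    """In-memory stand-in for a content-addressed object store."""

    def __init__(self) -> None:
        self.objects: Dict[str, dict] = {}

    def commit(self, title: str, text: str, author: str,
               parents: Sequence[str] = ()) -> str:
        """Store one "wikipedia commit" and return its id.

        Zero parents is a first revision, one is a normal edit,
        two or more would be a merge of diverging edits.
        """
        revision = {"title": title, "text": text,
                    "author": author, "parents": list(parents)}
        rid = revision_id(revision)
        self.objects[rid] = revision
        return rid

    def history(self, head: str) -> List[str]:
        """Ids from `head` back to the root, following the first parent."""
        chain = []
        current = head
        while current is not None:
            chain.append(current)
            parents = self.objects[current]["parents"]
            current = parents[0] if parents else None
        return chain


if __name__ == "__main__":
    store = RevisionStore()
    first = store.commit("IPFS", "IPFS is a protocol.", "alice")
    second = store.commit("IPFS", "IPFS is a p2p protocol.", "bob", [first])
    print(store.history(second))
```

Because ids depend only on content, two people who make the same edit on the same parent get the same id, and a "wikipedia PR" is just a head that someone else has not merged yet.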

This is a large undertaking, so perhaps step 1 is to rethink the mediawiki data storage layer over ipfs first, and try making a demo. Also worth thinking about federated wiki in this context and seeing where "upgrading wikipedia with fedwiki" might lead. I think in general, it may be safest to just replace the storage layer first, and go from there.

To me, this is the most interesting part. But it's the biggest and the one which will take the longest to do.

jbenet commented 9 years ago

cc @davidar @domschiener @whyrusleeping @diasdavid @rht @lgierth @mappum

domschiener commented 9 years ago

moved by @jbenet to https://github.com/ipfs/notes/issues/47#issue-106674105

jbenet commented 9 years ago

moved by @jbenet to https://github.com/ipfs/notes/issues/47#issuecomment-140587470

domschiener commented 9 years ago

moved by @jbenet to https://github.com/ipfs/notes/issues/47#issuecomment-140587530

jbenet commented 9 years ago

@domschiener please move this discussion to another issue. i moved it to https://github.com/ipfs/notes/issues/47

davidar commented 9 years ago

:+1:

rht commented 9 years ago

For layer 4, at least today there are several implementations of git-based wikis (i.e. they can be distributed, but lack a built-in way to preserve a canonical dag chain).

almereyda commented 8 years ago

@opn and @WardCunningham have been working on a so-called transformerporter to load Wikipedia pages into Federated Wiki.

Entrance points to this could be


ldct commented 6 years ago

hey @jbenet I saw the blog post about the Turkish wikipedia dump on IPFS. Are goals 2-4 still being worked on?