internetarchive / dweb-gateway

Decentralized web Gateway for Internet Archive
GNU Affero General Public License v3.0

dweb-gateway

A decentralized web gateway for open academic papers on the Internet Archive

NOTE: THIS REPO IS NO LONGER MAINTAINED. IT HAS ALL MOVED TO ia-dweb and www-dweb on the internal system, which just calls the dweb-archivecontroller routing.js

Important editing notes

Other Info Links

Overview

This gateway sits between a decentralized web server running locally (in this case a Go-IPFS server) and the Archive. It will expose a set of services to the server.

The data is stored in a sqlite database that matches DOIs to the hashes of the files we know of, and to the URLs to retrieve them.

Note that the mapping is multivalued: a DOI represents an academic paper, which may be present in the Archive in various forms and formats (e.g. PDF, Doc; Final; Preprint).
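
As a rough illustration of the multivalued mapping, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample rows are all hypothetical; the gateway's real schema is not shown in this README and may differ.

```python
import sqlite3

# Hypothetical schema: names below are assumptions, not the gateway's own.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE doi_files (
        doi     TEXT NOT NULL,  -- the paper's DOI
        variant TEXT NOT NULL,  -- form and format, e.g. final/pdf
        hash    TEXT NOT NULL,  -- hash of this particular file
        url     TEXT NOT NULL   -- URL the file can be retrieved from
    )
    """
)

# One DOI, two files: the same paper as a final PDF and as a preprint Doc.
conn.executemany(
    "INSERT INTO doi_files (doi, variant, hash, url) VALUES (?, ?, ?, ?)",
    [
        ("10.1000/example", "final/pdf", "hash-of-final-pdf",
         "https://example.org/final.pdf"),
        ("10.1000/example", "preprint/doc", "hash-of-preprint-doc",
         "https://example.org/preprint.doc"),
    ],
)


def files_for_doi(conn, doi):
    """Return every (variant, hash, url) row known for a DOI."""
    cursor = conn.execute(
        "SELECT variant, hash, url FROM doi_files WHERE doi = ? ORDER BY variant",
        (doi,),
    )
    return cursor.fetchall()
```

Calling files_for_doi(conn, "10.1000/example") returns both rows, which is the point of the sketch: a lookup by DOI yields a list of files, not a single file.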

See [Information flow diagram](./Academic Docs IPFS gateway.pdf)

Structure high level

Those services will be built from a set of microservices which may or may not be exposed.

All calls to the gateway will come through a server that routes to individual services.

Server URLs have a consistent form /outputformat/namespace/namespace-dependent-string

Where:

This is implemented as a pair of steps

See HTTPServer for how this is processed in an extensible form.

See UseCases and Classes for expansion of this

See HTTPS API for the API exposed by the URLs.
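
As a minimal sketch of the /outputformat/namespace/namespace-dependent-string form, the path can be split into its three parts as below. The function name, the GatewayRequest type and the example values metadata and doi are assumptions made for illustration; this is not the gateway's routing code (see HTTPServer for that).

```python
from typing import NamedTuple


class GatewayRequest(NamedTuple):
    """The three parts of a gateway URL path (hypothetical type)."""
    output_format: str
    namespace: str
    rest: str  # the namespace-dependent string, passed through untouched


def parse_gateway_path(path: str) -> GatewayRequest:
    """Split /outputformat/namespace/namespace-dependent-string."""
    # Split at most twice, so slashes inside the namespace-dependent
    # string (DOIs contain them) are preserved.
    parts = path.lstrip("/").split("/", 2)
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected /outputformat/namespace/string, got {path!r}")
    return GatewayRequest(*parts)
```

For example, parse_gateway_path("/metadata/doi/10.1000/example") keeps 10.1000/example together as the namespace-dependent string, since only the first two separators are treated as structural.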

Installation

This should work; someone please confirm on a clean(er) machine and remove this comment.

You'll first need REDIS & Supervisor to be installed

On a Mac

brew install redis
brew services start redis
brew install supervisor

On Linux

Supervisor install details are in https://pastebin.com/ctEKvcZt and http://supervisord.org/installing.html

It's unclear to me how to install REDIS; it's been on every machine I've used.

Python gateway:

Installation

# Note: it uses the deployable branch; master may have more experimental features.
cd /usr/local   # On our servers it's always in /usr/local/dweb-gateway and there may be dependencies on this
git clone http://github.com/internetarchive/dweb-gateway.git

Run this complex install script. If it fails, check the configuration at the top and rerun. It will:

There are zero guarantees that changing the config will not cause it to fail!

cd dweb-gateway
scripts/install.sh 

In addition

Update

Running cd /usr/local/dweb-gateway; scripts/install.sh should update from the repo and restart.

Restart

supervisorctl restart dweb:dweb-gateway

Gun, Webtorrent Seeder; Webtorrent-tracker

Installation

They are all in the dweb-transport repo so ...

cd /usr/local # There are probably dependencies on this location
git clone http://github.com/internetarchive/dweb-transport.git
cd dweb-transport # npm install must run inside the cloned repo
npm install
# Supervisorctl, nginx and ferm should have been set up above.
supervisorctl start dweb:dweb-gun
supervisorctl start dweb:dweb-seeder
supervisorctl start dweb:dweb-tracker

Update

cd /usr/local/dweb-transport
git pull
npm update
supervisorctl restart dweb:*
sleep 10    # Give it time to start and not quickly exit
supervisorctl status
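
The final status check can also be scripted. The sketch below is an illustration, not part of this repo: it assumes the usual supervisorctl status layout of one process per line, with the process name followed by its state, and the sample text is made up. It parses that output and lists anything that is not RUNNING.

```python
def parse_supervisor_status(text):
    """Map process name -> state from `supervisorctl status` output.

    Assumes each line starts with the process name followed by its
    state, separated by whitespace.
    """
    states = {}
    for line in text.splitlines():
        fields = line.split(None, 2)
        if len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


def not_running(states):
    """Return the sorted names of processes whose state is not RUNNING."""
    return sorted(name for name, state in states.items() if state != "RUNNING")


# Illustrative sample only, not output captured from a real server.
SAMPLE = """\
dweb:dweb-gun       RUNNING   pid 1201, uptime 0:00:10
dweb:dweb-seeder    FATAL     Exited too quickly (process log may have details)
dweb:dweb-tracker   RUNNING   pid 1203, uptime 0:00:10
"""
```

On a server, the text could be obtained with subprocess.run(["supervisorctl", "status"], capture_output=True, text=True).stdout and passed to parse_supervisor_status in place of SAMPLE.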

Restart

supervisorctl restart dweb:* will restart these, as well as the Python gateway and IPFS. Alternatively, restart dweb:dweb-gun, dweb:dweb-seeder or dweb:dweb-tracker individually.

IPFS

Installation

Installation was done by Protocol Labs, and I’m not 100% sure of the full set of things done to set up the repo in a slightly non-standard way.

In particular, I know there is a command that has to be run once to enable the ‘urlstore’ functionality.

There may also be something needed to enable WebSockets connections (they are enabled in the gateway’s nginx files).

There is a cron task running every 10 minutes that calls one of the scripts and works around an IPFS problem that should be fixed at some point, but not necessarily soon.

3,13,23,33,43,53 * * * * python3 /usr/local/dweb-gateway/cron_ipfs.py
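
The minute field above means the task fires at minutes 3, 13, 23, 33, 43 and 53 of every hour, that is, every 10 minutes offset by 3. As a small sketch of how that schedule plays out (the function name next_run is hypothetical and not part of cron_ipfs.py):

```python
from datetime import datetime, timedelta

# Minute field copied from the crontab line above.
CRON_MINUTES = (3, 13, 23, 33, 43, 53)


def next_run(now: datetime) -> datetime:
    """Return the first scheduled minute strictly after `now`."""
    # Start at the next whole minute, then step forward until the
    # minute matches one of the scheduled values.
    candidate = now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while candidate.minute not in CRON_MINUTES:
        candidate += timedelta(minutes=1)
    return candidate
```

So a problem that appears just after one run is picked up by the workaround within roughly 10 minutes.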

Update

ipfs update install latest
supervisorctl restart dweb:dweb-ipfs

This should work, but there have been issues with IPFS's update process in the past when revisions of the IPFS repo were not applied automatically.

Restart

supervisorctl restart dweb:dweb-ipfs

dweb.archive.org UI

cd /usr/local && git clone http://github.com/internetarchive/dweb-archive.git
cd /usr/local/dweb-archive && npm install