DatArchive API - GitHub issues

RangerMauve commented 6 years ago

Hi, I really like where this extension was going and was wondering if you'd be interested in some discussion about how to go about enabling the DatArchive API.

I was thinking that it could be done by having a gateway which produces replication streams for dats.

What it could look like is this:

Set up a WS server
When getting an incoming connection, get the Dat public key from the URL
Create a memory-only instance of the archive and pipe the WS socket to the replication stream
Do the usual peer discovery dance
Client-side use node-dat-archive with a patched discovery-swarm which only identifies one peer, the gateway.

This will simplify accessing new Dats, but does not provide a mechanism for creating and seeding new ones.
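A minimal sketch of the "get the Dat public key from the URL" step, assuming the gateway receives the key as the first path segment of the WebSocket request URL and that keys are 64 hex characters. The helper name `datKeyFromRequestUrl` is made up for illustration and is not part of dat-gateway.

```javascript
// Hypothetical helper: pull a Dat public key out of the request URL of an
// incoming WebSocket connection, e.g. "/<64 hex chars>" or "/<key>/".
const DAT_KEY_PATTERN = /^[0-9a-f]{64}$/i

function datKeyFromRequestUrl (requestUrl) {
  // Drop any query string, then take the first non-empty path segment.
  const path = requestUrl.split('?')[0]
  const firstSegment = path.split('/').filter(Boolean)[0] || ''
  // Return a normalized key, or null when the segment is not a valid key.
  return DAT_KEY_PATTERN.test(firstSegment) ? firstSegment.toLowerCase() : null
}
```

Returning `null` for anything that does not look like a key lets the gateway reject the connection before it creates an archive.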

RangerMauve commented 6 years ago

I've started working on this feature for dat-gateway

I think node-dat-archive could easily be modified to take a hyperdrive instance as an argument, so that it skips creating a dat instance altogether and can work in the browser.

Then the client side could be something like

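function loadArchive(url) {
  // Get the key from the URL (keyFromURLHere is a placeholder)
  const drive = hyperdrive(memoryStore, {
    discoveryKey: keyFromURLHere
  })
  // Proposed option: hand the existing hyperdrive to node-dat-archive
  // so that it does not create a dat instance of its own
  const archive = new NodeDatArchive(url, {
    archive: drive
  })

  return archive
}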

From there you could have the extension manage keys for whatever storage was needed.

Pair that with a public gateway out of the box, and people don't even need to install anything to get dat working in their browser!

sammacbeth commented 6 years ago

I'm a bit unsure about making this api available over a public interface, as this would enable others on the network to potentially modify your private Dats.

Currently, on the native-messaging branch, I'm experimenting with having the browser launch the gateway server, and then having communication between the extension and gateway over stdio. So far I've been able to use this to implement the DatArchive.resolveName function, so the extension can check if hostnames have dat addresses (see https://github.com/sammacbeth/dat-fox/blob/native-messaging/bridge/index.js#L15).

This method has the added advantage that the user does not have to spawn the gateway process manually. Once the binary is properly installed in the browser, it should work seamlessly.

RangerMauve commented 6 years ago

Just because it's a network socket doesn't mean that it's available to the public internet. You could make it bind to 127.0.0.1 as well as to the port, which will restrict the traffic to the local machine.
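A minimal sketch of a loopback-only listener using Node's built-in `http` module. The function name `listenOnLoopback` and the placeholder response are assumptions for illustration; this is not dat-gateway's actual startup code.

```javascript
const http = require('http')

// Start an HTTP server that only accepts connections from the local machine.
// Passing '127.0.0.1' as the host binds the loopback interface only, so other
// machines on the network cannot reach the port.
function listenOnLoopback (port) {
  return new Promise((resolve, reject) => {
    const server = http.createServer((req, res) => {
      res.end('local gateway placeholder\n')
    })
    server.once('error', reject)
    server.listen(port, '127.0.0.1', () => resolve(server))
  })
}
```

A WebSocket server could then be attached to the returned HTTP server, so the replication endpoint inherits the same restriction.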

In addition, the gateway isn't storing the private keys for your data. It's only acting as a proxy to the rest of the network. Your private keys can be stored inside the extension's local storage, and the extension can connect to the gateway when it wants to update a dat (which will then propagate to the rest of the network).

RangerMauve commented 6 years ago

I really like the prospect of not having to launch the gateway though.

Are you going to have one process for dat or multiplex multiple replication streams over stdin?

I was leaning towards using sockets because it will allow for the most code reuse and will allow us to use a public gateway to prevent users from even having to install something in the first place.

Plus, embedding the gateway into dat-desktop would make it easy for users to decide when to turn it on and potentially help them with choosing which archives should be pinned.

sammacbeth commented 6 years ago

I think both patterns make sense. One could envisage multiple modes of operation, depending on the user's setup:

Public gateway. The zero configuration option - the extension uses a public dat-gateway instance. This instance may also provide an API to access read-only operations of the DatArchive API. With this configuration forking and creating dats would not be available, but this can be a carrot to push them to switch to a local gateway.
Local gateway via native messaging. Could be installed by running a script which sets up the binary and manifest for Firefox. Once that is done the user can again browse as usual, because the browser handles the gateway lifecycle.
Local gateway via external app. A manually hosted dat-gateway, or even gui app like dat-desktop. This can provide the same APIs as the public gateway, but now with write access.

I think I can move the project forward with the capability to afford these different options. I myself would probably focus on a native messaging approach, as this makes most sense to me for my use-case. These implementations should be interchangeable within the extension though.

Regarding native messaging, the protocol is fairly simple (more information here) and I'm not sure it supports multiplexing. In general we shouldn't need to push too much data over this channel as it will just be for metadata. Actual data from dat can be fetched over http.
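For reference, native messaging frames each message as UTF-8 JSON preceded by a 32-bit length in native byte order. A minimal sketch of the framing on the Node side follows, assuming a little-endian host; the function names are made up for illustration.

```javascript
// Native messaging frames: a 32-bit length prefix followed by UTF-8 JSON.
// Assumption: little-endian byte order (the protocol uses native byte order).
function encodeNativeMessage (message) {
  const body = Buffer.from(JSON.stringify(message), 'utf8')
  const header = Buffer.alloc(4)
  header.writeUInt32LE(body.length, 0)
  return Buffer.concat([header, body])
}

// Decode every complete frame in `buffer` and return the leftover bytes, so
// the caller can prepend them to the next chunk read from stdin.
function decodeNativeMessages (buffer) {
  const messages = []
  let offset = 0
  while (buffer.length - offset >= 4) {
    const length = buffer.readUInt32LE(offset)
    if (buffer.length - offset - 4 < length) break // frame not complete yet
    const body = buffer.subarray(offset + 4, offset + 4 + length)
    messages.push(JSON.parse(body.toString('utf8')))
    offset += 4 + length
  }
  return { messages, rest: buffer.subarray(offset) }
}
```

The protocol itself has no notion of channels, so any multiplexing would have to be layered on top, for example by putting a request id in each JSON message.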

RangerMauve commented 6 years ago

With regards to your patterns:

but this can be a carrot to push them to switch to a local gateway

If you take a look at the PR I just did to dat-gateway, this will actually allow us to have read-write of a dat archive from within the browser, and then using the gateway to sync with the greater network.

The process would look like

Create dat in browser, save the key and contents to some sort of local storage (probably IndexedDB)
Work with the dat archive as usual
Open a replication stream to the gateway
Replication stream will first look for peers in the greater network and do a sync for 3 seconds, then it will start replicating with the browser over websockets. This means it will only download the latest changes over the socket
Gateway will be replicating its changes to whatever peers on the network are interested (this will work well with having apps that request hashbase to replicate their content as it will be able to get it from the gateway)

This way we have full CRUD, with the caveats that the gateway will see more traffic from replication streams, and that the dat is only advertised while the browser is actively replicating or the gateway has it in its cache.

Local gateway via native messaging. Could be installed by running a script which sets up the binary and manifest for Firefox.

Would this really be better than having a dedicated app like dat-desktop?

I'm still not sure what native messaging entails here. Are you going to be replicating dats to the browser, or is this going to be an RPC API for the various dat archive commands? That seems like a lot more effort, but it will probably be less resource hungry for the browser

Actual data from dat can be fetched over http

So you'll need an active gateway in addition to whatever is installed for the native messaging bridge? Will that be a public gateway, or will it also be set up when the native bridge is established?

sammacbeth commented 6 years ago

I'm not sure I follow what you propose. Does this solution require us to be able to require dat modules on the extension side? I'm not sure how easy it is to browserify these libraries, and if we have to run a node process anyway, we might as well offload all of the dat code to that. In general the node process should be doing all of the dat work, and the browser just requests content and presents it to the user.

With native messaging, the process launched by the browser is also the gateway server. See my prototype app code here: it runs an instance of DatGateway as well as listening for messages from the extension. The native messaging API can actually launch any application registered with the browser, so we could even launch dat-desktop if we wanted to.

RangerMauve commented 6 years ago

Does this solution require us to be able to require dat modules on the extension side?

Basically, yes. Hyperdrive and the modules it uses for storage are basically pure-js so bundling them shouldn't be a challenge. The benefit is that the browser can reuse a lot of the existing code in the JS ecosystem. As I mentioned earlier, getting the DatArchive API would just require creating a hyperdrive, and shoving it into node-dat-archive.

The benefit here, too, is that people could use this exact code without even needing an extension. So regular HTTP-served websites could have it included as a polyfill for when there's no DatArchive API being provided by the browser.

Having all the dat logic in a node process makes sense when you have a local gateway, but non-technical users (or users that don't care for setting up node modules) wouldn't bother doing so, or wouldn't even want to try the extension if it was a requirement. If the extension offloads everything needed for the DatArchive API to a local node process, it means it won't work for casual users. If it uses something that purely requires the gateway (like a public gateway), then more people could be onboarded, and therefore more people will be using Dat and will add pressure for browsers to integrate with it in the long run.

I think that code is useless unless there are people to run it, so the easier we can make the onboarding experience, the more likely it is that Dat will gain mainstream adoption.

it runs an instance of DatGateway as well as listening for messages from the extension

That's awesome! Once I have my websocket changes merged into dat-gateway, the replication feature will come for free without needing anything else set up!

(edit: Sorry if I'm ranting a lot, I'm just really excited by this!)

rjcorwin commented 6 years ago

@RangerMauve @sammacbeth I'm loving this brainstorm guys. I had just opened up an issue in @pfrazee's dat-gateway issue queue under the same name and then I found this issue. https://github.com/pfrazee/dat-gateway/issues/5

I'm a bit unsure about making this api available over a public interface

I feel you @sammacbeth, there needs to be some kind of security. Perhaps an API for authorizing origins to other dats?

RangerMauve commented 6 years ago

Here's my progress on a DatArchive implementation that makes use of the websocket feature I added to dat-gateway: dat-archive-web

You can test it out by running my fork of dat-gateway and running npm run example

RangerMauve commented 6 years ago

I've fixed up the issues with dat-archive-web. It's working fully now.

A dat can be created client-side, then replicated to the gateway using websockets. The client side has full ownership of the dat, and the gateway only exists to advertise it on the network (while it's in the gateway's cache).

dat-fox could extend the DatArchiveWeb class to make .create persist the data to something like indexedDB and something to keep track of keys in the plugin to support stuff like DatArchive.selectArchive

rjcorwin commented 6 years ago

@RangerMauve Wow! This sounds great. Trying to get it rolling in an example but hitting a blocker. I've got your two repos cloned, installed, and the gateway running. Bundled the dat-archive-web repo per the docs and included that bundle in an example repo where it gets included by a very boring index.html.

Here's the example repo https://github.com/rjsteinert/dat-archive-web-example

Perhaps a broken bundle because of a conflicting node version? I'm running node v9.5.0.

RangerMauve commented 6 years ago

Hey, the repo only works in Node.js at the moment.

The issue you're seeing is due to the graceful-fs library being imported by node-dat-archive.

I got the build working by adding a browser.js file which defines the global variable and using this browserify command: browserify -r fs:graceful-fs -s DatArchiveWeb -e ./browser.js > bundle.js

I was working on the web example today as well, but I've encountered a problem with dat-dns not working in browsers.

I was going to work on it tomorrow and get rid of dat-dns support entirely until I can find a way to make it work (maybe a new gateway feature).

I've got an example with a working build (but not working DNS) in my gh-pages branch.

You might be able to use DatArchive.create(), though. I've got to stop for today, but I'll work on it more tomorrow.

RangerMauve commented 6 years ago

I've set up a public gateway at gateway.mauve.moe:3000 so you don't even need dat-gateway installed locally. It's the lowest tier Digital Ocean droplet so don't expect amazing performance. :P

RangerMauve commented 6 years ago

Getting it to work might be as simple as getting rid of this line which uses dat-dns and renaming the name argument to url

rjcorwin commented 6 years ago

Exciting stuff @RangerMauve. Glad to hear you are making an example site to play around in.

Here's what I'm seeing on my end.

RangerMauve commented 6 years ago

Yeah, that's a problem stemming from the lack of dat-dns support.

You could try the fix I proposed with getting rid of the dat-dns stuff entirely, but I think I'll have it running tomorrow either way.

RangerMauve commented 6 years ago

I've got it running here

You should wait a few seconds after creating an archive for it to sync up with the remote.

rjcorwin commented 6 years ago

Woohoo!

sammacbeth commented 6 years ago

Looks good! I cleaned up the repo a bit yesterday and am now starting to tackle the injection of the API into Dat pages and communication channel between the page, extension and gateway. Ideally, the calls to the gateway should be done from the background script context of the extension. This will prevent cross-origin rules and CSP from breaking the API.

Once the injection and messaging is working, I'll grab what you have on dat-archive-web and see if it will run in the extension.

rjcorwin commented 6 years ago

injection of the API into Dat pages

That would be rad! But in case that's tough, as an application developer I'd be fine with including a dat-archive.js polyfill that falls back to the gateway when window.DatArchive is not available.
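A minimal sketch of that fallback, assuming the polyfill bundle exposes some gateway-backed class. `GatewayDatArchive` is a stand-in name and `selectDatArchive` is a hypothetical helper.

```javascript
// Pick the native DatArchive when the browser (or an extension) provides one,
// otherwise fall back to a gateway-backed implementation.
// `GatewayDatArchive` stands in for whatever the polyfill bundle exports.
function selectDatArchive (globalObject, GatewayDatArchive) {
  if (globalObject && typeof globalObject.DatArchive === 'function') {
    return globalObject.DatArchive
  }
  return GatewayDatArchive
}

// In a page this might be used as:
//   window.DatArchive = selectDatArchive(window, DatArchiveWeb)
```

Application code can then use `DatArchive` unconditionally, whether it comes from the browser, an extension, or the polyfill.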

RangerMauve commented 6 years ago

@rjsteinert already did a PR for adding CSP headers to dat-gateway so it wouldn't be as necessary.

You could inject a browserified bundle of dat-archive-web, and some JS that invokes DatArchive.setGateway() to set the gateway URL (if it isn't using localhost:3000).

On top of that you could inject something that will talk to the extension to support DatArchive.selectArchive and patch DatArchive.create() to have the extension save the private keys for later and somehow save the archive data.

Maybe the extension could have an always active set of DatArchive instances connected to the gateway which keep those archives in the cache.

RangerMauve commented 6 years ago

@rjsteinert would you have time to go on the dat gitter channel to talk about this stuff? I'm thinking about what sort of interface I should add to DatArchiveWeb to save credentials.

sammacbeth commented 6 years ago

Injection seems to work fine using the method IPFS Companion uses: the content-script injects a script into the page which adds the API to the window object, then opens a communication channel to the extension background script. I have this working on my branch.

In this prototype the resolveName message is received by the background and can then be invoked in that context, or if native messaging is being used, can ask the node process to serve the request. In the latter case the extension is just a thin client which passes all Dat tasks to the gateway process. As there is only a little boilerplate required to wire these things I think I can implement the API very quickly now using this pattern and node-dat-archive on the gateway.
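The wiring between page and background could look roughly like the request/response layer below. This is a hypothetical sketch, not the code on the branch: the message shape (`id`, `action`, `args`, `result`, `error`) and the name `createRpcClient` are assumptions.

```javascript
// Hypothetical request/response layer for page <-> background messaging.
// `send` is whatever transport is available (window.postMessage, a runtime
// port, ...); it is injected so the sketch stays transport-agnostic.
function createRpcClient (send) {
  let nextId = 0
  const pending = new Map() // request id -> { resolve, reject }

  return {
    // Send a request and return a promise for the matching response.
    call (action, args) {
      const id = ++nextId
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject })
        send({ id, action, args })
      })
    },
    // Feed every incoming response into this function. Returns false when
    // the message does not belong to a pending request.
    handleResponse (message) {
      const entry = pending.get(message.id)
      if (!entry) return false
      pending.delete(message.id)
      if (message.error) entry.reject(new Error(message.error))
      else entry.resolve(message.result)
      return true
    }
  }
}
```

With a layer like this, an injected `DatArchive.resolveName(name)` would reduce to `client.call('resolveName', [name])`, and the background script decides whether to answer itself or forward the request to the node process.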

RangerMauve commented 6 years ago

I'm working on refactoring dat-archive-web so that you can essentially "plug in" how it works.

My goal was to make something with post-message-frame to talk to an iframe that will manage everything, but I think this will make it easier for you to plug in the communication to the extension, too.

I want to address the following concerns:

DatDNS
Managing private keys for the selectArchive API
De-duplication of data between tabs

I think the gateway can be used for dat-dns by sending a request to http://localhost:3000/mydatthing.com/.well-known/dat, since that is basically what happens when we send a request to the domain itself, and the gateway has CORS headers enabled.
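A minimal sketch of that lookup, split into building the gateway URL and parsing the record. It assumes the hyphenated `/.well-known/dat` path and a record whose first line is `dat://<key>` with an optional `TTL=<seconds>` line after it. Both function names are made up for illustration.

```javascript
// Build the gateway URL that proxies a site's Dat DNS record.
// Assumption: the gateway serves "<gateway>/<hostname>/<path>".
function wellKnownDatUrl (gatewayUrl, hostname) {
  return gatewayUrl.replace(/\/+$/, '') + '/' + hostname + '/.well-known/dat'
}

// Parse the record body: "dat://<64 hex key>" then optional "TTL=<seconds>".
// Returns null when the first line is not a dat URL.
function parseWellKnownDat (text) {
  const lines = text.trim().split(/\r?\n/)
  const match = /^dat:\/\/([0-9a-f]{64})\/?$/i.exec(lines[0].trim())
  if (!match) return null
  const ttlMatch = /^ttl=(\d+)$/i.exec((lines[1] || '').trim())
  return {
    key: match[1].toLowerCase(),
    ttl: ttlMatch ? Number(ttlMatch[1]) : null
  }
}
```

In the browser the record text would come from something like `fetch(wellKnownDatUrl(gateway, hostname)).then((res) => res.text())`, which only works cross-origin because of the CORS headers.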

With regards to de-duplication and private key management, I think dat-archive-web should have a complementary "service" that it talks to which can be pluggable based on the environment (gateway, extension, bunsen browser)

The service would communicate via streams to do the following:

Save all dat information to its local storage (indexedDB in the browser)
Keep track of which origins have write access to which Dats
Have a stream based protocol for requesting access to a dat (for selectArchive)
Have a stream based wrapper over the dat storage implementing the random-access-storage API to use in the hyperdrive
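The "which origins have write access to which Dats" part of such a service could be as small as the sketch below. It is a hypothetical in-memory version: the class and method names are assumptions, and a real service would persist the grants.

```javascript
// Hypothetical in-memory registry of which origins may write to which dats.
// A real service would persist this (for example in IndexedDB).
class WriteAccessRegistry {
  constructor () {
    this.grants = new Map() // origin -> Set of dat keys
  }

  // Record that `origin` may write to the dat identified by `datKey`.
  grant (origin, datKey) {
    if (!this.grants.has(origin)) this.grants.set(origin, new Set())
    this.grants.get(origin).add(datKey)
  }

  // Remove a grant; returns true when something was actually removed.
  revoke (origin, datKey) {
    const keys = this.grants.get(origin)
    return keys ? keys.delete(datKey) : false
  }

  canWrite (origin, datKey) {
    const keys = this.grants.get(origin)
    return Boolean(keys && keys.has(datKey))
  }
}
```

A selectArchive request would call `grant` once the user approves, and every write coming from a page would be checked with `canWrite` first.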

This should make the service fairly simple and can leave the client to do the following:

Configure selectArchive to talk to the service
Configure its storage for the hyperdrive to use the service storage
Configure resolveName to talk to the gateway
Configure how to create a stream to replicate with the gateway

This should put a lot of the heavy lifting into existing modules like dat-archive-web and random-access-storage. That way the actual services only need to care about the storage and authorization.

This will also allow me to work on a service that works from the dat-gateway itself within an iframe.

I'll communicate with it using post-message-stream
Store all data in indexedDB
Inject the configured dat-archive-web API into HTML pages from the gateway (if it's enabled)

That way people could have all the benefits (except privacy :P) of using beaker without having to install anything, but with a path to have full class support via extensions and specialized browsers.

rjcorwin commented 6 years ago

@RangerMauve @sammacbeth @chrisekelley The four of us have been hitting this problem pretty hard in the past week. Would you all be available for a WebRTC chat on talky.io (https://talky.io/dat-gateway) to sync up tomorrow Tuesday April 17 at 2pm ET?

RangerMauve commented 6 years ago

I'm busy weekdays from 9:00 EST to about 18:00 EST, so I don't think I could. Weekends work best for me, to be honest. (Always available for texts, though :P )

rjcorwin commented 6 years ago

@RangerMauve @sammacbeth Unfortunately 18:00 EST is too late for @chrisekelley (he lives in Barcelona), but Chris has given me the ok to try and connect with you two and I'll do my best to relay any plans.

@sammacbeth Is 18:00 EST a good time for you today?

rjcorwin commented 6 years ago

I see that may be even later for @sammacbeth as he is listed being in Munich Germany. I suppose @chrisekelley / @sammacbeth / myself could meet at 12:00 ET while @RangerMauve / myself meet at 18:00 ET.

Shall we try those times tomorrow, Wednesday April 18?

RangerMauve commented 6 years ago

12:00 ET could work for me, actually. I can do it on my lunch break.

sammacbeth commented 6 years ago

12:00 ET should work for me tomorrow.

rjcorwin commented 6 years ago

Great! @chrisekelley is in for tomorrow at 12:00 ET as well. See y'all at https://talky.io/dat-gateway

RangerMauve commented 6 years ago

I refactored dat-archive-web to be more extendable.

It just needs a way to plug in storage and a replication feed using what I'm calling a manager

I've updated my demo with a manager that uses persistent storage in IndexedDB and my public gateway. Live demo

This should make it easy to implement services that have their own opinions on how to store the data and replicate it as well as stuff like selectArchive and DatDNS.

sammacbeth commented 6 years ago

So I managed to convert the timezone incorrectly, so will already have to leave at 12:10 ET... If anyone can make an earlier start, that might be better.

RangerMauve commented 6 years ago

I can start earlier if you want.

rjcorwin commented 6 years ago

Great chat! Still processing everything I learned. One takeaway was that it sounded like @sammacbeth and @RangerMauve might be taking different directions in the DatArchive approach? Even if different, perhaps it could be the same dat-gateway codebase? To get us going on the Bunsen front, we baked in RangerMauve's fork of dat-gateway so we could give some of the demos he's been working on a try in Bunsen.

soyuka commented 6 years ago

Sorry to jump in like that :). I wasn't aware of the hype around dat-fox immediately :D.

I really like the native messaging approach as it'll be totally transparent for the end user. Just install an extension and you're good to go. No gateway to configure (public) or to install (local). This native script could even be packaged so that it could run without needing Node.js pre-installed (easier maintenance, no version mismatches, etc.). AFAIR, the only thing preventing hyperdrive from working in the browser is that its random-access-storage needs access to the file system.

In general we shouldn't need to push too much data over this channel as it will just be for metadata. Actual data from dat can be fetched over http.

You mean fetched over http through a gateway? This means the native script would in fact be the gateway?

Wouldn't it be easier if you had a random-access-storage in the browser that would use the native script to write/read? (I'm currently experimenting with this approach)
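A rough sketch of that idea: a storage object with the `read(offset, length, cb)` / `write(offset, data, cb)` shape used by the random-access-* modules, forwarding every call to the native script. The `rpc(method, params, callback)` transport and the name `createRemoteStorage` are assumptions, not an existing API.

```javascript
// Hypothetical storage object exposing the read/write shape used by the
// random-access-* modules, forwarding every operation to a native script.
// `rpc(method, params, callback)` is an assumed transport, not a real API.
function createRemoteStorage (name, rpc) {
  return {
    // Ask the native script for `length` bytes starting at `offset`.
    read (offset, length, callback) {
      rpc('read', { name, offset, length }, (err, base64) => {
        if (err) return callback(err)
        callback(null, Buffer.from(base64, 'base64'))
      })
    },
    // Bytes are base64-encoded because the channel carries JSON messages.
    write (offset, data, callback) {
      rpc('write', { name, offset, data: data.toString('base64') }, callback)
    }
  }
}
```

Base64 adds roughly a third to the payload size, which matters if all archive content, and not only metadata, goes through the native messaging channel.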

might be taking different directions in the DatArchive approach? Even if different perhaps it could be the same dat-gateway codebase?

This is what I read here as well, but that's a good thing. IMHO, we're all experimenting a lot with dat and its usage, and our goal is to help people with no computer knowledge to actually get access to the network.

About the gateway, there are a few approaches there as well. In my early work with a websocket daemon I was also able to run dat in the browser only by using RPC calls to get access to the file system. To me, the native messaging system enables even more ways to reach the same goal!

rjcorwin commented 6 years ago

To clarify the different directions, my probably oversimplified summary is that @RangerMauve is thinking of a "thick client" approach, where the browser does most of the work and communicates with either a local or public gateway, whereas @sammacbeth is taking more of a "thin client" approach, where the gateway does most of the operations and it is very important that it be locally hosted and controlled.

Does that sound about right? Apologies if I missed the mark.

RangerMauve commented 6 years ago

@soyuka The approach with having random-access-storage and hyperdrive in the browser is what I'm taking. I modified dat-gateway to have a websocket server which would create hyperdrive replication streams. This lets me sync a hyperdrive with the network by piping its replication into the WS without having to interact with the discovery swarm (the gateway does that instead).

The dat-fox extension is still going to need a gateway installed locally, it's just that the gateway will be doing more work and can have tighter integration with the OS than a browser could.

@rjsteinert I think your assessment is correct. :D

soyuka commented 6 years ago

@soyuka The approach with having random-access-storage and hyperdrive in the browser is what I'm taking.

I read your code and figured that out yes :).

RangerMauve commented 6 years ago

I'm thinking it would be good to start a repo that's just for discussing DatArchive/Beaker stuff in the context of other browsers to have it all in one place (and so we don't spam dat-fox too much :P ). Does that sound appealing?

soyuka commented 6 years ago

I'm thinking it would be good to start a repo that's just for discussing DatArchive/Beaker stuff in the context of other browsers to have it all in one place (and so we don't spam dat-fox too much :P ). Does that sound appealing?

We should close this in favor of https://github.com/datproject/discussions/issues/84; that repository has this purpose :).

RangerMauve commented 6 years ago

Yeah, I guess dat-fox provides DatArchive as of the recent PR that @sammacbeth did.

sammacbeth / dat-fox

DatArchive API #1