ipfs / notes

IPFS Collaborative Notebook for Research
MIT License

IPFS API via Unix sockets #129

Open Kubuxu opened 8 years ago

Kubuxu commented 8 years ago

Unix sockets have many benefits over TCP sockets: lower CPU time and latency overhead, and file-system-localized locations, which allow for system-based permissions and access control.

Also, Unix sockets would be able to use a different encoding scheme from TCP (HTTP), one suitable for different types of applications (not browsers). This encoding should focus on being fast to encode and decode and simple to implement; "size doesn't matter", as it is local communication.

Things that need specifying:

My proposal for the encoding is bencode. It is very simple to implement (under 300 lines of C) and quite fast to encode. At first glance it isn't human readable, but as it isn't a binary encoding, someone who knows the rules is able to read it. It isn't space efficient, but that isn't much of a problem here.

With Unix sockets in STREAM mode it should be quite easy to write a protocol that is both simple and efficient. I would go with something similar to cjdns's admin API RPC model, where calls and responses are maps.

Why: We need a higher-performance API, for example to be able to extract the ipfs FUSE mount into a separate process. That solves many problems, but the HTTP API would be a major overhead.

If anyone has ideas or comments, please voice them. If you like the idea in its current state, please show it by voting :+1:.

I would also love to hear opinions from the JS side of the project.

Kubuxu commented 8 years ago

This will be more complex, as we need multipart-type channels, but it can be solved. I will work some more on the spec.

hackergrrl commented 8 years ago

Heya! This is a neat idea, and I'm interested in hearing more: adding another API is a lot of work, so it'd be great to really flesh out the "Why" section to help us understand your thinking as clearly as possible. Here are some Qs from your current write-up:

Why: We need a higher-performance API, for example to be able to extract the ipfs FUSE mount into a separate process. That solves many problems, but the HTTP API would be a major overhead.

  1. What does higher performance mean, quantified?
  2. Why do unix sockets make extracting IPFS FUSE into a separate process easier? Why is this desirable?
  3. What are the "many problems" that it solves?
  4. Can you quantify how much overhead "major overhead" is?

Kubuxu commented 8 years ago

  1. I can't find any quantitative data, and simple benchmarks are meaningless, because a large part of the cost of TCP lies at the start of the connection. For established connections I get about 900MiB/s vs 750MiB/s on my machine, but that may depend on many factors. Here is an explanation from the FreeBSD list: http://lists.freebsd.org/pipermail/freebsd-performance/2005-February/001143.html TCP also uses ramp-up throttling, which means that it starts slow and then increases speed as it sees that packets are not being dropped. Another thing to compare is that the new protocol would allow asynchronous calls and wouldn't depend on keep-alive for reduced command latency.
  2. We are currently afraid that over HTTP the performance of FUSE will be even lower than it is now. I don't know how well (if at all) Go is able to perform, for example, HTTP API calls with keep-alive.
  3. FUSE is currently part of go-ipfs, and it: doesn't work on all systems, requiring conditional builds; is unstable (as are most things using FUSE), causing the whole daemon to crash; and can create zombie processes that lock up resources.
  4. The major overhead is the few milliseconds that TCP and HTTP need for handshaking, negotiation and so on. For a 2MiB file (already cached in RAM) and a full local transfer speed of 500MiB/s, those few milliseconds (let's say 3) would reduce the transfer speed to about 280MiB/s, and that is not including TCP ramp-up.
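
The throughput estimate in point 4 can be checked with a small calculation. This is a sketch under a simple assumed model: each request pays a fixed setup cost and then transfers at the full local speed. The file size of 2 MiB is an assumption, chosen because it is the size for which the result lands close to the quoted 280MiB/s; the function name is hypothetical.

```python
MIB = 1024 * 1024


def effective_throughput(size_bytes: float, link_bytes_per_s: float,
                         setup_s: float) -> float:
    """Bytes per second actually achieved when every request pays setup_s
    of fixed overhead before transferring at link_bytes_per_s."""
    transfer_s = size_bytes / link_bytes_per_s
    return size_bytes / (transfer_s + setup_s)


# 2 MiB at 500 MiB/s takes 4 ms; adding 3 ms of setup gives 7 ms in total.
rate = effective_throughput(2 * MIB, 500 * MIB, 0.003)
print(f"{rate / MIB:.1f} MiB/s")
```

Under this model the fixed cost matters more the smaller the file is, since the 3 ms stay constant while the transfer time shrinks.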

Another neat feature of a Unix-socket-based API would be the lack of port binding conflicts among different users, and also the possibility of file-system-level access control.

I understand that an API redesign isn't a small task, but there is a need for it. The current API was written with the CLI in mind and specced out bottom-up (API first, specs later). It fits neither the Remote Procedure Call model nor the resource-based (RESTful) model, yet those two models are the most commonly used and the most easily understood.

This created an API that works but isn't great to use from perspectives other than CLI applications. If we were to do a full API redesign I would go with two levels of the API:

The HTTP API could even be built entirely on the lower-level API, which corresponds to something I discussed with @lgierth: extracting and restructuring the HTTP Gateway.

The important part would be that we can now apply a top-down approach: first design the APIs with use cases in mind, and then implement them as we see best. The outcome would be a much better structured, more uniform and easier to implement (as in other language bindings) interface for accessing the IPFS world.

sorry for the wall of text

hackergrrl commented 8 years ago

Thanks for the additional info. Is there a specific motivating problem that this aims to solve (like, an issue or someone with a case where unix sockets are a very explicitly clear win)?

Kubuxu commented 8 years ago

Yes, the current API is aimed at high-level applications (not really, but it doesn't make any difference in this case), which makes interfacing lower-level applications with IPFS really hard.

To communicate with the IPFS daemon, a C application would have to use libcurl (or similar), which is already quite complex, and it would also have to parse JSON or XML, which requires a separate library of its own.

Also, due to its CLI-based API, IPFS doesn't provide the constructs that are familiar in the low-level world. There is no socket you can just read data off, and no simple way to seek in a binary file stored in IPFS. Of course a c-ipfs-api built on the API in its current state could happen, but every library is quite a big responsibility in the world of C, and those bindings wouldn't suit that world.

Unix sockets are a clear win on multi-user systems, but this isn't just about the transport; it is mostly about the encoding, the protocol and possibly the API itself.

hackergrrl commented 8 years ago

Points all taken and understood. I'm still not sure we're on the same page, so let me try to rephrase: "is there a specific person or project or effort that is blocked or hindered by the lack of this?"

If so, maybe it makes more sense to start the discussion from a place of "how do we solve problem X" rather than "how do we implement Y"? (Maybe this discussion/context already happened on IRC or elsewhere on GH and I missed it?)

Kubuxu commented 8 years ago

I would really like to extract FUSE out of the core go-ipfs (and maybe start a trend). This task requires a very specific API to keep performance up (and possibly increase it). Issues: https://github.com/ipfs/go-ipfs/issues/2712 https://github.com/ipfs/go-ipfs/issues/2166 and more. There is no separate issue for extracting FUSE, as I think most of the talk about it happened over IRC.

hackergrrl commented 8 years ago

Awesome! Yes: getting FUSE out of core sounds really nice. :)

What do you think about getting something working first (a proof of concept) using e.g. the existing HTTP API? Or heck, maybe even HTTP over unix sockets? (you can ignore the cost of TCP connection management in this case, but still reuse all of the API that exists today) You've made it clear that unix sockets would be faster, but the easiest win here sounds like just the separation step.

whyrusleeping commented 8 years ago

relevant go-ipfs issue: https://github.com/ipfs/go-ipfs/issues/2148

kevina commented 8 years ago

I agree with @noffle in that we should first try getting something working using the HTTP API. With proper caching I don't think the performance will suck as badly as some fear. Once we have something basic working we can consider optimizing it with a better API. This will also allow us to perform benchmarking and really see how much of an impact the API has.

I have some experience writing a fuse filesystem in C++ and should be able to figure out how to write one in Go. This is something I might be willing to take on if no one else does.

whyrusleeping commented 8 years ago

the fuse code is already written (and works, for the most part under normal circumstances) in the go-ipfs codebase, the only thing we would have to do is move the way it accesses data from being directly connected to a core.IpfsNode towards using the http api. This would be really awesome to have.

whyrusleeping commented 8 years ago

Actually, a really awesome way to do this easily would be to tweak the mfs code to use either the http api or the core node. Improving this interface: https://github.com/whyrusleeping/fallback-ipfs-shell/blob/master/shell.go (and surrounding codebase, that repo is really sad) would be the right way to go.

The advantage of making that change in mfs is that we don't have to make many changes to the fuse code (it primarily uses mfs) to get things working, and any improvements to mfs affect the rest of the system too (ipfs add uses mfs under the hood)

kevincox commented 4 years ago

Why do we want a separate protocol for the socket API? I think that it will just make it much harder to implement clients and will mean that most tools won't support it. I think it would be better to use the HTTP protocol over a unix socket. IIUC this will give basically identical performance.

I think a HTTP/2 (unencrypted) API bound to a unix socket would be very performant. This also means that we don't have to duplicate efforts. For example changing the response encoding from JSON to something more efficient could be used by both UNIX and TCP+HTTP clients.

That being said the reason I would like to see this is for the access control. For example I have a multi-tenant system and don't want to expose IPFS to everyone. If it could be bound to a unix socket I can adjust the socket permissions so that only certain users can connect (for example a reverse proxy which can do arbitrarily complex authentication).
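
The access-control idea above can be sketched as follows. This is an illustration in Python of ordinary Unix socket permissions, not go-ipfs code; the function name `bind_private_socket` and the `0o660` mode are assumptions chosen for the example.

```python
import os
import socket
import stat
import tempfile


def bind_private_socket(path: str, mode: int = 0o660) -> socket.socket:
    """Bind a listening Unix socket whose file permissions limit who may
    connect. The restrictive umask ensures the socket file is never more
    permissive than owner-only between bind() and chmod()."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    previous_umask = os.umask(0o177)
    try:
        server.bind(path)
    finally:
        os.umask(previous_umask)
    os.chmod(path, mode)  # e.g. owner and group may connect, others may not
    server.listen()
    return server


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "api.sock")
        listener = bind_private_socket(sock_path)
        print(oct(stat.S_IMODE(os.stat(sock_path).st_mode)))
        listener.close()
```

A reverse proxy running under a dedicated group could then be given access by changing the group of the socket file, while every other local user is refused by the kernel at `connect()` time. Placing the socket inside a directory with restricted permissions is a common extra safeguard.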

Furthermore, in a multi-tenant system you risk another process binding the port, since unless I am running IPFS as root I need to pick an unprivileged port. This means that I can't trust that I am actually talking to the IPFS API.

Another related benefit is avoiding port collisions as previously mentioned.

hsanjuan commented 4 years ago

I think that it will just make it much harder to implement clients and will mean that most tools won't support it. I think it would be better to use the HTTP protocol over a unix socket

AFAIK you can already configure the normal HTTP API with a unix-socket listener and things work as you expect.

kevincox commented 4 years ago

AFAIK you can already configure the normal HTTP API with a unix-socket listener and things work as you expect.

Are there any docs for this? I looked at everything I could find, including the multiaddr docs, and couldn't find anything.

hsanjuan commented 4 years ago

It's documented here: https://github.com/ipfs/go-ipfs/blob/master/docs/config.md#addressesapi
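
For clients, speaking plain HTTP over a Unix socket needs very little extra code. The sketch below is in Python and assumes the daemon has been configured with a unix-socket listener as the linked document describes. To stay self-contained it talks to a small stand-in server instead of a real daemon; the `UnixHTTPConnection` class, the stand-in handler, its response body and the socket path are all hypothetical.

```python
import http.client
import http.server
import os
import socket
import socketserver
import tempfile
import threading


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that connects to a Unix socket path instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


class StandInHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in for the daemon: answers every POST with a fixed JSON body."""

    def do_POST(self):
        body = b'{"Version": "demo"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "api.sock")
        with socketserver.UnixStreamServer(path, StandInHandler) as server:
            worker = threading.Thread(target=server.handle_request)
            worker.start()  # serve exactly one request
            conn = UnixHTTPConnection(path)
            conn.request("POST", "/api/v0/version")
            response = conn.getresponse()
            status, body = response.status, response.read()
            conn.close()
            worker.join()
    return status, body


if __name__ == "__main__":
    print(demo())
```

Only `connect()` differs from a normal HTTP client, which supports the point that existing tooling can be reused when the transport changes but the protocol does not.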

Stebalien commented 4 years ago

*in go-ipfs master.

Note 1: this issue is probably poorly titled. The main goal is to have a more efficient RPC protocol.

Note 2: We can do things with unix sockets that we can't do with, e.g., HTTP2. For example, we can share memory.
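
One way memory can be shared over a Unix socket is by passing a file descriptor to the peer, which then maps the same file instead of receiving a copy of the data through the socket. The sketch below shows the mechanism in Python (3.9 or later, for `socket.send_fds`). It runs both ends in one process for brevity, and the function name and the use of a temporary file as backing store are assumptions made for the example, not a description of anything go-ipfs does.

```python
import mmap
import os
import socket
import tempfile


def share_buffer(payload: bytes) -> bytes:
    """Send an open file descriptor across a Unix socket pair and read the
    payload on the receiving side through a memory mapping."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with sender, receiver, tempfile.TemporaryFile() as backing:
        backing.write(payload)
        backing.flush()

        # The descriptor travels as ancillary data next to a tiny message.
        socket.send_fds(sender, [b"buf"], [backing.fileno()])
        _message, fds, _flags, _address = socket.recv_fds(receiver, 16, 1)

        received_fd = fds[0]  # a new descriptor referring to the same file
        try:
            with mmap.mmap(received_fd, len(payload),
                           access=mmap.ACCESS_READ) as view:
                return bytes(view[:])
        finally:
            os.close(received_fd)


if __name__ == "__main__":
    print(share_buffer(b"block data"))
```

Only the short `buf` message and the descriptor cross the socket; the payload itself is read from the shared mapping. This kind of descriptor passing is specific to Unix domain sockets and has no counterpart over TCP.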