ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/
Other
15.83k stars 2.96k forks source link

[http_proxy_over_p2p] #5526

Closed cboddy closed 5 years ago

cboddy commented 5 years ago

This implements an http-proxy over p2p-streams, for context see https://github.com/ipfs/go-ipfs/issues/5341.

This script is a useful test of the functionality. In case it causes portability issues I've not included it as a sharness test (since it uses python to serve HTTP content, although happy to add it since python be ~ as available as bash).

(inline since GH doesn't support *.sh as an attachment)

#!/bin/bash                                                                                                                                                   

#                                                                                                                                                             
# clean up all the things started in this script                                                                                                              
#                                                                                                                                                             
function teardown() {                                                                                                                                         
    jobs -p | xargs kill -9 ;                                                                                                                                 
}                                                                                                                                                             
trap teardown INT EXIT                                                                                                                                        

#                                                                                                                                                             
# serve the thing over HTTP                                                                                                                                   
#                                                                                                                                                             
SERVE_PATH=$(mktemp -d)                                                                                                                                       
echo "YOU ARE THE CHAMPION MY FRIEND" > $SERVE_PATH/index.txt                                                                                                 
cd $SERVE_PATH                                                                                                                                                
# serve this on port 8000
python -m SimpleHTTPServer 8000 &

cd -

IPFS=cmd/ipfs/ipfs

PATH1=$(mktemp -d)
PATH2=$(mktemp -d)

RECEIVER_LOG=$PATH1/log.log
SENDER_LOG=$PATH2/log.log

export IPFS_PATH=$PATH1

#
# start RECEIVER IPFS daemon
#
$IPFS init >> $RECEIVER_LOG 2>&1
$IPFS config --json Experimental.Libp2pStreamMounting true >> $RECEIVER_LOG 2>&1
$IPFS config --json Addresses.API "\"/ip4/127.0.0.1/tcp/6001\"" >> $RECEIVER_LOG 2>&1
$IPFS config --json Addresses.Gateway "\"/ip4/127.0.0.1/tcp/8081\"" >> $RECEIVER_LOG 2>&1
$IPFS config --json Addresses.Swarm "[\"/ip4/0.0.0.0/tcp/7001\", \"/ip6/::/tcp/7001\"]" >> $RECEIVER_LOG 2>&1
$IPFS daemon >> $RECEIVER_LOG 2>&1 &
# wait for daemon to start.. maybe?
# ipfs id returns empty string if we don't wait here..
sleep 5

#
# start a p2p listener on RECIVER to the HTTP server with our content
#
$IPFS p2p listen /x/test /ip4/127.0.0.1/tcp/8000 >> $RECEIVER_LOG 2>&1
FIRST_ID=$($IPFS id -f "<id>")

export IPFS_PATH=$PATH2
$IPFS init >> $SENDER_LOG 2>&1
$IPFS config --json Experimental.Libp2pStreamMounting true >> $SENDER_LOG 2>&1
$IPFS daemon >> $SENDER_LOG 2>&1 &
# wait for daemon to start.. maybe?
sleep 5

# send a http request to SENDER via proxy to RECIEVER that will proxy to web-server

echo "******************"
echo proxy response
echo "******************"
curl http://localhost:5001/proxy/http/$FIRST_ID/test/index.txt

echo "******************"
echo link http://localhost:5001/proxy/http/$FIRST_ID/test/index.txt
echo "******************"
echo "RECEIVER IPFS LOG " $RECEIVER_LOG
echo "******************"
cat $RECEIVER_LOG

echo "******************"
echo "SENDER IPFS LOG " $SENDER_LOG
echo "******************"
cat $SENDER_LOG
cboddy commented 5 years ago

FYI this link appears to be broken https://github.com/ipfs/community/blob/master/contributing.md#contribution-guidelines

magik6k commented 5 years ago

Thanks! You should be able to add a sharness test which hijacks the request the request using netcat as is done in https://github.com/ipfs/go-ipfs/blob/master/test/sharness/t0235-cli-request.sh#L14.

cboddy commented 5 years ago

@magik6k thanks for the swift feedback; have started and will munge that script into an sharness test as suggested.

Stebalien commented 5 years ago

So, I believe this should actually be using the ReverseProxy helper. That will copy and modify all the right headers and handle all the edge-cases. However, that may just make it more complicated so take this as a suggestion. (You won't be able to use the NewSingleHostReverseProxy helper, you'll have to implement a custom RoundTripper.)

I believe we also need to read while writing. That is, the current version expects to write the entire request before it reads any response. We should start the request and then read the response as some worker finishes writing the request.

Note: We can probably fix these later if they turn out to be difficult to get right.

lanzafame commented 5 years ago

@cboddy @Stebalien this might be a little late in the PR cycle to suggest but, if I understand correctly, @hsanjuan has built go-libp2p-http for this use case. It should simplify the implementation.

hsanjuan commented 5 years ago

Hello, yes, https://github.com/hsanjuan/go-libp2p-http will probably simplify this (along with httputil.ReverseProxy). go-libp2p-http provides the libp2p transport (RoundTripper implementation) that you are looking for the ReverseProxy. If I'm correct you just need to initialize it with libp2p://peerid as target URL and manually set the transport. At least, it would not add another http RoundTripper to the libp2p world.

The RoundTrip in go-libp2p-http is simplified if you compare it to all the default transport's implementation (https://golang.org/src/net/http/transport.go?s=3628:10127#L385) but it's well tested and I think it should be solid enough (used by cluster for a while). It supports reading responses while sending the requests (for streaming requests).

The only problem would be to attach a custom transport tag to every Roundtripper. We can easily enable this upstream (currently a single tag is used for all), if you are willing to go down this path.

cboddy commented 5 years ago

@Stebalien @lanzafame thanks for the feedback.

Will look into using ReverseProxy as suggested (most of the go stdlib still a bit new to me) and try to push something today/ in the next few days.

hsanjuan commented 5 years ago

fyi, protocol tags per Roundtripper coming about: https://github.com/hsanjuan/go-libp2p-http/pull/12

Stebalien commented 5 years ago

One more thing we'll need to do before merging this: put it behind an experimental feature-gate. Take a look at how the ipfs p2p command does this (using config.Experimental.Libp2pStreamMounting).

I don't really see this needing a long stabilization period, but we should probably leave it experimental at last until we stabilize the ipfs p2p feature...

ianopolous commented 5 years ago

I've tested out all our api calls for Peergos through this, which includes gets, posts, and multiparts. And everything works!

peergos-over-p2p-stream

ianopolous commented 5 years ago

@hsanjuan We can't use the RoundTripper in go-libp2p-http because it assumes a different incoming request format, and we don't want to do an extra socket+http round trip. We used your technique to close the stream though.

ianopolous commented 5 years ago

@Stebalien Does that mean we need a PR to go-ipfs-config to add a new config option and then gx publish that before this can be merged, or can we use the existing Libp2pStreamMounting option?

hsanjuan commented 5 years ago

@hsanjuan We can't use the RoundTripper in go-libp2p-http because it assumes a different incoming request format, and we don't want to do an extra socket+http round trip. We used your technique to close the stream though.

That's not true. You just need to rewrite the path, which you are already doing anyway.

hsanjuan commented 5 years ago

I can plug in libp2p-http for you if that's the only problem.

ianopolous commented 5 years ago

@hsanjuan If we used go-libp2p-http would that not imply an extra http round trip? Here we receive the incoming request and directly write it to the p2p stream. But my understanding (correct me if I'm wrong - I'm not a Gopher) is that if we rewrote it to use go-libp2p-http we would then be passing through another socket and http parser before getting to the p2p stream?

hsanjuan commented 5 years ago

@hsanjuan If we used go-libp2p-http would that not imply an extra http round trip? Here we receive the incoming request and directly write it to the p2p stream. But my understanding (correct me if I'm wrong - I'm not a Gopher) is that if we rewrote it to use go-libp2p-http we would then be passing through another socket and http parser before getting to the p2p stream?

@ianopolous no, what go-libp2p-http does is exactly what you are doing here. The only difference is that it works with a standard go http client by expecting urls in a way that the standard http library can handle so that you can use a the normal Go's http.Client with both http and http-under-libp2p.

Stebalien commented 5 years ago

@Stebalien Does that mean we need a PR to go-ipfs-config to add a new config option and then gx publish that before this can be merged, or can we use the existing Libp2pStreamMounting option?

Let's add a new experiment, it's a pretty orthogonal feature (only using stream mounting on the client). I'll handle the gx update (yeah, I know it's a bit of a pain).

Stebalien commented 5 years ago

@hsanjuan mind making a PR against this PR? That'll make it easier to see what the change entails.

hsanjuan commented 5 years ago

@Stebalien here: https://github.com/cboddy/go-ipfs/pull/4/files

hsanjuan commented 5 years ago

By the way, let's not forget to document this. A section in the https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md doc. I don't know where it should be documented once it's not experimental though (here probably: https://docs.ipfs.io/reference/api/http/ ?).

ianopolous commented 5 years ago

This is waiting for a gx bubble up from https://github.com/ipfs/go-ipfs-config now.

ianopolous commented 5 years ago

@hsanjuan Do you know if this build issue is a gx import issue or something else? (It builds fine locally) https://ci.ipfs.team/blue/organizations/jenkins/IPFS%2Fgo-ipfs/detail/PR-5526/3/pipeline/

core/corehttp/proxy.go:36:43: cannot use ipfsNode.P2P.PeerHost (type "gx/ipfs/QmQKod7iLxQK6X3aFYvhDDnFdXo3QjxKeL2F7UrPtdKQR2/go-libp2p-host".Host) as type "gx/ipfs/QmeA5hsqgLryvkeyqeQdvGDqurLkYi3XEPLZP3pzuBJXh2/go-libp2p-host".Host in argument to p2phttp.NewTransport:

Kubuxu commented 5 years ago

Check out: https://ci.ipfs.team/blue/organizations/jenkins/IPFS%2Fgo-ipfs/detail/PR-5526/7/pipeline/16

You have some duplicate gx dependencies imported which would explain the problem you are having.

hsanjuan commented 5 years ago

@ianopolous you'll need to rebase this to go-ipfs/master and then gx update go-libp2p-http github.com/hsanjuan/go-libp2p-http .

cboddy commented 5 years ago

rebased this onto master..

ianopolous commented 5 years ago

@hsanjuan I've tried running "gx update go-libp2p-http github.com/hsanjuan/go-libp2p-http" after rebasing, but get: updating go-libp2p-http to version 1.1.4 (Qmcb1uGjGVgU6riTwg9HvaSq3TtpDdF95Md3TogbcTRyag) ERROR: update sanity check: package client_golang (QmYYv3QFnfQbiwmi1tpkgKF8o4xFnZoBrvpupTiGJwL9nH) not found

hsanjuan commented 5 years ago

maybe you needed to do gx install --global first? @ianopolous

ianopolous commented 5 years ago

@Stebalien How do we fix the dupes: https://ci.ipfs.team/blue/organizations/jenkins/IPFS%2Fgo-ipfs/detail/PR-5526/14/pipeline Is it just that the latest versions of things need to be gx bubbled up through all the things that use them?

hsanjuan commented 5 years ago

@ianopolous I think you need to rebase in master and re-update go-libp2p-http to the latest (which uses libp2p 6.0.19 in line with the rest of go-ipfs/master)

ianopolous commented 5 years ago

Ok I've rebased and gx updated for the third time. Hopefully we can merge this soon.

ianopolous commented 5 years ago

@Kubuxu or anyone: Any chance you could give me a hand fixing gx et al to get this merge-able? Many thanks.

hsanjuan commented 5 years ago

@ianopolous If still doesn't work, if you can give me write access to cboddy I can try to fix the branch.

Stebalien commented 5 years ago

Note: Don't bother with rebasing when you're done, I'll handle that (I'm about to pull though a massive gx update...).

cboddy commented 5 years ago

Thanks @Stebalien @hsanjuan , yes I noticed last night that the multi-part test seems to fail at a rate of about 1 in 10 attempts.

So far it appears that when it fails it is because the nc server receives a truncated request (everything after the first mutipart-boundary is missing). Not 100% sure what is going on yet; looking into it now.

hsanjuan commented 5 years ago

I need to bubble libp2p-6.0.20 to go-libp2p-http. Will do this tomorrow.

Stebalien commented 5 years ago

So far it appears that when it fails it is because the nc server receives a truncated request (everything after the first mutipart-boundary is missing). Not 100% sure what is going on yet; looking into it now.

Oh. that issue. Yeah, we hit that in our other tests as well. I believe we just need to implement a custom tool for testing HTTP things in go.

Stebalien commented 5 years ago

I have a PR here to cleanup the tests a bit: https://github.com/cboddy/go-ipfs/pull/5. This uses IPTB and ensures configures the test nodes to not connect to the rest of the network (i.e., don't enable discovery, don't bootstrap).

Stebalien commented 5 years ago

So, we have a bit of an issue. Go's ServeMux automatically "cleans up" URLs so it'll helpfully:

  1. Decode the slashes.
  2. Strip the duplicates.
  3. REDIRECT the user.

Even if we choose a different escaping scheme for protocols, go will still end up rewriting the rest of the URL for us and there's no way to avoid that (as far as I can tell).

Options:

  1. Inject some logic before the mux. That is, we can have a mux-mux that skips the normal mux for proxied requests.
  2. Put the path in the query string. That is, /http/proxy?peer=xyz&proto=/libp2p-http&path=/a/b/c. This breaks browsing web pages but, really, we're using this to proxy requests.
  3. Use a different port.
  4. Accept the path normalization. Probably a bad idea from a security standpoint.
ianopolous commented 5 years ago

I'm pretty keen not to break browsing a website. That is our second most important use case after a single request. I also hope to use it personally to host arbitrary websites without requiring dns or indeed a public server.

@Stebalien Can you give some concrete examples where the encoding is a problem? I'd still fallback to restricting the character set for protocol names, seems like you're set on allowing "/"'s so maybe ban "-"'s (or some other non slash chartacter) and then we can encode them that way?

ianopolous commented 5 years ago

Hmm it's worse than I thought, if I type the following into a browser it undecodes, before even sending the request! So this: localhost:5001/proxy/http/QmaCZqbmBP5u2435pmBT8x5Ks6ZiWNg66ET4ctZNKmw78K/%2fx%2ftest/ becomes localhost:5001/proxy/http/QmaCZqbmBP5u2435pmBT8x5Ks6ZiWNg66ET4ctZNKmw78K/x/test/

which clearly doesn't work for protocols with "/"'s in them

cboddy commented 5 years ago

@Stebalien thanks for refactoring the tests. Which tag of iptb are you using/is expected? The flags used don't seem compatible with iptb's HEAD.

ianopolous commented 5 years ago

@Stebalien I've reverted the url decoding because it isn't going to work. Can we merge this and move the protocol name discussion to another pr?

Stebalien commented 5 years ago

Which tag of iptb are you using/is expected?

See: https://github.com/ipfs/go-ipfs/issues/5521. I'm using 1.3.19

I've reverted the url decoding because it isn't going to work. Can we merge this and move the protocol name discussion to another pr?

It's worse than that, unfortunately, as we'll also redirect/rewrite the http path. We're in a release freeze at the moment so we can think this over for a few days. I'm find merging something that gets us most of the way there but I prefer not merging API changes that we intend to break in the immediate future.

ianopolous commented 5 years ago

@Stebalien How long until the next release? I was super hoping to make this one. :-(

I'm find merging something that gets us most of the way there but I prefer not merging API changes that we intend to break in the immediate future.

This version only works for protocol names without /'s. Unless we remove the protocol from the path entirely, then this shouldn't change for such protocol names. All we would be doing is expanding support to include more protocol names.

Stebalien commented 5 years ago

So, thinking about hosting websites from this API for a minute, I realized something rather important: We can't expose this from the API for security reasons (XSS).

That means we can either:

  1. Expose it from a new port.
  2. Expose it from the gateway.

Personally, I think exposing it on the gateway will be the easiest.


WRT encoding, names, etc. I'd like to propose a solution that I think will make everyone happy:

/p2p/$ID[/x/$PROTOCOL]/(http|ws)[$PATH]

Examples:

Rational:

  1. It's almost a multiaddr (I wish we could embed paths in multiaddrs).
  2. It makes the sequence of operations very clear (connect to node Qm..., speak protocol X, talk HTTP, etc.).
  3. With a browser addon, we can map these onto libp2p://Qm... and libp2p://$PROTOCOL.Qm.... to preserve origins. Really, slashes are always going to be an issue for us if we want to support cases like this.
  4. We don't really need to support arbitrary libp2p protocols.

@cboddy @hsanjuan ~@hacdias~ (sorry) @ianopolous opinions?


Regardless of how it may seem, I really don't want to draw this out indefinitely. I'm fine merging this as-is (on a custom port or as a part of the gateway, behind a feature flag) as long as we agree to revisit this and work under the understanding that the interface is unstable and will likely change.

ianopolous commented 5 years ago

@Stebalien Thanks for your feedback.

I was planning to expose it on the gateway as well, in a separate pr. So moving it there is fine by me.

I like the proposed path redesign. We don't care about the specific protocol name and will only ever use one so having the option to omit it is nice. To be clear though, doesn't this imply you are banning using /http in the (explicit) protocol name, as you are using that to determine the end of the name in the path?

hsanjuan commented 5 years ago

WRT encoding, names, etc. I'd like to propose a solution that I think will make everyone happy:

/p2p/$ID[/x/$PROTOCOL]/(http|ws)[$PATH]

This means we need an additional libp2p-ws for websocket proxying?

@hsanjuan any chance we can change these to /http and /ws? That is, make cluster also listen on /http?).

That should be no problem.

LGTM. Good catch with the XSS, it was rather obvious and we almost missed it :/

Stebalien commented 5 years ago

To be clear though, doesn't this imply you are banning using /http in the (explicit) protocol name, as you are using that to determine the end of the name in the path?

Actually, I'm banning slashes entirely. I wonder if that's a bit confusing... Basically, I'm saying that protocols must be of the form /x/CUSTOM/http or just /http. The idea is that custom applications would listen on /x/my-app/http (or /x/my-app/ws).

Eventually I expect this'll be done through a libp2p daemon and applications will (a) register sub-peer and then (b) listen on /ws or /http. But we're not there yet (except for custom libp2p apps like cluster).

This means we need an additional libp2p-ws for websocket proxying?

Someday.

hsanjuan commented 5 years ago

Actually, I'm banning slashes entirely

It's fine. This is just for applications that want to be accessible by default using a default reverse proxy exposed in the gateway endpoint of the go-ipfs daemon. It's a very specific usecase for which an extra protocol handler that satisfies the limitations can be just added easily. You can always manually mount these streams with whatever custom protocol in the worst case (now and in a future libp2p daemon).

ianopolous commented 5 years ago

@Stebalien I've made those changes to the path spec, and moved the end point to the gateway.