ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

namesys/pubsub: pubsub Publisher and Resolver #4047

Closed · vyzo closed 6 years ago

vyzo commented 6 years ago

Implements a namesys publisher and resolver with pubsub.

whyrusleeping commented 6 years ago

@vyzo I think the flakiness is probably caused by using the routing system to bootstrap in the tests. I think (at least in this single go pubsub test) we should manually connect the hosts, something like:

        pub := NewPubsubPublisher(ctx, pubhost, ds.NewMapDatastore(), pubmr, floodsub.NewFloodSub(ctx, pubhost))
        privk := pubhost.Peerstore().PrivKey(pubhost.ID())
+       pubpinfo := pstore.PeerInfo{ID: pubhost.ID(), Addrs: pubhost.Addrs()}

        name := "/ipns/" + pubhost.ID().Pretty()

@@ -109,18 +111,22 @@ func TestPubsubPublishResolve(t *testing.T) {
        res := make([]Resolver, len(reshosts))
        for i := 0; i < len(res); i++ {
                res[i] = NewPubsubResolver(ctx, reshosts[i], resmrs[i], ks, floodsub.NewFloodSub(ctx, reshosts[i]))
+               if err := reshosts[i].Connect(ctx, pubpinfo); err != nil {
+                       t.Fatal(err)
+               }
        }

We can test the bootstrap mechanism explicitly in a different test.

vyzo commented 6 years ago

re: bootstrap I agree that the flakiness is probably caused by bootstrap issues. I think we want some delay right after opening a connection in bootstrapPubsub, so that pubsub has time to do its handshake.
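
For illustration, a minimal sketch of that idea (hypothetical helper; the 100ms figure is a guess, and host/pstore refer to the libp2p host and peerstore packages used in the test diff above):

    // connectAndSettle (hypothetical helper): open the connection, then
    // pause briefly so the floodsub handshake can complete before the
    // peer is treated as part of the mesh.
    func connectAndSettle(ctx context.Context, h host.Host, pi pstore.PeerInfo) error {
            if err := h.Connect(ctx, pi); err != nil {
                    return err
            }
            time.Sleep(100 * time.Millisecond) // assumed delay; tune as needed
            return nil
    }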

vyzo commented 6 years ago

Did a rebase to update and resolve conflicts, then force-pushed on top, so the commits are out of order with the review comments.

Stebalien commented 6 years ago

One solution is to change the validity field of IPNS records to state where they can be published.

Actually, another solution is to wrap records in a second (signed) pubsub specific record. That avoids breaking compatibility but is a bit wasteful.

Stebalien commented 6 years ago

One solution is to change the validity field of IPNS records to state where they can be published.

Actually, another solution is to wrap records in a second (signed) pubsub specific record. That avoids breaking compatibility but is a bit wasteful.

So, @whyrusleeping actually prefers this method (this is actually how the DHT prevents users from publishing other users' IPNS records to the DHT).

vyzo commented 6 years ago

If a publisher does not have pubsub enabled, an attacker can broadcast old but still valid IPNS records via pubsub

Actually, another solution is to wrap records in a second (signed) pubsub specific record. That avoids breaking compatibility but is a bit wasteful.

I think we need to have signed messages in pubsub to properly solve this; it covers the poisoning attack and all future avenues for similar attacks without requiring backwards incompatible changes.

vyzo commented 6 years ago

So, how about introducing a PubsubValueStore? Then, instead of passing the DHT to NewRoutingResolver, we'd pass a MultiValueStore that queries multiple ValueStores (the DHT and the pubsub based one) in parallel.

We were discussing unifying the caches with @whyrusleeping and having both resolvers share a cache in the persistent datastore; we decided to leave it for a follow up PR.
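
A rough sketch of how such a composite store could dispatch a lookup in parallel (all names here are hypothetical; the ValueStore interface is inlined so the sketch stands alone, and context is the only import needed):

    // ValueStore mirrors the get/put shape of the routing interface the
    // resolver already depends on.
    type ValueStore interface {
            PutValue(ctx context.Context, key string, value []byte) error
            GetValue(ctx context.Context, key string) ([]byte, error)
    }

    // MultiValueStore fans GetValue out to all sub-stores (e.g. the DHT
    // and the pubsub store) and returns the first successful result.
    type MultiValueStore struct {
            stores []ValueStore
    }

    func (m *MultiValueStore) GetValue(ctx context.Context, key string) ([]byte, error) {
            ctx, cancel := context.WithCancel(ctx)
            defer cancel() // cancel the slower lookups once we have a winner

            type result struct {
                    val []byte
                    err error
            }
            out := make(chan result, len(m.stores)) // buffered: goroutines never leak
            for _, s := range m.stores {
                    go func(s ValueStore) {
                            v, err := s.GetValue(ctx, key)
                            out <- result{v, err}
                    }(s)
            }

            var firstErr error
            for range m.stores {
                    r := <-out
                    if r.err == nil {
                            return r.val, nil
                    }
                    if firstErr == nil {
                            firstErr = r.err
                    }
            }
            return nil, firstErr
    }

PutValue would simply publish to every sub-store. The same race-and-cancel shape also covers the parallel DHT/pubsub resolution proposed later in this thread.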

vyzo commented 6 years ago

Router versus ValueStore ... Thoughts?

@Stebalien I agree that we need more unification between the two resolvers, with a minimum of shared cache and validation code. However, I think it's perhaps better to do it in a follow-up PR -- I didn't want to expand the scope of this PR to the point of touching the routing resolver.

Stebalien commented 6 years ago

I think we need to have signed messages in pubsub to properly solve this; it covers the poisoning attack and all future avenues for similar attacks without requiring backwards incompatible changes.

Yeah, that's effectively the wrapper solution but pushed down a level. I still have some reservations about this method but I'll have to think on them a bit.

We were discussing unifying the caches

Sharing the cache is really a secondary goal for me; my primary concern is reducing code duplication (especially reducing duplication of security related code). I think implementing a ValueStore instead of a publisher/resolver would actually be less code.

I didn't want to expand the scope of this PR to the point of touching the routing resolver.

You wouldn't actually be changing the routing resolver, just passing it a different ValueStore (one that dispatches to the DHT and optionally pubsub).

vyzo commented 6 years ago

You wouldn't actually be changing the routing resolver, just passing it a different ValueStore (one that dispatches to the DHT and optionally pubsub).

There is a complication in terms of usage. The pubsub namesys is only added when the pubsub experiment is enabled, which means that we cannot fully construct this value store at resolver initialization -- we would have to track it somewhere and add pubsub to it post-construction. Nothing insurmountable though, just ugliness :)

Stebalien commented 6 years ago

Can't we just construct pubsub earlier?

vyzo commented 6 years ago

Not sure; it currently only needs a host, but this may change in the future -- e.g. if we move the bootstrap inside the pubsub interface, as we've been discussing with why.

vyzo commented 6 years ago

@Stebalien So I am not sure at all that hiding the pubsub namesys inside a ValueStore is the correct approach:

- it hides pubsub behind an opaque interface, which makes it harder to see it's in use at all. It also drastically changes the design of the code and mixes responsibility for the value store for something that should be a separate component imo.
- It doesn't play nice with the optional nature of pubsub and the delayed initialization, and I don't quite want to move the pubsub initialization around (at least not just yet)
- I don't think it's going to be less code, as we would have to implement a new interface and duplicate code from the dht
- I don't think there is that much code duplication anyway, not for what is usual in Go at least. The sensitive parts reuse existing code already and the rest is simple logic.

@whyrusleeping thoughts?

vyzo commented 6 years ago

I think we need to have signed messages in pubsub to properly solve this; it covers the poisoning attack and all future avenues for similar attacks without requiring backwards incompatible changes.

Yeah, that's effectively the wrapper solution but pushed down a level. I still have some reservations about this method but I'll have to think on them a bit.

We can do this in two ways: either we blanket sign all messages in pubsub, or we export a new method PublishSigned() and we use that one for ipns records and expect all messages to be signed in the resolver.
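
One way to picture the PublishSigned variant (a sketch only: standard-library ed25519 stands in for libp2p's key types, the struct encoding is hypothetical, and a real scheme would length-prefix the signed fields):

    import (
            "crypto/ed25519"
            "errors"
    )

    // signedMsg wraps a pubsub payload with the publisher's key and signature.
    type signedMsg struct {
            From    ed25519.PublicKey
            Topic   string
            Payload []byte
            Sig     []byte
    }

    // signMsg is roughly what a PublishSigned-style API would do before
    // handing the bytes to pubsub.
    func signMsg(priv ed25519.PrivateKey, topic string, payload []byte) signedMsg {
            m := signedMsg{
                    From:    priv.Public().(ed25519.PublicKey),
                    Topic:   topic,
                    Payload: payload,
            }
            m.Sig = ed25519.Sign(priv, append([]byte(m.Topic), m.Payload...))
            return m
    }

    // verifyMsg checks the signature and that the sender is the single
    // publisher allowed on this topic (for IPNS, the key behind the name).
    func verifyMsg(m signedMsg, allowed ed25519.PublicKey) error {
            if !m.From.Equal(allowed) {
                    return errors.New("message not from the expected publisher")
            }
            if !ed25519.Verify(m.From, append([]byte(m.Topic), m.Payload...), m.Sig) {
                    return errors.New("bad signature")
            }
            return nil
    }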

Stebalien commented 6 years ago

So I am not sure at all that hiding the pubsub namesys inside a ValueStore is the correct approach

I disagree but, after looking into it, I agree that doing this properly may take a while so we should probably fix it later.

it hides pubsub behind an opaque interface, which makes it harder to see it's in use at all. It also drastically changes the design of the code and mixes responsibility for the value store for something that should be a separate component imo.

IMO, pubsub is, logically, a valuestore (or a multi-valuestore). That is, you associate (publish) values (messages) with keys (topics).

It doesn't play nice with the optional nature of pubsub and the delayed initialization, and I don't quite want to move the pubsub initialization around (at least not just yet)

I agree. My plan is to replace IpfsNode.Routing with a pluggable parallel multi-router that can query multiple sub-routers in parallel. To initialize pubsub routing, you'd call node.Routing.Register(myPubsubRouter).

I thought this would be simple. However, doing this properly requires a few changes to the IpfsRouting interface.
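
To make the shape concrete, a sketch of the post-construction registration this would enable (hypothetical; it reuses the ValueStore interface from the sketch earlier in the thread, plus sync):

    // ParallelRouter lets sub-routers be added after node construction,
    // so the pubsub router can be registered only when the experiment is
    // enabled.
    type ParallelRouter struct {
            mu   sync.RWMutex
            subs []ValueStore
    }

    func (p *ParallelRouter) Register(s ValueStore) {
            p.mu.Lock()
            defer p.mu.Unlock()
            p.subs = append(p.subs, s)
    }

    // snapshot copies the current sub-router list so a lookup can race
    // over it without holding the lock.
    func (p *ParallelRouter) snapshot() []ValueStore {
            p.mu.RLock()
            defer p.mu.RUnlock()
            return append([]ValueStore(nil), p.subs...)
    }

GetValue would then fan out over snapshot() exactly as in the earlier MultiValueStore sketch.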

I don't think it's going to be less code, as we would have to implement a new interface and duplicate code from the dht

It shouldn't have to duplicate much code from the DHT. Furthermore, it would reuse the existing record validation code from the DHT so you'd be able to add support for new record types to both the DHT and pubsub for the price of one.

Also, any code needed by both the DHT and pubsub should be extracted and used by both, not duplicated (when possible given Go's lack of a type system).

I don't think there is that much code duplication anyway, not for what is usual in Go at least. The sensitive parts reuse existing code already and the rest is simple logic.

All of the validation logic would be deduplicated (assuming we extract the validators).


However, we probably have two very different visions of how this will play out and the only way to settle this would be to actually write the code. Therefore, I agree that we should just go forward with this as-is and then refactor later (once I push some changes to the routing interfaces through).


We can do this in two ways: either we blanket sign all messages in pubsub, or we export a new method PublishSigned() and we use that one for ipns records and expect all messages to be signed in the resolver.

For this to actually work, we'd have to check the signatures in pubsub. That means we'd have to specify a set of valid publishers for every topic up-front, and I'm not sure we can reasonably do that for all pubsub use cases. I'm going to discuss this with the orbit people a bit.

Stebalien commented 6 years ago

So, assuming we're going forward with this as-is, it would be nice to parallelize search over the DHT and pubsub (race them). That is, instead of checking pubsub then the DHT, fire off blocking pubsub search in parallel with a blocking DHT search and then cancel the context when one of them returns a value.

I'm planning on adding some way to get the last message broadcast on a topic from peers on subscribe so blocking on pubsub will likely allow us to learn about the IPNS record faster than querying the DHT (assuming we already know of some peers subscribed to the IPNS record in question).

vyzo commented 6 years ago

So I am not sure at all that hiding the pubsub namesys inside a ValueStore is the correct approach

I disagree but, after looking into it, I agree that doing this properly may take a while so we should probably fix it later.

That was mostly my disagreement too, that doing something invasive would be a bigger effort and out of scope for the PR.

vyzo commented 6 years ago

So, assuming we're going forward with this as-is, it would be nice to parallelize search over the DHT and pubsub (race them). That is, instead of checking pubsub then the DHT, fire off blocking pubsub search in parallel with a blocking DHT search and then cancel the context when one of them returns a value.

The pubsub resolver should return immediately, as it retrieves values from an in-memory datastore. There is no blocking for the request, modulo subscribing to the topic, which is a non-blocking operation (the bootstrap happens in a goroutine).
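
In miniature, the read path that makes this non-blocking looks like this (a toy model: a mutex-guarded map stands in for the datastore, and all names are hypothetical):

    // pubsubCache models the resolver's local store: the subscription
    // goroutine writes records in as they arrive, and Resolve only ever
    // reads what is already there, so it returns immediately.
    type pubsubCache struct {
            mu   sync.RWMutex
            vals map[string][]byte // name -> last valid record seen
    }

    func (c *pubsubCache) Resolve(name string) ([]byte, bool) {
            c.mu.RLock()
            defer c.mu.RUnlock()
            v, ok := c.vals[name]
            return v, ok
    }

    // handleMessage is what the subscription goroutine calls for each
    // incoming pubsub message, after record validation.
    func (c *pubsubCache) handleMessage(name string, rec []byte) {
            c.mu.Lock()
            defer c.mu.Unlock()
            if c.vals == nil {
                    c.vals = make(map[string][]byte)
            }
            c.vals[name] = rec
    }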

vyzo commented 6 years ago

I'm planning on adding some way to get the last message broadcast on a topic from peers on subscribe so blocking on pubsub will likely allow us to learn about the IPNS record faster than querying the DHT (assuming we already know of some peers subscribed to the IPNS record in question).

Sure, and when we have the blocking pubsub that retrieves last message, we can revisit the parallelization in the code.

vyzo commented 6 years ago

However, we probably have two very different visions of how this will play out and the only way to settle this would be to actually write the code. Therefore, I agree that we should just go forward with this as-is and then refactor later (once I push some changes to the routing interfaces through).

Our vision for the end goal is probably not so far away! I just tend to prefer implement-and-iterate, where we start with a workable solution and minimally invasive changes and improve it over time.

vyzo commented 6 years ago

For this to actually work, we'd have to check the signatures in pubsub. That means we'd have to specify a set of valid publishers for every topic up-front, and I'm not sure we can reasonably do that for all pubsub use cases.

We don't have to do it for all topics -- we already have the machinery in the form of topic descriptors. For the case of ipns pubsub we could specify the publishing peer as the only source, and we can sign/check messages only for topics with a specific source.

vyzo commented 6 years ago

Another possible issue, regarding pubsub bootstrap: the DHT's memory of the rendezvous from Provide. What happens if the provider records expire some 24hrs after publication? I think we need to periodically re-provide the bootstrap rendezvous.

Stebalien commented 6 years ago

Sorry, I thought I had replied to this a while ago...

What happens if the provider records expire some 24hrs after publication?

Our current system for advertising interest in a topic in the DHT is broken anyways. The provides mechanism is supposed to be used to indicate that one has some block, not that one is interested in a pubsub channel.

We don't have to do it for all topics

We will eventually need to prevent those DoS vectors and I'd like to avoid having an ad-hoc verification scheme split up across multiple levels.

For now, it may be best to merge this and fix the DoS issues later (just be sure to fix them before we make this non-experimental).

vyzo commented 6 years ago

What happens if the provider records expire some 24hrs after publication?

Our current system for advertising interest in a topic in the DHT is broken anyways. The provides mechanism is supposed to be used to indicate that one has some block, not that one is interested in a pubsub channel.

I think long term we need to move the bootstrap mechanism inside the pubsub interface (when it's generalized from floodsub). For now, I think I will add a periodic re-provide of the rendezvous record, otherwise the pubsub rendezvous will not be visible to nodes that try to join more than 24hrs after bootstrap.
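
A minimal sketch of such a re-provide loop (names hypothetical; provideFn stands in for the routing layer's Provide call on the rendezvous key, and only context and time are needed):

    // rebroadcastProvide re-announces the rendezvous record on an interval
    // shorter than the provider-record lifetime, so nodes joining late can
    // still discover the topic.
    func rebroadcastProvide(ctx context.Context, provideFn func(context.Context) error, every time.Duration) {
            ticker := time.NewTicker(every)
            defer ticker.Stop()
            for {
                    if err := provideFn(ctx); err != nil {
                            // transient DHT errors shouldn't kill the loop;
                            // just try again on the next tick
                    }
                    select {
                    case <-ctx.Done():
                            return
                    case <-ticker.C:
                    }
            }
    }

Run in its own goroutine with an interval well under the ~24h record lifetime, the rendezvous stays discoverable; the PR later settles on 8h.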

vyzo commented 6 years ago

I will be pushing updates before a final rebase, so that we can have a coherent review conversation.

vyzo commented 6 years ago

@whyrusleeping ping -- thoughts on this?

whyrusleeping commented 6 years ago

@vyzo getting back to this now :)

Side note: @MichaelMure this PR should make ipns resolves much faster.

whyrusleeping commented 6 years ago

@vyzo let's add a way to enable or disable this independently of having the pubsub experiment enabled. Perhaps a flag or some subcommand on ipfs name? It would also be good to have a way to see the current state of the pubsub name system: names currently subscribed to, current values, time of last update, etc. And a way to unsubscribe from name updates would be good to have too. This should probably go in an ipfs name pubsub subcommand or something.

cc @diasdavid @Stebalien @Kubuxu for input here.

The code itself looks pretty good to me, just need some management tools now to move forward.

vyzo commented 6 years ago

@whyrusleeping ok, will update once I get back from ICFP.

vyzo commented 6 years ago

Let's call the flag --enable-namesys-pubsub, which will by necessity enable pubsub.

whyrusleeping commented 6 years ago

Let's call the flag --enable-namesys-pubsub, which will by necessity enable pubsub.

SGTM

vyzo commented 6 years ago

Rebased on master.

vyzo commented 6 years ago

Reduced bootstrap provide period to 8h and added the --enable-namesys-pubsub option to enable pubsub for IPNS as discussed.

vyzo commented 6 years ago

Extracted the unrelated testutil changes to #4281; will rebase once it's merged.

vyzo commented 6 years ago

rebased on master.

whyrusleeping commented 6 years ago

@vyzo could you rebase this? and maybe squash it down to a few commits?

vyzo commented 6 years ago

sure, I can rebase! we can squash down at merge time, I don't mind it being a single commit.

whyrusleeping commented 6 years ago

@vyzo a few questions:

  1. This doesn't speed up the first resolution, right? If I join the network and do a resolve, I still have to use the DHT, unless the publisher publishes again.
  2. If I bring my node online, I get a few values resolved through pubsub, then I go offline, and the node publishes an update while I'm offline, then I come back online and resolve, what happens?
  3. Is there a discussion anywhere on asking nodes for the latest value? (cc @Stebalien)
  4. Any thought towards having an option for nodes to periodically rebroadcast their records? Unclear if this would help or not.

vyzo commented 6 years ago

rebased, and I decided to squash as well so that we have a cleaner multi-commit history.

vyzo commented 6 years ago

@whyrusleeping:

This doesn't speed up the first resolution, right? If I join the network and do a resolve, I still have to use the DHT, unless the publisher publishes again.

correct; the first resolution will be through the DHT.

If I bring my node online, I get a few values resolved through pubsub, then I go offline, and the node publishes an update while I'm offline, then I come back online and resolve, what happens?

You will miss that update. Also, if your node actually shuts down, when you come back online your pubsub state is clean; you are not resubscribed automatically, as the subscriptions are not persisted to disk (perhaps they should be, though).

Is there a discussion anywhere on asking nodes for the latest value? (cc @Stebalien)

No; we don't have this kind of functionality in place anywhere afaict.

Any thought towards having an option for nodes to periodically rebroadcast their records? Unclear if this would help or not.

It could help in case of lost messages, but it's not clear it adds much otherwise.

vyzo commented 6 years ago

rebased on master

Stebalien commented 6 years ago

Is there a discussion anywhere on asking nodes for the latest value? (cc @Stebalien)

@keks is planning on looking into this as soon as I take a look at his issue on go-floodsub (https://github.com/libp2p/go-floodsub/issues/42).

I thought we discussed this somewhere but it must not have been in this issue (IRC? maybe not?). IIRC, I came to the conclusion that this was enough code for a first pass and that we can add features like that later (keep the PR small and manageable). The correct way to fix that is in floodsub itself (@keks' proposal) so it wouldn't have much bearing here.

whyrusleeping commented 6 years ago

Ah, I remember the last thing here. The go-testutil package being imported is a little bit borked. Its imports aren't quite right, so we end up with some duplicates. Could someone handle that?

vyzo commented 6 years ago

is there anything needed in this PR?

vyzo commented 6 years ago

~Force-pushed the last commit to restart jenkins -- errored out with some system-specific gremlin.~ Rebased on master.

vyzo commented 6 years ago

Updated for changes in the command interface.

vyzo commented 6 years ago

it's not clear why jenkins failed -- all the not ok's are marked as known breakages. I also can't restart the build through the interface for some reason.

keks commented 6 years ago

@vyzo I restarted both the broken travis build and the jenkins build. Both fail, but at different places.

Travis: in test/sharness/t0183-namesys-pubsub.sh

expecting success: 
    echo -n "" > expected &&
    ipfsi 1 name pubsub subs > subs1 &&
    ipfsi 2 name pubsub subs > subs2 &&
    test_cmp expected subs1 &&
    test_cmp expected subs2

> diff -u expected subs1
--- expected    2017-11-19 13:20:38.000000000 +0000
+++ subs1   2017-11-19 13:20:38.000000000 +0000
@@ -1 +0,0 @@
--n 

not ok 18 - check subscriptions

It looks like echo -n prints -n on the Travis Mac builders.

Jenkins: in test/sharness/t0088-repo-stat-symlink.sh

expecting success: 
  export IPFS_PATH="sym_link_target" &&
  reposize_direct=$(ipfs repo stat | grep RepoSize | awk '{ print $2 }') &&
  export IPFS_PATH=".ipfs" &&
  reposize_symlink=$(ipfs repo stat | grep RepoSize | awk '{ print $2 }') &&
  test $reposize_symlink -ge $reposize_direct

not ok 4 - 'ipfs repo stat' RepoSize is correct with sym link

vyzo commented 6 years ago

It looks like echo -n prints -n on the Travis Mac builders.

wow, that's very unfortunate. I wonder if there is a version of echo that works correctly; otherwise I'll have to use touch with a unique filename.

in test/sharness/t0088-repo-stat-symlink.sh

that's completely unrelated to this PR.

vyzo commented 6 years ago

Changed the sharness test to use rm && touch instead of echo -n.