ipfs-shipyard / integration-mini-projects

Ideas and tracking for small one week "mini projects" that integrate across IPFS
MIT License

IPNS over PubSub Pinning #4

Open aschmahmann opened 5 years ago

aschmahmann commented 5 years ago

There is on-going work to make IPNS over PubSub have a faster initial resolution time by adding persistence to PubSub (e.g. libp2p/go-libp2p-pubsub#171). This scheme works in p2p scenarios where various users are each subscribed to the PubSub channel named after the IPNS Key. However, another use case we'd like to support involves having dedicated pinning nodes that can each provide many IPNS Keys.

Audience: IPNS users that plan to provide many IPNS keys from a single machine. This may also include IPFS pinning services interested in providing IPNS support.

Impact: Will enable interactions such as users publishing content to IPNS and paying someone else to make sure it's available, all without giving away their private IPNS signing keys.

Stakeholders: go-ipfs team (and infra if we'd like to test deploying a pinner internally)

jimpick commented 5 years ago

I definitely want to see an IPNS pinning service in some form ... it appears there may be multiple ways to implement such a service... e.g. republishing existing IPNS records in the DHT (not pubsub).

Various related ideas have been around for a long time!

I recall some discussion (perhaps in the last IPNS Tiger team meeting?) about the meaning of the TTL in IPNS records, how that could possibly be extended, and how third-party peers might be able to republish IPNS keys to keep them alive in the DHT.

Of course, the DHT is slow, so I'd like to understand better how IPNS interacts with pubsub, and how the proposed persistence layer would work. There seem to be some differing opinions on where the persistence might live on the network and in the code, so we might want to try implementing several different approaches to facilitate discussion and see what works best through testing (if we have the luxury of taking some time to get it right).

marcinczenko commented 5 years ago

It is currently very difficult to find out what works and what does not. Pinning IPNS names still seems to just hang (all my daemons are started as ipfs daemon --enable-namesys-pubsub --enable-pubsub-experiment and I publish with ipfs pin add /ipns/QmQUcC5iRXee1QCgavsT1oWwGRzbSZPm8PhudWmUGWDmp8). There is an old thread about pinning IPNS names (https://github.com/ipfs/go-ipfs/issues/1467) but it is still unclear to me whether it works or not...

aschmahmann commented 5 years ago

@marcinczenko if I understand correctly, ipfs pin add pins the CID (data) at the end of that path and not the path itself. More concretely, if QmKey is the IPNS key that points to QmData, then ipfs pin add /ipns/QmKey results in QmData being pinned (i.e. it won't be garbage collected).

I'm not sure if you were running into issues with IPNS publishing over PubSub (e.g. ipfs name publish QmData), but there is an issue being tracked at libp2p/go-libp2p-pubsub-router#28. Fortunately, libp2p/go-libp2p-pubsub-router#29 should make IPNS publishing much faster while we work on the other parts of the above issue.

@jimpick An overview of IPNS over PubSub as it works now, and with some of the proposed changes, is below.

Currently

Most of the Go code for this is available at https://github.com/libp2p/go-libp2p-pubsub-router

First time publishing (per boot of the node)

When doing an IPNS publish, the publishing node both puts the latest IPNS record in the DHT and bootstraps the PubSub network for the particular IPNS key. The publisher then sends the IPNS record over the PubSub network.

PubSub Bootstrapping

Bootstrapping has two components: the node advertises that it supports PubSub for a particular topic/IPNS Key, and the node discovers other nodes that support PubSub for the given topic/IPNS Key. The advertising/discovery mechanism currently used for PubSub is to put/retrieve DHT provider records (i.e. the publisher's peerID and multiaddresses, like we do for IPFS) for the data pubsub:<IPNS Key>.
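To make the advertise/discover step concrete, here is a minimal sketch of how a pubsub topic can be mapped to a deterministic DHT rendezvous key that all interested peers can independently compute. The exact wire format (the "pubsub:" prefix and the hash choice) is illustrative here, not the literal go-libp2p-pubsub encoding:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// rendezvousKey maps a pubsub topic (the IPNS key) to a DHT key.
// Advertisers "provide" this key in the DHT; discoverers look up
// providers of the same key to find peers subscribed to the topic.
// Both sides compute it independently, so no coordination is needed.
func rendezvousKey(ipnsKey string) string {
	h := sha256.Sum256([]byte("pubsub:" + ipnsKey))
	return hex.EncodeToString(h[:])
}

func main() {
	topic := "QmExampleIPNSKey" // hypothetical IPNS key name
	fmt.Println(rendezvousKey(topic))
}
```

Because the mapping is a plain hash, any node that knows the IPNS key can both advertise and discover without contacting the publisher first.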

Subsequent publishing

Publish the IPNS record to the DHT and over PubSub

First time resolving (per boot of the node)

Do an IPNS resolve over the DHT and bootstrap PubSub for the particular IPNS key. The resolution occurs over the DHT because, unless you are lucky enough that a PubSub node publishes an update while you are in the middle of resolving, PubSub will have no records available.

Subsequent resolving

If messages have come in over PubSub since the first resolution, the latest message will be cached and waiting to be returned during a subsequent resolution.
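The subsequent-resolve path described above can be sketched as a small cache that incoming PubSub messages keep up to date; a resolve then returns the cached record instantly, or signals the caller to fall back to the DHT. The type and method names here are illustrative, not the go-ipfs API:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// cachedResolver models the "subsequent resolving" behaviour:
// every record received over PubSub overwrites the cached entry
// for its key, and later resolutions are served from the cache.
type cachedResolver struct {
	mu     sync.Mutex
	latest map[string]string // IPNS key -> latest record (e.g. an /ipfs/... path)
}

func newCachedResolver() *cachedResolver {
	return &cachedResolver{latest: make(map[string]string)}
}

// onMessage is invoked for each record arriving over PubSub.
func (r *cachedResolver) onMessage(key, record string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.latest[key] = record
}

// resolve returns the cached record, or an error telling the
// caller to fall back to a (slower) DHT lookup.
func (r *cachedResolver) resolve(key string) (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	rec, ok := r.latest[key]
	if !ok {
		return "", errors.New("no cached record; fall back to DHT")
	}
	return rec, nil
}

func main() {
	r := newCachedResolver()
	if _, err := r.resolve("QmKey"); err != nil {
		fmt.Println("first resolve:", err) // nothing cached yet
	}
	r.onMessage("QmKey", "/ipfs/QmData")
	rec, _ := r.resolve("QmKey")
	fmt.Println("subsequent resolve:", rec) // served from cache
}
```

This also makes clear why the first resolve per boot is the slow one: the cache starts empty and only PubSub traffic fills it.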

Proposed IPNS over PubSub Improvements

PubSub Bootstrapping

Make PubSub's bootstrapping mechanism compliant with anything following the discovery interface. See libp2p/go-libp2p-pubsub-router#28 for more details.

First time resolving (per boot of the node)

Most of this work is on-going at libp2p/go-libp2p-pubsub#171

Race the DHT against PubSub for resolving the data. This time, however, it will be possible for us to receive data from PubSub because, as soon as we connect to a PubSub node subscribed to our topic/IPNS Key, it will send us the latest version automatically. This seemingly small change both makes initial IPNS resolution much faster and gives us republishing basically for free.

Where IPNS over PubSub Pinning comes in

Given that we have IPNS over PubSub improvements coming down the pipeline that will finally enable us to actually do pinning/non-author republishing, I'd like us to start working out how this pinning will work. There are some UX suggestions at ipfs/go-ipfs#4435, but there's space to explore here, both in the UX and in connection/resource management.

marcinczenko commented 5 years ago

Thanks @aschmahmann for your explanation.

Yes, this is what I expect as well.

  1. I first added a content to one node. You can check it by doing:

    $ ipfs dag get /ipfs/zdpuB3Pn7miXhJ7pGM38EcSJ7mugkAMX3kHAPyVggXHJrRERo | jq
  2. I publish the resulting CID to IPNS => I get IPNS name.

    $ ipfs name publish /ipfs/zdpuB3Pn7miXhJ7pGM38EcSJ7mugkAMX3kHAPyVggXHJrRERo
    Published to QmQUcC5iRXee1QCgavsT1oWwGRzbSZPm8PhudWmUGWDmp8: /ipfs/zdpuB3Pn7miXhJ7pGM38EcSJ7mugkAMX3kHAPyVggXHJrRERo
  3. I am going to another node that I own.

  4. I am checking if the name published on the previous node resolves - it takes time, but yes it resolves correctly:

    $ ipfs name resolve /ipns/QmQUcC5iRXee1QCgavsT1oWwGRzbSZPm8PhudWmUGWDmp8
    /ipfs/zdpuB3Pn7miXhJ7pGM38EcSJ7mugkAMX3kHAPyVggXHJrRERo
  5. Then I do

    $ ipfs pin add /ipns/QmQUcC5iRXee1QCgavsT1oWwGRzbSZPm8PhudWmUGWDmp8
    ^C
    Error: Post http://127.0.0.1:5001/api/v0/pin/add?arg=%2Fipns%2FQmQUcC5iRXee1QCgavsT1oWwGRzbSZPm8PhudWmUGWDmp8&encoding=json&recursive=true&stream-channels=true: context canceled

And the last one does not seem to return (hence the ^C, so that you can also see the output when I kill it). BTW, it does not matter whether I try to pin with the /ipns prefix or not. The result is the same - it hangs.

What I expect is that zdpuB3Pn7miXhJ7pGM38EcSJ7mugkAMX3kHAPyVggXHJrRERo is pinned on my second node (the one from which I issue ipfs pin add /ipns/...). So when I do ipfs pin ls on that node I should see it on the list. Right?

IPNS is crucial for our project. We are building a self-sovereign identity system based on the IPID DID method. We will most probably use DNSLink for a while, but IPNS is the target. I hope I will have time to dive into this...

marcinczenko commented 5 years ago

Earlier I found this thread https://www.reddit.com/r/ipfs/comments/9szwhl/is_there_an_api_for_uploading_files_to_a_remote/ and I wanted to observe if the following comment on that thread is true:

Depending on what you want, you may not even need a library: Put IPFS on your desktop. Put IPFS on your server. Have your server pin the IPNS name for your desktop. Boom done. Now any time you pin something on your desktop, your server will mirror it. Even if you are offline, it will start mirroring when you connect.

marcinczenko commented 5 years ago

I just realised that port 4001 was not open for my first node (the IP changed). This may be the reason why it was hanging... I am checking it again now. I see many more peers now...

marcinczenko commented 5 years ago

Ok, so indeed I had a bit of a mess with my port configuration. Running two nodes on the same local network and expecting port forwarding to work without changing the ports is not wise. Stupid me!

So in the end I managed to get the peers connected via ipfs swarm connect. Then I republished the name, which now resolves immediately. The content also resolves immediately. And pinning the IPNS name also does what it should - it gets the referenced content pinned.

The above-mentioned (comment link) magical mirroring unfortunately does not work, but more importantly, resolving names published under other keys seems to be unreliable. I am getting errors like Error: context deadline exceeded or Error: could not resolve name, and from time to time it does actually resolve. It always resolves on the node where it was published - the problem I describe only occurs on the other node...

aschmahmann commented 5 years ago

@marcinczenko If you could give me a log of your commands and outputs it might be easier to diagnose. However, my instinct is that either your second node is not properly connected to the network, or that your node is having trouble locating the information during the DHT walk currently required to resolve the IPNS address (in the current implementation, as opposed to the aforementioned PR, IPNS over PubSub still gets data from the DHT initially).

If you're still not finding the data even after an ipfs swarm connect + manually republishing from the authoring node while the searching node is doing its resolve then there are almost certainly configuration problems.

marcinczenko commented 5 years ago

@aschmahmann I am trying to observe what's happening. When I have a more structured observation (I will be testing more over the next weeks) I will share it.

I think it is still a bit too early for me to draw conclusions, but from what I see, ipfs name resolve has a tendency to return Error: context deadline exceeded, but after trying a couple of times it seems to finally pick it up. What you say about "manually republishing from the authoring node while the other node is doing its resolve" indeed seems to have an immediate impact: the resolving node gets it almost instantly after the other node publishes (I feel it resolves even before the other one has finished republishing). I think my configuration is ok and things seem to do what they are expected to do - the glitch is that I currently have a feeling it is hard to depend on, as sometimes I need to try more than once or maybe even restart the daemon (but as I said, I still have too little data to draw conclusions). Doing ipfs dht findpeer seems to work most of the time (although occasionally I get an error saying that the route cannot be found - sorry, I did not capture the actual response for that one). And ipfs swarm connect also works most of the time, although recently it feels like I had to restart the daemon on the other node for it to accept the connection. When it fails, it fails this way:

» ipfs swarm connect /ip4/85.144.224.194/tcp/4001/ipfs/QmevpVT42bVGWoz76u1naEfjPVdwptqQVugevNmXUJPCbi
Error: connect QmevpVT42bVGWoz76u1naEfjPVdwptqQVugevNmXUJPCbi failure: context deadline exceeded

And by the way, the reliability of ipfs name resolve is correlated with the output of ipfs name pubsub subs. When the name to be resolved is included in the output, ipfs name resolve resolves quite fast (even when the nodes are not directly connected in a swarm):

» ipfs name pubsub subs
/ipns/QmQUcC5iRXee1QCgavsT1oWwGRzbSZPm8PhudWmUGWDmp8
/ipns/QmXKJcdEmXoGaAKoyKBR8SSCmdeS6i8kbsicY146zUuj7P
/ipns/QmNqXtJy3x2EXgABGD2BRWoJ5x4eY9pddn4Hr5sy55ZzJt

Which, to my current understanding, makes a lot of sense.

So, I hope that when I do more testing I will have more structured feedback. I need to run tests where the publishing node republishes regularly, as I am not too disciplined in doing that now. Soon we will have more nodes at different physical locations (running on Raspberry Pi) and also some hosted on AWS to compare.

I will also try to (re)educate myself a bit and hopefully soon I will start digging more into the implementation details. Until then, thanks in advance for all your answers.

marcinczenko commented 5 years ago

An update on my testing. I am testing IPNS pubsub on two stable nodes in different geographical locations. The nodes are stable, always on, and have fixed external IPs.

I will call one node a publisher (node id: QmPTKK2mrmkBgBrYni6cd8cDBtxhtfj5ADsZk8RBnYM3WJ) and the other one a subscriber (node id: QmevpVT42bVGWoz76u1naEfjPVdwptqQVugevNmXUJPCbi). Initially the nodes are not directly connected:

# issued at subscriber
$ ipfs swarm peers | grep QmPTKK2mrmkBgBrYni6cd8cDBtxhtfj5ADsZk8RBnYM3WJ
// EMPTY

and I am using pubsub in the gossipsub mode:

$ ipfs config Pubsub.Router gossipsub

On a publisher I do the following:

  1. I add some content:

    $ echo "{\"text\": \"Demo Content\"}" | ipfs dag put --format=dag-cbor --hash=sha2-256
    zdpuAzbUtM4bXo79KpFeCXeBUeUuAnbypPapoTPukJTMQsULN
  2. I pin it:

    $ ipfs pin add /ipfs/zdpuAzbUtM4bXo79KpFeCXeBUeUuAnbypPapoTPukJTMQsULN
    pinned zdpuAzbUtM4bXo79KpFeCXeBUeUuAnbypPapoTPukJTMQsULN recursively
  3. I am creating a new key:

    $ ipfs key gen --type=rsa --size=2048 test-key
    QmZsR3ojf6WiM5ZKDyXQSmwh33zHK248473Mbf1B8GjAnG
  4. I am publishing under this name:

    $ ipfs name publish --key=test-key /ipfs/zdpuAzbUtM4bXo79KpFeCXeBUeUuAnbypPapoTPukJTMQsULN
    Published to QmZsR3ojf6WiM5ZKDyXQSmwh33zHK248473Mbf1B8GjAnG: /ipfs/zdpuAzbUtM4bXo79KpFeCXeBUeUuAnbypPapoTPukJTMQsULN

On the subscriber:

I am trying to resolve the published name (I wait until the publish command returns):

  $ ipfs name resolve /ipns/QmZsR3ojf6WiM5ZKDyXQSmwh33zHK248473Mbf1B8GjAnG
  Error: could not resolve name

I repeat it a couple of times, every time getting the same error message or Error: context deadline exceeded.

If I now check the list of subscriptions I get an empty list:

$ ipfs name pubsub subs

So this is the first problem: The name does not resolve at all.

Now let's get back to the publisher and let's take a look at the published topics:

$ ipfs name pubsub subs
/ipns/QmZsR3ojf6WiM5ZKDyXQSmwh33zHK248473Mbf1B8GjAnG

This looks correct: a topic for the published name has been created. I can also resolve the name locally (I am still on the publisher node):

$ ipfs name resolve /ipns/QmZsR3ojf6WiM5ZKDyXQSmwh33zHK248473Mbf1B8GjAnG
/ipfs/zdpuAzbUtM4bXo79KpFeCXeBUeUuAnbypPapoTPukJTMQsULN

It resolves immediately, which to the best of my knowledge is the intended behaviour. Looking at the documentation of ipfs pubsub (not ipfs name pubsub), one can find out that for the pubsub mechanism to work there must exist a chain of connected nodes subscribed to the given topic for the information to propagate. So, now I go back to the subscriber and connect:

$ ipfs swarm connect /ip4/13.53.36.107/tcp/4001/ipfs/QmPTKK2mrmkBgBrYni6cd8cDBtxhtfj5ADsZk8RBnYM3WJ
connect QmPTKK2mrmkBgBrYni6cd8cDBtxhtfj5ADsZk8RBnYM3WJ success
$ ipfs swarm peers | grep QmPTKK2mrmkBgBrYni6cd8cDBtxhtfj5ADsZk8RBnYM3WJ
/ip4/13.53.36.107/tcp/4001/ipfs/QmPTKK2mrmkBgBrYni6cd8cDBtxhtfj5ADsZk8RBnYM3WJ

The nodes are connected. We still cannot resolve:

$ ipfs name resolve /ipns/QmZsR3ojf6WiM5ZKDyXQSmwh33zHK248473Mbf1B8GjAnG
Error: context deadline exceeded

The fact that the publisher and subscriber are connected does not make the subscriber receive anything, as the connection was established after the publisher published. If we want the pubsub mechanism to have any impact, we have to go to the publisher and publish while the nodes are connected. This is also what the documentation of ipfs pubsub says, but it is not mentioned in the documentation of IPNS pubsub. So, we go to the publisher and we publish again:

$ ipfs name publish --key=test-key /ipfs/zdpuAzbUtM4bXo79KpFeCXeBUeUuAnbypPapoTPukJTMQsULN

Unfortunately, this will still not work if at the moment of publishing we are not listening on the subscriber node. For pubsub to kick in, we need to first initiate resolve on the subscriber, and then publish on the publisher - all while making sure that the two nodes are connected.

This is not an error-free process. Sometimes I need to repeat the sequence a couple of times before the subscriber correctly subscribes to the topic. In the end, though, it always works.

Once the subscriber resolves the name, we can also find the corresponding subscription on the list of subscriptions:

$ ipfs name pubsub subs
/ipns/QmZsR3ojf6WiM5ZKDyXQSmwh33zHK248473Mbf1B8GjAnG

And now resolving works instantly. As long as the two nodes are connected, republishing the name to point to different content on the publisher will be reflected in the resolved name on the subscriber. The problem is that one cannot force the node to stay connected to another node, and once the connection is lost, the connect/resolve/publish sequence has to be repeated in order to resolve the republished name. This problem makes IPNS pubsub hardly usable, as a node that wants to resolve the content has to deal with the following cases:

  1. If the nodes are connected: the subscription will be updated when the publisher republishes the name. This is good.
  2. If the nodes happen not to be connected anymore, then the resolving node would have to: a. connect to the publisher (this is already a problem - one needs to know the address of the publisher), b. notify the publisher to republish (possible, but unrealistic in general), c. wait for the publisher to finish publishing and then resolve.

So currently, from my limited perspective, I see two problems:

  1. Names do not resolve initially when published with the --key option.
  2. There is no way to force the nodes to stay connected.

Even if (2) above is satisfied in general, once the nodes get disconnected (because one of them lost its Internet connection), we end up in a problematic situation. If the publisher published while the connection was not active, then even when the connection is restored the subscriber will not receive the last update. The subscriber would have to ask the publisher to republish, which is not acceptable.

For (1) I was hoping that maybe gossipsub would have some impact, but that does not seem to be the case.

I also checked the whole process for the self key. In that case problem (1) does not occur. The name always resolves on both nodes, although it takes more time. What is interesting is that after the name resolves for the first time, subsequent resolves happen instantly. But after some time (60s - which seems to be the result of not using gossipsub) the system falls back to using the DHT to resolve the name. Is this the effect of gossipsub? If so, it would mean that pubsub is engaged, and then it seems to work even when the nodes are not connected. I am a bit puzzled... Enabling gossipsub seems to remove this 60s timeout after which the node appears to fall back to the DHT, which results in the name never being resolved even after republishing (assuming there is no direct connection between the nodes). So for the self key, it looks like things are acceptable as long as gossip routing is not enabled. I am still not sure whether IPNS pubsub is really active at any time when there is no connection between the nodes.

Regardless of gossipsub, I do not see a way to make resolving names published with --key work except when there is an active connection between the nodes, in which case IPNS pubsub seems to be engaged but requires the above-mentioned cumbersome connect/resolve/publish sequence. Shall I disable the experimental pubsub support?

marcinczenko commented 5 years ago

Another update.

I just realised that this thread focuses on pinning in IPNS over pubsub. Maybe I should move to a more basic IPNS thread, as the problems I am observing seem to be more fundamental.

I removed the --enable-namesys-pubsub from my daemon, so the start command is now:

ipfs daemon --enable-pubsub-experiment --migrate

And the problem that I mentioned in my previous comment - names do not resolve when publishing with the --key option - seems to be there even without IPNS pubsub support.

On the publisher:

$ ipfs key gen --type=rsa --size=2048 test-key-3
QmZDpUr9JAcFnZGWX3n9iKZRUtJvF7AsGBSQ3VaJ5B5Zig

$ ipfs name publish --key=test-key-3 /ipfs/zdpuAzbUtM4bXo79KpFeCXeBUeUuAnbypPapoTPukJTMQsULN

Then on the other node:

$ ipfs name resolve /ipns/QmZDpUr9JAcFnZGWX3n9iKZRUtJvF7AsGBSQ3VaJ5B5Zig
Error: could not resolve name
$ ipfs name resolve /ipns/QmZDpUr9JAcFnZGWX3n9iKZRUtJvF7AsGBSQ3VaJ5B5Zig
Error: could not resolve name

(the same command repeated several more times, always returning the same error)

Only after the peers are in the swarm (after doing ipfs swarm connect) does the name resolve:

$ ipfs name resolve /ipns/QmZDpUr9JAcFnZGWX3n9iKZRUtJvF7AsGBSQ3VaJ5B5Zig
/ipfs/zdpuAzbUtM4bXo79KpFeCXeBUeUuAnbypPapoTPukJTMQsULN

and even then only occasionally.

On the other hand, resolving the self key seems to work even when the two nodes are not directly connected.

The name always resolves correctly when issued from the publishing node (for names published with --key as well as for those published under self).

Is there something I am doing wrong when publishing content with the --key option?

aschmahmann commented 5 years ago

@marcinczenko if you could open an issue in go-ipfs and tag me (and this issue) to continue this conversation that would be great.

Because the way you search through the DHT is based on your key, I'd recommend generating a few keys and seeing if you get the same results, or if some keys perform better than others. Also, if in the go-ipfs issue you could include the version of your binary that would be great (there have been some fixes since 0.4.20, such as ipfs/go-ipfs#6291). Thanks for the bug report and see you on the other issue.

marcinczenko commented 5 years ago

I tried a couple of keys - it does not seem to have any impact - they are all pretty consistent.

I will create an issue on go-ipfs.

momack2 commented 5 years ago

Back on the original thread of this suggestion - @aschmahmann - do you have additional constraints and use cases you'd imagine this project solving? Fleshing out the UX of working with this command and some of the requirements for how it needs to interface with other parts of the system would add clarity to this proposal.

aschmahmann commented 5 years ago

@momack2 @jimpick with IPNS over PubSub memory-based pinning is now implicitly done any time a node calls ipfs name resolve $IPNS_KEY. However, I'd like a persistent form of pinning to also be available.

A useful UX could be just copying the IPFS pin UX and putting it behind the name command: ipfs name pin add, ipfs name pin ls, ipfs name pin rm.

The most important thing to implement is persisting IPNS pins to disk and restarting the relevant PubSub + DHT advertisements on reload.

The next most important thing to worry about is scaling. This includes both vertical scaling, to minimize the resources (like connections) the pinner needs to allocate, and horizontal scaling (e.g. orchestrating a multi-node IPNS pinset that might need many connections open and some bandwidth, but otherwise very low resource consumption).

Once the persistence + CLI UX is available we can talk more in depth about which scaling direction to tackle first (my current thinking is horizontal).

daviddias commented 5 years ago

As I read through this thread, I believe the proposal in question is in fact an upgrade to the IPFS protocol and not a "mini project". This conversation should be moved to ipfs/notes. I can do it if you agree; 👍 this comment to let me know :)

aschmahmann commented 5 years ago

I'm fine with wherever we want this conversation to take place. However, if, in the interest of time, implementing this is easier as a separate libp2p-based ipns-pin binary than as part of IPFS, that would certainly satisfy the requirements.