service discovery on ipfs

The question of service discovery on ipfs has been asked a few times, so I figured I would write out my thoughts on the matter.

IPFS has a built-in mechanism for nodes to advertise that they have a certain chunk of data, called "Providers". It is as simple as telling the routing system "I Provide block X", and anyone can then ask the routing system for providers of block X. We can slightly abuse this to let peers running a certain service find each other easily. By crafting a block unique to the service (I'll refer to this as the "Identification block") and having all peers running the service "Provide" that block, finding peers running a given service is as easy as requesting providers for that service's identification block.

To translate this into something a bit more concrete, you can do this today by doing:

Pick a nonce, something to prepend to your service names; this essentially defines your network. If someone else knows this nonce they'll be able to spoof and/or read the state of your service availability.

Publish service availability by publishing a block consisting of { nonce + service name }. In other words, echo $NONCE$SERVICE_NAME | ipfs block put

Find a service by running ipfs dht findprovs $BLOCKNAME. Since it's a raw block, $BLOCKNAME can be computed from $NONCE$SERVICE_NAME using a multihash implementation.
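To make the "computed using a multihash implementation" part concrete, here is a minimal Python sketch that needs no running daemon. It assumes the block is hashed with sha2-256 and named by the base58 form of that multihash (the Qm... style that older go-ipfs releases print for ipfs block put); newer releases may print a CIDv1 string instead, so treat the output as illustrative. The function names and the sample nonce are made up for this example.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet (used for Qm... hashes)."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def identification_block_name(nonce: str, service_name: str) -> str:
    """Return the base58 sha2-256 multihash of the bytes `echo` would emit."""
    # `echo` appends a trailing newline, so it is part of the block data.
    payload = (nonce + service_name + "\n").encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Multihash prefix: 0x12 = sha2-256, 0x20 = digest length of 32 bytes.
    multihash = bytes([0x12, 0x20]) + digest
    return base58btc(multihash)


if __name__ == "__main__":
    # Hypothetical nonce and service name, for illustration only.
    print(identification_block_name("my-nonce-", "my-service"))
```

If the assumptions hold for your IPFS version, the printed name is what you would pass to ipfs dht findprovs. Note the trailing newline: using printf instead of echo would produce a different block.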

To create a 'raw block' with the data you want, you need to use echo $NONCE$SERVICE_NAME | ipfs block put.

Also, anyone who knows your $BLOCKNAME will be able to get the block and 'spoof' it, so there's really no need for the nonce; it's not really doing anything for you.

Oh, good point wrt $BLOCKNAME; I was thinking of them being able to spoof alternative services.

And yup, ipfs block put instead of add - that's what I get for trying to go from memory.

So: Is there a way to do service discovery without being spoofable?

Well, not in a decentralized manner. If you want a single node to control service discovery you could have peers register with the 'master' node and have it build a list for peers to use.

You could avoid spoofing by including a public key with the service name in the block, and then sign all traffic from that service with it.

TL;DR of the issue I linked from: Paid pin services are a worrying sign. IMO we have enough centralization already; if we let corporate entities manage most of IPFS' replication/redundancy for money, then what's the point of even having IPFS? We need to step up and not allow it to turn into a WWW copy.

I feel the mechanism outlined here can be used in the scenario of people/organizations willing to donate bandwidth and storage and hopefully eliminate the need for paid pin services. Plus, I don't think encryption would be needed. We can devise a specifically crafted block which states exactly "I am a traffic and storage donator". Nothing secret or needing spoof protection here I think.

The part I can't figure out yet (being a total novice in IPFS) is how to also state "I can offer up to 3MB/s upload and 6MB/s download, and I am willing to donate 150GB of space, of which 107GB are free right now". Is there a way to provide such additional information to somebody after they discover me via the above technique?

Is there any discussion of how you might do port discovery after you find such a service? Is there something like ipfs id where you can place some simple linked data, etc., about the services running at the node (versions, ports, etc.)?

Services are identified by protocol name, not port. You'd connect to the peer and then use our multistream protocol to connect to the service in question.

Would that be something tied to a multistream protocol via a go/js program that listens and proxies requests to that running server?

Yes. We actually have a command that does this. Take a look at the ipfs p2p command. Note that it is currently being reworked a bit due to issues like https://github.com/ipfs/go-ipfs/issues/5032. In the near future, we'd actually like to make a separate libp2p daemon and have applications (including go-ipfs) use that.

I'm finding it hard to understand the documentation for multistream...

In multistream, protocols are just strings. In the examples, we use IPFS paths. However, in practice, we currently use paths like /ipfs/kad/1.0 (IPFS kademlia 1.0).
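Since the documentation was called hard to follow, here is a rough Python sketch of what "protocols are just strings" looks like on the wire. It assumes the multistream-select framing as I understand it: each message is the protocol string plus a newline, prefixed with its length as an unsigned varint. The function names are made up for this example, and it only covers framing, not the full negotiation.

```python
def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_multistream_message(protocol: str) -> bytes:
    """Frame a protocol string: <uvarint length><protocol>\\n."""
    line = protocol.encode("utf-8") + b"\n"
    return encode_uvarint(len(line)) + line


def decode_multistream_message(frame: bytes) -> str:
    """Parse one framed message and return the protocol string."""
    length, shift, index = 0, 0, 0
    while True:
        byte = frame[index]
        index += 1
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    line = frame[index:index + length]
    if len(line) != length or not line.endswith(b"\n"):
        raise ValueError("malformed multistream message")
    return line[:-1].decode("utf-8")


if __name__ == "__main__":
    # A dialer would first send the multistream header, then the protocol
    # it wants, e.g. the kademlia protocol string mentioned above.
    print(encode_multistream_message("/multistream/1.0.0"))
    print(encode_multistream_message("/ipfs/kad/1.0"))
```

In a real exchange the listener echoes the protocol string back if it supports it; that part is left out here.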

would a multistream solution to finding service+port be to have "/ipfs//my-related-object-service/1.0" respond by proxying to that service.

Yes.

(perhaps producing json or w/e)? (after the w/e appropriate multistream handshake/lookup/etc)

Not sure what you mean here.

I understand the service discovery part here: echo $SERVICE_NAME | ipfs block put and then to discover use: ipfs dht findprovs $SERVICE_NAME

But how can you find additional information about the peers here? For instance, how could you link from this block to another block served by that peer which contained service info?

It would be great if you could go from service discovery to service info discovery. Currently this solution provides no method for these peers to link additional data unless you make another direct link to that peer.

Any thoughts?

Here is my strategy so far. When you discover peers and select them by service (however you do it, as you describe is fine), you then know there is some kind of service operating on that peer. You can proxy to that service and just contact it directly.

You don't know which port the service is on, but IPFS does tunnel at least one known HTTP service as part of its daemon. So, I set up a custom protocol serving JSON over HTTP that can report service information (port, etc) which can be used with other tools. However, if such a service already has a web-based API, you can just bridge it over that IPFS p2p http proxy, which works very well! Thankfully, if you have such a service running already, setting it up with IPFS is fairly straightforward.

These specifically use the "experimental" options that you have to enable at the server-side with "Experimental.Libp2pStreamMounting" and client-side enabling "Experimental.P2pHttpProxy". (Enabled with the ipfs config --json <tag> true command.)

On the server side, with an IPFS daemon running, I take one extra step so that IPFS p2p proxies to my existing web service, which is hosted on that same server:

ipfs p2p listen /x/my-service/http /ip4/127.0.0.1/tcp/9292

Where 9292 is the port of my existing, running service and my-service is the name of the service. The /x/ is required since I want a custom named protocol in my case. And this lets me, on my client, access that web service via:

curl http://localhost:8080/p2p/<peer id>/x/my-service/http/

Where <peer id> is the peer id you discovered in your service search and my-service, again, is the name of the service.

In my specific case, the web service I'm bridging can report information about itself through its own web API. It has a route /system that does this, so I just have the service's client program (on the client system, still) use the above URL as the base URL for the service and it will query http://localhost:8080/p2p/<peer id>/x/my-service/http/system to retrieve it. This information might contain other block hashes or other known peer IDs. Whatever you need.
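As a rough illustration of the client side, the following Python sketch builds the proxy URL from a discovered peer ID and fetches the /system route. It assumes the local gateway is on localhost:8080 with Experimental.P2pHttpProxy enabled, and that /system returns JSON; the function names and the default values are invented for this example.

```python
import json
import urllib.request
from urllib.parse import quote


def p2p_proxy_url(peer_id: str, service: str, route: str = "",
                  gateway: str = "http://localhost:8080") -> str:
    """Build the local p2p HTTP proxy URL for a service on a remote peer."""
    route = route.lstrip("/")
    peer = quote(peer_id, safe="")
    name = quote(service, safe="")
    return f"{gateway}/p2p/{peer}/x/{name}/http/{route}"


def fetch_service_info(peer_id: str, service: str,
                       timeout: float = 10.0) -> dict:
    """Fetch and parse the service's /system route (assumed to be JSON)."""
    url = p2p_proxy_url(peer_id, service, "system")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # "QmPeer" is a placeholder; use a peer ID returned by your provider search.
    print(p2p_proxy_url("QmPeer", "my-service", "system"))
```

Calling fetch_service_info needs a running daemon on both ends, so only the URL helper is exercised in the demo above.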

You can do everything through that proxy, or just use that proxy to the service's web API to retrieve basic information and then switch to accessing the service directly in a more traditional way. My service is a federated software archive, and I'm doing distribution of versioned repositories using IPFS as a discovery mechanism. The experimental flags don't bother me since I'm already vendoring the IPFS binary and the system maintains and initializes its own IPFS root. This strategy is working really well, but I'm curious what others think and if there are more "official" ways to do this.

Cheers.

This is really interesting. I think I will stay away from using ipfs for messaging. We need really high throughput of a custom type. However, the idea of using a secondary standard port on each ipfs node is really smart. If I know the IP I can just connect there for additional service information. Thanks for this involved response!

Hm, not sure why this is being reopened. But IPFS currently supports pub-sub, and we have used it quite successfully to implement service discovery. We implemented a blockchain using IPFS for storage and then use pub-sub to discover all other peers participating in our blockchain and which blocks they mined. See source code and relevant section here.

ipfs / notes

service discovery on ipfs #15