libp2p / go-libp2p

libp2p implementation in Go
MIT License
6.09k stars 1.07k forks source link

remove the database-backed peerstore #2329

Open marten-seemann opened 1 year ago

marten-seemann commented 1 year ago

As far as I can tell, it's only used by drand in PLN: https://github.com/drand/drand/blob/ade43023d83b5b06f15f6563ec0e7b246b0829db/lp2p/ctor.go#L21 And by 0xmesh: https://github.com/0xProject/0x-mesh/blob/a40744bd096d2888fc16941bb940959254ea7c38/p2p/opts.go#L18 (There might be more)

The database-backed peerstore is what required https://github.com/libp2p/go-libp2p/issues/2231 and the mess we got into by adding contexts (see #2327).

It doesn't make a lot of sense to have a database-backed peerstore in the first place. The peerstore uses the disconnect event to garbage collect old entries, so there's little point in persisting this information. You'd just end up with entries that will never be evicted.

Note that this proposal is only to remove our pstoreds implementation of the peerstore interfaces. Users that still believe that there's value in having a database-backed peerstore are free to adopt that code: Applications can provide their own implementation of the peerstore interface in the libp2p constructor.

l0k18 commented 1 year ago

This metadata store, it surprises me a lot that it is not used enough to warrant maintaining it. Do all the consuming applications bootstrap the metadata of peers every time or do they all have their own persistence?

I am using it, but the lack of iterators in the interface is quite a problem, can't count the records using it, would either have to separately and redundantly implement an index table and keep it synced or write a new implementation and extend it with a new interface that has this necessary stuff by embedding the limited base interfaces.

If there is no reason relating to supporting other languages accessing the on disk database, why would there be any sense in replacing it with sqlite, when it is not optimised for a large dataset? Sqlite is an option for small data sets that is easily accessed by other languages. It would be inefficient to have it handle more than a hundred megabytes of data.

marten-seemann commented 1 year ago

You make some excellent points why we should remove it :)

First of all, we need to ask if libp2p itself needs the metadata store? With a super easy refactor, the answer is no. The next question we need to ask is if libp2p should provide this functionality to applications, even though it doesn't need it itself. In my view, libp2p is a p2p networking library. It should provide networking functionality, but not random other functions like a persistent data store. As you describe yourself, the abstractions currently provided are inadequate for that (no contexts, no iterators, no gc, ...), and that's a symptom of libp2p trying to solve more problems than it should try to solve in the first place.

l0k18 commented 1 year ago

Aside from simple peer discovery, essentially advertising keys going along with addresses, most other importers of this library are not running a gossip system to propagate network state information, as their protocol has already covered every part of that. You are correct.

If people need this stuff it will be at our repository.

l0k18 commented 1 year ago

I want to report another thing I discovered related to this. It stores the libp2p.Host private key in the data store, in cleartext.

That also means that all libp2p apps are storing at least one excess copy of this key outside of the conusming app's own key storage. It probably should be fixed regardless of how else the rest of this works, to at least document this.

I will just be aiming to make it use an encrypted Badger instance instead. We already have one in our codebase and deduplicating it was already planned.