raulk opened this issue 5 years ago
I think this issue is too general.
- What addresses of mine are stored in the DHT? Are they as expected?
This can be answered with a query to the DHT; see the sketch after this list.
- What DHT records do I currently hold? Who have I served them to?
This can be determined by looking at the appropriate data store.
- When did a record get created? Which peer ID stored the record? When was it last queried?
I'm not sure about this. Why? It's a lot of metadata.
- What provider records do I hold? When do they expire?
Datastore again.
- Are the nodes I'm pointing to still alive?
Routing table.
- Dump the routing table.
Routing table; see the sketch after this list.
- Trace routing table changes.
This is an interesting one, and very useful for debugging. A logger subsystem, or a few callbacks allowing users to interpret the changes however they wish, would achieve this.
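To illustrate a couple of the items above (dumping the routing table, and checking which of our addresses the network holds), here is a minimal sketch assuming the current go-libp2p / go-libp2p-kad-dht APIs (`dht.New`, `RoutingTable().ListPeers()`, `FindPeer`); exact signatures vary between releases, so treat it as an example, not canonical usage:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/libp2p/go-libp2p"
	dht "github.com/libp2p/go-libp2p-kad-dht"
)

func main() {
	ctx := context.Background()

	// A bare host; real deployments would configure transports,
	// listen addresses, bootstrap peers, etc.
	h, err := libp2p.New()
	if err != nil {
		log.Fatal(err)
	}
	defer h.Close()

	d, err := dht.New(ctx, h)
	if err != nil {
		log.Fatal(err)
	}

	// "Dump the routing table": the kbucket routing table exposes
	// the peers it currently holds.
	for _, p := range d.RoutingTable().ListPeers() {
		fmt.Println("routing table peer:", p)
	}

	// "What addresses of mine are stored in the DHT?": look up our
	// own peer ID and inspect the addresses the network returns.
	ai, err := d.FindPeer(ctx, h.ID())
	if err != nil {
		log.Println("self lookup failed:", err)
		return
	}
	fmt.Println("addresses the network has for us:", ai.Addrs)
}
```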
Can we distill the specifics and generate individual issues from this? We need to stay focused.
@anacrolix Sure, go ahead. If you don't mind, just add backlinks from the children issues into this one, so we can treat it as an epic.
One debug metric I've wanted for a long time is the number of items in each k-bucket, exported as a metric. This would allow us to debug/discover possible implementation errors.
@Kubuxu On that subject, take a look at this: https://github.com/libp2p/go-libp2p-kad-dht/issues/194. I can tell you the answer already: 7 furthest buckets are full, 8th is half full, the remaining 248 logical buckets are empty with an extremely high likelihood.
P.S.: But yeah, that metric makes sense as a digest of the full routing table dump.
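As a hedged sketch of what that per-bucket metric could look like with Prometheus: everything here except the prometheus client library API is assumed. In particular, `bucketOccupancy` is a hypothetical stand-in, because as far as I know the kbucket package does not expose per-bucket counts publicly today; real code would need support in go-libp2p-kbucket itself.

```go
package main

import (
	"log"
	"net/http"
	"strconv"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// bucketOccupancy is hypothetical: stubbed out here so the sketch
// compiles. A real version would snapshot the DHT's routing table.
func bucketOccupancy() map[int]int {
	return map[int]int{0: 20, 1: 20, 2: 11} // placeholder data
}

var kbucketSize = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "dht_kbucket_size",
		Help: "Number of peers held in each k-bucket.",
	},
	[]string{"bucket"},
)

func main() {
	prometheus.MustRegister(kbucketSize)

	// Refresh the gauge from a routing table snapshot; a real
	// implementation would do this periodically or on change events.
	for idx, n := range bucketOccupancy() {
		kbucketSize.WithLabelValues(strconv.Itoa(idx)).Set(float64(n))
	}

	// Expose the metric for scraping.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":2112", nil))
}
```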
Can we close this and create a metrics label? Super issues are too fluffy and conversation will be interleaved across different metrics.
Let’s do both. Keep this one as an epic that serves as a user/passer-by entry point for discussion. Also open issues for the specific stuff we’ve decided to implement. I like the label.
All the metrics stuff can be addressed by #252, #300, and #297.
A list of metrics is tracked in #304.
What's the overall state of metrics in libp2p? Right now I'm especially interested in the two described here: https://discuss.libp2p.io/t/how-to-know-of-peers-dialed-of-dials-failed-per-each-find-peers-find-providers-query/341/4
For reference: here is the URL to the docs of the Stats API in js-libp2p that @pgte created a long time ago: https://github.com/libp2p/js-libp2p#switch-stats-api
Can we get the per-query metrics at https://github.com/libp2p/go-libp2p-kad-dht/blob/master/query.go#L106-L110 exported? It would help me understand the efficiency of our routing.
@daviddias those details would be part of a trace, because they are transactional metrics, i.e. they pertain to a particular transaction in the system. I don't think there's much value in calculating averages, counts and percentile distributions globally (which is what OpenCensus metrics are about -- runtime stats).
That would work for the use cases I can think of 👍
Update: Ah! When I said export, I wasn't thinking in the "Export from the Golang package sense". I was just looking to have access to the information, hence a trace would be perfect!
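To make the trace idea concrete, here is a minimal sketch of what a transactional, per-query trace record could look like. Every type and field name below is hypothetical, invented for illustration; none of this is an existing go-libp2p-kad-dht API (the `peer` import path shown is the modern one; older releases use github.com/libp2p/go-libp2p-core/peer).

```go
package dhttrace

import (
	"time"

	"github.com/libp2p/go-libp2p/core/peer"
)

// QueryEvent is a hypothetical record of one step inside a single DHT
// query: which peer we dialed, whether it worked, and how long it took.
type QueryEvent struct {
	Peer    peer.ID
	Start   time.Time
	RTT     time.Duration
	Success bool
	Err     error
}

// QueryTrace aggregates the events of one query, so callers can derive
// dials attempted, dials failed, etc. per query, rather than only as
// global averages.
type QueryTrace struct {
	Target string // the key being looked up
	Events []QueryEvent
}

// FailedDials is an example statistic derived from the trace.
func (t *QueryTrace) FailedDials() int {
	n := 0
	for _, e := range t.Events {
		if !e.Success {
			n++
		}
	}
	return n
}
```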
The DHT is a pretty central element of the libp2p stack. As our adoption grows, users demand better visibility, debuggability and diagnostics. This issue pulls together ideas we've discussed.
Metrics
We need a way to collect and expose metrics on a per-query basis (returning a stats object as an additional return value from methods), as well as global moving aggregates/accumulators that can be queried anytime (or dumped periodically through an exporter like Prometheus). A sketch of the global-aggregates side follows.
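For the global aggregates, an OpenCensus-style setup (OpenCensus comes up earlier in this thread) might look roughly like the sketch below. The measure names, view names, and distribution bounds are all illustrative assumptions, not an existing go-libp2p-kad-dht metrics surface.

```go
package dhtmetrics

import (
	"context"

	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
)

// Hypothetical measures, recorded once per query.
var (
	QueryLatencyMs = stats.Float64("dht/query_latency", "DHT query latency", stats.UnitMilliseconds)
	QueryPeersSeen = stats.Int64("dht/query_peers_seen", "Peers contacted per query", stats.UnitDimensionless)
)

// Register installs views that turn raw measurements into queryable
// aggregates (counts, distributions) that an exporter such as
// Prometheus can scrape.
func Register() error {
	return view.Register(
		&view.View{
			Name:        "dht/query_latency",
			Measure:     QueryLatencyMs,
			Aggregation: view.Distribution(10, 50, 100, 500, 1000, 5000),
		},
		&view.View{
			Name:        "dht/query_peers_seen",
			Measure:     QueryPeersSeen,
			Aggregation: view.Count(),
		},
	)
}

// Record would be called at the end of each query.
func Record(ctx context.Context, latencyMs float64, peersSeen int64) {
	stats.Record(ctx, QueryLatencyMs.M(latencyMs), QueryPeersSeen.M(peersSeen))
}
```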
Debuggability/diagnostics
Introspective queries like the ones listed above will provide better management and diagnostics of the DHT.
Some of these require additional bookkeeping. Some are too expensive/voluminous to track by default: they should be switched off OOTB, and users should opt in explicitly, knowing the implications (a sketch of such an opt-in knob follows).
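For the opt-in switch, the functional-options pattern the DHT constructor already uses would fit. `WithDiagnostics` and the config field below are hypothetical, sketched only to show the shape of such a knob:

```go
package dhtdiag

// config is a trimmed-down stand-in for the DHT's option config.
type config struct {
	// diagnostics enables the expensive bookkeeping (record
	// provenance, query timestamps, etc.). Off by default
	// because of its cost.
	diagnostics bool
}

// Option mirrors the functional-options pattern used by the DHT
// constructor.
type Option func(*config) error

// WithDiagnostics is a hypothetical option: when enabled, the DHT
// would track the extra per-record metadata discussed above.
func WithDiagnostics(enabled bool) Option {
	return func(c *config) error {
		c.diagnostics = enabled
		return nil
	}
}
```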
Queryability
Collecting this wealth of information would be fruitless if we didn't expose it to the user via tooling. Unfortunately libp2p lacks an instrumentation/monitoring/management subsystem (for now) to serve as a sink for all this data. A transitory, simple solution is to expose these metrics via a local gRPC endpoint or similar, and develop a command line tool (similar to `ipfs dht`) that serves as a frontend.
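To sketch what that transitory endpoint could look like (plain HTTP/JSON here instead of gRPC, for brevity; the package name and route are made up):

```go
package dhtintrospect

import (
	"encoding/json"
	"net/http"

	kbucket "github.com/libp2p/go-libp2p-kbucket"
)

// NewHandler serves a JSON dump of the routing table, which a small
// CLI (in the spirit of `ipfs dht`) could fetch and pretty-print.
func NewHandler(rt *kbucket.RoutingTable) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/dht/routing-table", func(w http.ResponseWriter, r *http.Request) {
		peers := rt.ListPeers()
		out := make([]string, 0, len(peers))
		for _, p := range peers {
			out = append(out, p.String())
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]interface{}{
			"size":  len(out),
			"peers": out,
		})
	})
	return mux
}
```

Wiring it up would be a one-liner next to where the DHT is constructed, e.g. `go http.ListenAndServe("127.0.0.1:6060", dhtintrospect.NewHandler(d.RoutingTable()))`, and the CLI frontend then just hits that local endpoint.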