Open ChillFish8 opened 3 years ago
I suggest integrating StatsD for telemetry. StatsD is widely used, allowing extensive tool support and giving LNX users more monitoring options.
Like Gunicorn's approach, StatsD could be enabled only when the host argument is set (see the Gunicorn Instrumentation docs).
I proposed these three metrics based on my current understanding of the project. This setup will also make it easier to add more metrics later if needed.
We can create a crate to hold all the metrics and the StatsD client implementation, and expose APIs that allow lnx-server and other crates to access StatsD.
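To make the proposal a bit more concrete, here is a minimal sketch of what such a crate could expose, using only the standard library. It assumes the plain StatsD line format (`<name>:<value>|<type>`) sent over UDP; `StatsdClient`, `MetricKind`, `format_metric` and the `lnx.*` metric names are hypothetical and not part of lnx. The client is a no-op unless a host is passed, mirroring the Gunicorn-style behaviour described above.

```rust
use std::io;
use std::net::UdpSocket;

/// Metric kinds from the StatsD line protocol (hypothetical enum name).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MetricKind {
    Counter,
    Gauge,
    Timing,
}

impl MetricKind {
    /// Type suffix used on the wire: "c", "g" or "ms".
    fn suffix(self) -> &'static str {
        match self {
            MetricKind::Counter => "c",
            MetricKind::Gauge => "g",
            MetricKind::Timing => "ms",
        }
    }
}

/// Formats one StatsD datagram as `<name>:<value>|<type>`.
pub fn format_metric(name: &str, value: i64, kind: MetricKind) -> String {
    format!("{}:{}|{}", name, value, kind.suffix())
}

/// Hypothetical client: does nothing unless a host was configured.
pub struct StatsdClient {
    target: Option<(UdpSocket, String)>,
}

impl StatsdClient {
    /// `host` is an address such as "127.0.0.1:8125"; `None` disables StatsD.
    pub fn new(host: Option<&str>) -> io::Result<Self> {
        let target = match host {
            Some(addr) => {
                // Bind an ephemeral local port to send datagrams from.
                let socket = UdpSocket::bind("0.0.0.0:0")?;
                Some((socket, addr.to_string()))
            }
            None => None,
        };
        Ok(Self { target })
    }

    pub fn is_enabled(&self) -> bool {
        self.target.is_some()
    }

    /// Sends one metric; returns the bytes sent, or 0 when disabled.
    pub fn send(&self, name: &str, value: i64, kind: MetricKind) -> io::Result<usize> {
        match &self.target {
            Some((socket, addr)) => {
                let line = format_metric(name, value, kind);
                socket.send_to(line.as_bytes(), addr.as_str())
            }
            None => Ok(0),
        }
    }
}
```

With something like this, lnx-server could build one client at startup from the optional host argument and call `send` unconditionally, since the disabled case costs nothing.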
In the next couple of days I'll research how to create a StatsD client in Rust, and try to come up with a generic structure for an extensible crate for the project. If it's agreed to move forward with StatsD, I can start work on a PR.
I think in terms of telemetry it is more likely we go with OTLP (OpenTelemetry), since it is rapidly becoming the de facto standard, and most if not all monitoring systems now support it and its integrations. It is also much easier to add via tracing.
Understood, using OpenTelemetry sounds good. I didn't know there was direct support through the tracing lib. Does everything else make sense, or should we talk more about it before we start working on it?
It sounds fine. I would probably ignore the Active Requests Count metric for now, as it isn't really the most useful thing in the world compared to the others, and it requires a bit of custom IO handlers, which is a pain.
I think in terms of implementing, most of this can be done by using the OTLP exporter with tracing and then just ensuring we're logging the correct values.
Agreed. I need to look into OpenTelemetry, as I'm not very familiar with it yet. I will report back with ideas for the structure.
Although I'm personally not a fan of this, I think it's needed to get a good idea of which areas to focus on. The data collected only really needs to be the average length of queries, the type of query, and the number of docs (plus index runtime settings), but that's about it. Users should be able to opt out just by passing a flag, e.g. --no-telemetry
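A rough sketch of how the opt-out and the collected values could look, using only the standard library. The `--no-telemetry` flag comes from the comment above; `UsageStats`, its fields and `telemetry_enabled` are hypothetical names, and query type and index runtime settings are left out to keep it short.

```rust
/// Hypothetical aggregate of anonymous usage data; field names are illustrative.
#[derive(Debug, Default, PartialEq)]
pub struct UsageStats {
    pub query_count: u64,
    pub total_query_len: u64,
    pub docs: u64,
}

impl UsageStats {
    /// Records only the length of a query, never its text.
    pub fn record_query(&mut self, query: &str) {
        self.query_count += 1;
        self.total_query_len += query.chars().count() as u64;
    }

    /// Average query length in characters, or `None` before any query.
    pub fn average_query_len(&self) -> Option<f64> {
        if self.query_count == 0 {
            None
        } else {
            Some(self.total_query_len as f64 / self.query_count as f64)
        }
    }
}

/// Returns false when `--no-telemetry` appears anywhere in the arguments.
pub fn telemetry_enabled<I, S>(args: I) -> bool
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    !args.into_iter().any(|arg| arg.as_ref() == "--no-telemetry")
}
```

At startup this could be called as `telemetry_enabled(std::env::args())`, and nothing would be recorded or sent when it returns false. A real implementation would more likely hook into whatever argument parser the server already uses.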