lnx-search / lnx

⚡ Insanely fast, 🌟 Feature-rich searching. lnx is the adaptable, typo-tolerant deployment of the tantivy search engine.
https://lnx.rs
MIT License

Telemetry data #6

Open ChillFish8 opened 3 years ago

ChillFish8 commented 3 years ago

Although I'm personally not a fan of this, I think it's needed to get a good idea of which areas to focus on. The data collected only really needs to be the average length of queries, the type of query and the number of docs (plus index runtime settings), but that's about it. Users should be able to opt out just by passing a flag, e.g. --no-telemetry
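A minimal sketch of the opt-out check, assuming the flag is read straight from the process arguments. Only the flag name `--no-telemetry` comes from the comment above; the function `telemetry_enabled` is hypothetical and not part of lnx.

```rust
/// Returns `true` unless the opt-out flag appears among the arguments.
/// Hypothetical helper: only the flag name is taken from the discussion.
pub fn telemetry_enabled<I, S>(args: I) -> bool
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    // Telemetry stays on by default and is disabled by an exact flag match.
    !args.into_iter().any(|arg| arg.as_ref() == "--no-telemetry")
}
```

At startup this could be called as `telemetry_enabled(std::env::args().skip(1))`. A real CLI would more likely declare the flag in its existing argument parser.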

renanvieira commented 11 months ago

I suggest integrating StatsD for telemetry. StatsD is widely used, allowing extensive tool support and giving LNX users more monitoring options.

Optional Integration

Like Gunicorn's approach, StatsD could be enabled only when the host argument is set (see the Gunicorn instrumentation docs).

Proposed Metrics

  1. Index Document Changes: Report the number of documents affected each time a create/delete action is performed.
  2. Query Duration per Index: Measure the duration of each query execution and report it per index.
  3. Active Requests Count: Increment a counter at route handling and decrement it when the request completes. I considered tracking at the socket level with hyper-rs, but the complexity of implementing this in version 0.14.x makes it not viable without significant library modifications.
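For reference, the three metrics above map onto the plain StatsD line format `<name>:<value>|<type>`, with `c` for counters, `ms` for timers and `g` for gauges. The sketch below only formats those lines. The `lnx.index.` prefix, the metric names and the helper functions are assumptions for illustration, not anything lnx defines. Plain StatsD has no tags, so the index name is folded into the metric name here.

```rust
use std::time::Duration;

/// Builds a per-index metric name such as `lnx.index.products.docs_added`.
/// Characters that could clash with StatsD delimiters are replaced with `_`.
pub fn index_metric(index: &str, metric: &str) -> String {
    let safe: String = index
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '-' || c == '_' { c } else { '_' })
        .collect();
    format!("lnx.index.{}.{}", safe, metric)
}

/// Counter line, e.g. documents added or removed by one operation.
pub fn counter(name: &str, value: i64) -> String {
    format!("{}:{}|c", name, value)
}

/// Timer line in whole milliseconds, e.g. one query execution.
pub fn timer(name: &str, elapsed: Duration) -> String {
    format!("{}:{}|ms", name, elapsed.as_millis())
}

/// Gauge line, e.g. the number of requests currently in flight.
pub fn gauge(name: &str, value: u64) -> String {
    format!("{}:{}|g", name, value)
}
```

Each returned string would be sent to the StatsD daemon as a single datagram.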

I proposed these three metrics based on my current understanding of the project. This setup will also make it easier to add more metrics later if needed.

Telemetry Crate

We can create a crate to hold all the metrics and StatsD client implementation, and expose APIs to allow lnx-server and other crates to access StatsD.
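One possible shape for such a crate, sketched with the standard library only. All names here (`MetricSink`, `UdpSink`, `Telemetry`, `from_host`) are hypothetical. The client becomes a no-op when no host is configured, which mirrors the optional integration described earlier, and the sink trait is an assumption made so the client can be exercised without a network.

```rust
use std::io;
use std::net::UdpSocket;

/// Destination for formatted metric lines (hypothetical trait).
pub trait MetricSink {
    fn emit(&mut self, line: &str) -> io::Result<()>;
}

/// Sends each line as one UDP datagram to a StatsD daemon.
pub struct UdpSink {
    socket: UdpSocket,
}

impl UdpSink {
    /// `host` is a `"host:port"` string; binding to an IPv4 wildcard
    /// address is an assumption that the daemon is reachable over IPv4.
    pub fn connect(host: &str) -> io::Result<Self> {
        let socket = UdpSocket::bind("0.0.0.0:0")?;
        socket.connect(host)?;
        Ok(Self { socket })
    }
}

impl MetricSink for UdpSink {
    fn emit(&mut self, line: &str) -> io::Result<()> {
        self.socket.send(line.as_bytes()).map(|_| ())
    }
}

/// In-memory sink, handy for tests.
impl MetricSink for Vec<String> {
    fn emit(&mut self, line: &str) -> io::Result<()> {
        self.push(line.to_owned());
        Ok(())
    }
}

/// Handle that other crates would hold; does nothing when `sink` is `None`.
pub struct Telemetry<S: MetricSink> {
    sink: Option<S>,
}

impl<S: MetricSink> Telemetry<S> {
    pub fn new(sink: Option<S>) -> Self {
        Self { sink }
    }

    pub fn is_enabled(&self) -> bool {
        self.sink.is_some()
    }

    /// Emits a StatsD counter line; send errors are ignored on purpose
    /// so that metrics never fail the caller.
    pub fn count(&mut self, name: &str, value: i64) {
        if let Some(sink) = self.sink.as_mut() {
            let _ = sink.emit(&format!("{}:{}|c", name, value));
        }
    }

    pub fn into_sink(self) -> Option<S> {
        self.sink
    }
}

/// Enables StatsD only when a host was provided.
pub fn from_host(host: Option<&str>) -> io::Result<Telemetry<UdpSink>> {
    let sink = host.map(UdpSink::connect).transpose()?;
    Ok(Telemetry::new(sink))
}
```

Keeping the transport behind a trait is a design choice rather than a requirement. It would let the same handle be reused if the backend changes later.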

Next Steps

In the next couple of days I'll research how to create a StatsD client in Rust and try to come up with a generic structure for an extensible crate. If it's agreed to move forward with StatsD, I can start work on a PR.

ChillFish8 commented 11 months ago

I think in terms of telemetry it is more likely we go with OTLP (OpenTelemetry), since it is rapidly becoming the de facto standard, and most if not all monitoring systems now support it and its integrations. It is also much easier to add via tracing.

renanvieira commented 11 months ago

Understood, using OpenTelemetry sounds good. I didn't know there was direct support through the tracing lib. Does everything else make sense, or should we talk more about it before we start working on it?

ChillFish8 commented 11 months ago

It sounds fine. I would probably ignore the Active Requests Count metric for now, as it isn't really the most useful thing in the world compared to the others and requires some custom IO handlers, which is a pain.

I think in terms of implementing, most of this can be done by using the OTLP exporter with tracing and then just ensuring we're logging the correct values.
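To make "logging the correct values" concrete, here is a std-only sketch of capturing the values that a log event would carry for each query. It does not use `tracing` or the OTLP exporter. The `report` callback stands in for wherever the event would be emitted, and `QuerySample` and `timed_query` are hypothetical names.

```rust
use std::time::{Duration, Instant};

/// Values to attach to one query event (hypothetical struct).
#[derive(Debug, Clone, PartialEq)]
pub struct QuerySample {
    pub index: String,
    pub kind: String,
    pub elapsed: Duration,
}

/// Runs `run`, measures how long it took, and hands the sample to `report`.
/// In a real integration `report` would presumably emit a log event with
/// these fields; here it is just a callback.
pub fn timed_query<T>(
    index: &str,
    kind: &str,
    mut report: impl FnMut(QuerySample),
    run: impl FnOnce() -> T,
) -> T {
    let started = Instant::now();
    let output = run();
    report(QuerySample {
        index: index.to_owned(),
        kind: kind.to_owned(),
        elapsed: started.elapsed(),
    });
    output
}
```

The query result passes through unchanged, so a wrapper like this can be added around an existing call without touching its callers.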

renanvieira commented 11 months ago

Agreed. I need to look into OpenTelemetry, not very familiar with it yet, will report back with ideas for the structure.