Open generall opened 11 months ago
/bounty $150
/attempt #3322
with your implementation plan/claim #3322
in the PR body to claim the bountyThank you for contributing to qdrant/qdrant!
Add a bounty • Share on socials
Attempt | Started (GMT+0) | Solution |
---|---|---|
🟢 @guhitb | Jan 5, 2024, 1:25:03 AM | WIP |
🔴 @gabrieldemian | Feb 1, 2024, 1:30:46 AM | WIP |
🟢 @Shylock-Hg | Mar 23, 2024, 3:54:53 AM | WIP |
🟢 @xhjkl | Jun 12, 2024, 1:49:11 PM | #4455 |
/attempt #3322
Currently, the /metrics
collection is strongly coupled with the /telemetry
collection. The /metrics
endpoint is just a filtered-out telemetry metrics. (filtering happens in MetricsProvider::add_metrics
trait method).
Does the /telemetry
endpoint need to be updated as well?
I see multiple options to implement this.
GET /telemetry
request.I the original design, the idea was that /metrics is just a different format for /telemetry. So I think extending telemetry is fine. One thing to be careful about - anonymization of the collection names in the telemetry. I think anonymized stats should be aggregated across all collections always
/attempt #3322
What metrics does make sense to add for single collections?
Algora profile | Completed bounties | Tech | Active attempts | Options |
---|---|---|---|---|
@gabrieldemian | 1 Qdrant bounty + 0 bounties from 0 projects |
Rust, TypeScript, JavaScript |
qdrant/qdrant#3331 |
Cancel attempt |
@gabrieldemian I made some progress on my attempt, may be useful to you. Don't think I'll be able to figure the rest out. https://github.com/qdrant/qdrant/compare/master...guhitb:qdrant:master
I was actually looking at this yesterday and wanted to give this a try. I think I can get this working based on @guhitb approach. master...trueleo:qdrant:per-collection
I will not try to attempt this yet because @gabrieldemian is attempting. There are few problems I have noticed with mine and @guhitb approach.
Merging nested map HashMap<String, HashMap<HttpStatusCode, Arc<Mutex<OperationDurationsAggregator>>>>
to HashMap<WebApiTelemetryKey, Arc<Mutex<OperationDurationsAggregator>>>
leads to massive simplification of existing code and makes adding collection as optional value much easier.
pub struct WebApiTelemetryKey {
pub request_key: String,
pub status_code: u16,
pub collection: Option<String>,
}
But the main issue with this is that Serde cannot serialize HashMap<Struct, _>
. This can be workaround using https://crates.io/crates/serde_with.
This will however change the schema of the TelemetryData
JSON entirely. Which may affect telemetry reporter
pub struct TelemetryReporter {
telemetry_url: String,
telemetry: Arc<Mutex<TelemetryCollector>>,
}
impl TelemetryReporter {
fn new(telemetry: Arc<Mutex<TelemetryCollector>>) -> Self {
let telemetry_url = if cfg!(debug_assertions) {
"https://staging-telemetry.qdrant.io".to_string()
} else {
"https://telemetry.qdrant.io".to_string()
};
Self {
telemetry_url,
telemetry,
}
}
If the telemetry service has JSON schema requirements, then probably it is be better to figure out what the response of telemetry data should look like if we want to attach labels ( method, endpoint, status, collection ) to them. Otherwise three layer of HashMaps
to get this implemented will not look good :p
@generall
/attempt
Algora profile | Completed bounties | Tech | Active attempts | Options |
---|---|---|---|---|
@Shylock-Hg | 1 Qdrant bounty + 4 bounties from 2 projects |
C++, C, Shell & more |
Cancel attempt |
@generall Is this issue ready to work on?
/attempt #3322
Is your feature request related to a problem? Please describe.
Currently, all metrics in
/metrics
are global, meaning that it’s impossible to see differences per collection.In addition to that, all our metrics should have per-collection granularity to allow better aggregation in Prometheus, including:
Describe the solution you'd like
Example:
Describe alternatives you've considered
Create dedicated endpoint for each collection
/collections/my-collecton/metrics
but feedback from DevOps on this idea was negative.Additional context
It might be beneficial to allow users to disable per-collection output. It is especially relevant if there are a lot of collections and metric response could become huge. But this is a nice-to-have requirement.
Note for contributors: Please consider this as tracking issue. If you think that it would be beneficial to split the task into multiple smaller PRs, please you are welcome to do so. Bounty will be rewarded for each PR independently