jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

ClickHouse as a core storage backend #4196

Open yurishkuro opened 1 year ago

yurishkuro commented 1 year ago

Summary

Build first-class support for ClickHouse as an official Jaeger backend. ClickHouse is an open-source column-oriented database for OLAP use cases. It handles high-volume ingestion and search efficiently, which makes it a good fit for tracing and logging data in particular. It also computes aggregates very quickly, which will come in handy for several features in Jaeger.

Benefits to the users:

Background

This is a continuation of #1438. Currently, there are one or more community-supported plugins for ClickHouse as a Jaeger storage backend. This ticket is about making it one of the core backends supported by Jaeger, on the level of Cassandra/Elasticsearch.

Scope

Expected outcomes

Stretch goals

yurishkuro commented 1 year ago

For reference:

abhi1287 commented 1 year ago

Suggestion: It would be nice to somehow retain the type information of custom tags. The OTEL implementation you linked above simply stores a vector of string key-value pairs. Jaeger does the same with Elasticsearch, and it makes running aggregate queries over the custom tags in spans painful.
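
To illustrate the suggestion, here is a minimal Go sketch of keeping tag types instead of flattening everything to strings. It is not Jaeger's or the OTEL exporter's actual code or schema: the `Tag` and `TypedTags` types, the `SplitByType` and `SumInt` helpers, and the tag names are all hypothetical, chosen only to show why typed storage makes numeric aggregation straightforward.

```go
package main

import "fmt"

// Tag is a hypothetical span tag whose value keeps its original Go type.
type Tag struct {
	Key   string
	Value any
}

// TypedTags is a hypothetical storage layout: one map per value type,
// standing in for what could be separate typed columns in a table.
type TypedTags struct {
	Strings map[string]string
	Ints    map[string]int64
	Floats  map[string]float64
	Bools   map[string]bool
}

// SplitByType routes each tag into the map that matches its value type.
// Values of any other type fall back to their string form.
func SplitByType(tags []Tag) TypedTags {
	out := TypedTags{
		Strings: map[string]string{},
		Ints:    map[string]int64{},
		Floats:  map[string]float64{},
		Bools:   map[string]bool{},
	}
	for _, t := range tags {
		switch v := t.Value.(type) {
		case string:
			out.Strings[t.Key] = v
		case int64:
			out.Ints[t.Key] = v
		case float64:
			out.Floats[t.Key] = v
		case bool:
			out.Bools[t.Key] = v
		default:
			out.Strings[t.Key] = fmt.Sprint(v)
		}
	}
	return out
}

// SumInt adds up one integer tag across spans. Because the values are
// stored as int64, no string parsing is needed; spans without the tag
// contribute zero.
func SumInt(spans []TypedTags, key string) int64 {
	var total int64
	for _, s := range spans {
		total += s.Ints[key]
	}
	return total
}

func main() {
	// Illustrative tag names and values only.
	spans := []TypedTags{
		SplitByType([]Tag{
			{Key: "http.status_code", Value: int64(200)},
			{Key: "retries", Value: int64(2)},
		}),
		SplitByType([]Tag{
			{Key: "retries", Value: int64(3)},
			{Key: "cache.hit", Value: true},
		}),
	}
	fmt.Println(SumInt(spans, "retries")) // prints 5
}
```

With string-only storage, the same sum would require parsing every value at query time and deciding what to do with values that do not parse.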

yurishkuro commented 1 year ago

@abhi1287 can you book a separate issue for this (it sounds like a cross-storage problem) and give examples of custom tags that need numeric aggregations?

siddharthsingh025 commented 1 year ago

Is anyone working on this? @yurishkuro

yurishkuro commented 1 year ago

@siddharthsingh025 this ticket is in support of the GSoC program that starts in the summer.

siddharthsingh025 commented 1 year ago

@yurishkuro, I would like to participate in GSoC 2023 by working with Jaeger. I have researched this project a lot and would really like to contribute to it. I am familiar with Golang and SQL database design, and I also have a good knowledge of OpenTelemetry. Looking forward to your guidance on this project. Thanks!

yurishkuro commented 1 year ago

@siddharthsingh025 please see GSOC's guidelines on applying, and you can specify the Jaeger project as a preference.

octonawish-akcodes commented 1 year ago

I am interested in this project. Are there any resources or a Slack link?

yanyanran commented 1 year ago

@yurishkuro Hello! I am very interested in this project and would like to contribute to Jaeger. I am familiar with Golang and love learning about and delving into distributed systems; I think they are really interesting. I have also learned a lot about SQL databases, and I can devote myself to the project. Can you give me some advice and guidance? Thank you!

Reireirei0 commented 1 year ago

@yurishkuro, I'd like to contribute to Jaeger. It just so happens that I am currently maintaining a ClickHouse query service at ByteDance as an intern. This service is called rigel and is also written in Golang. With your guidance, I am pretty confident I can finish this feature. Looking forward to your guidance! Thanks!

Nandini99-git commented 1 year ago

Hey @yurishkuro, I am interested in this project for GSoC 2023 and want to work on it. I am a fresher, but I am familiar with the tools and technologies required for this project.

yurishkuro commented 1 year ago

Applications will be open from March 20 to April 4: https://summerofcode.withgoogle.com/

GauriBhandari commented 1 year ago

Hey @yurishkuro I am a student from India and I would like to contribute to this issue. I wanted to know if this is still available? If it is available then I would like to write a proposal for this.

yurishkuro commented 1 year ago

It's available; applications don't open until tomorrow. The proposal will be chosen according to GSoC's timeline.

james-ryans commented 1 year ago

Hi, I’m James Ryans from Indonesia. I might be too late to introduce myself, but hopefully this will still be noticed. I wanted to share the references I used to write my proposal, which might help you get onboarded to this project:

  1. The idea behind Jaeger and its history (https://www.uber.com/blog/distributed-tracing)
  2. The architecture of Jaeger (https://www.jaegertracing.io/docs/1.43/architecture/)
  3. The discussion in Jaeger issue #1438 (https://github.com/jaegertracing/jaeger/issues/1438)
  4. How the community implemented ClickHouse support, as described in (https://github.com/jaegertracing/jaeger/issues/4196#issuecomment-1411151681)
  5. A general idea of the Cassandra schema (https://github.com/jaegertracing/jaeger/tree/main/plugin/storage/cassandra)
  6. A deep dive into ClickHouse indexing, to design the schema (https://clickhouse.com/docs/en/optimize/sparse-primary-indexes)

And hi @yurishkuro, I’ve submitted my proposal on the GSoC platform. Do you mind if I send you my Google Docs proposal link on Slack so that we can discuss it there?

yurishkuro commented 1 year ago

@meneketehe sure, send it

GauriBhandari commented 1 year ago

Hey @yurishkuro is this project still available for gsoc? I am unable to see it on gsoc's organization dashboard. Could you please help

ChillOrb commented 1 year ago

> Hey @yurishkuro is this project still available for gsoc? I am unable to see it on gsoc's organization dashboard. Could you please help

Hey Gauri, it's under CNCF:

  1. Go to CNCF
  2. View Ideas list
  3. Scroll to Jaeger

ihanwen99 commented 1 year ago

Dear team, @yurishkuro, I am a master's student from Shanghai Jiao Tong University and TUM. I have great interest in this project. Are you still accepting student applications to GSoC for this project? Thank you very much.

yurishkuro commented 1 year ago

We're working on the GSoC timeline; applications are open until Apr 4.

siddharthsingh025 commented 1 year ago

Hey @yurishkuro, I’ve submitted my proposal on the GSoC platform. Do you mind if I send you the proposal link on Slack so that we can discuss it there? Looking forward to your guidance! 😃

yurishkuro commented 1 year ago

@haanhvu will be working on this as part of GSoC

nextrevision commented 1 year ago

Is one of the goals of the resulting integration to be compatible with the OTEL exporter (https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/clickhouseexporter)? It's referenced above, but is that purely as a technical reference or as a schema compatibility goal?

yurishkuro commented 1 year ago

@nextrevision it's TBD. There is a blog post from ClickHouse criticizing some design choices in the OTEL exporter, and we are taking this into consideration.

nextrevision commented 1 year ago

Understandable, thanks for the link and looking forward to trying it out

GetRohitansh commented 1 year ago

Is this issue still open, or is it nearing completion? I would like to contribute.

haanhvu commented 1 year ago

> Is this issue still open, or is it nearing completion? I would like to contribute.

We finished the first stage of benchmarking and making design decisions. We'll publish the benchmark report soon.

egege commented 4 months ago

It is now the middle of 2024; what progress has been made on this? I would like to be involved.

jkowall commented 4 months ago

This will be officially supported in Jaeger v2, which is due in beta before the end of the year. There are lots of good things coming with v2; you can learn more from the last KubeCon presentation we did: https://www.youtube.com/watch?v=WNfesi_T0Bs