jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

Additional storage backends #638

Open yurishkuro opened 6 years ago

yurishkuro commented 6 years ago

Opening this issue to keep track of other related issues.

Relevant issue: plugin support #422 (done).

nbettiol commented 6 years ago

Did you remove the Elasticsearch flags from jaeger-collector? I'm testing with the Docker image, whose version is:

{"gitCommit":"dbd5db721fc59431b1e64874cc7d6265d89ec917","GitVersion":"v1.1.0","BuildDate":"2018-01-08T21:56:21Z"}

and I cannot see the elasticsearch flags.

black-adder commented 6 years ago

It looks like you're using latest instead of 1.1. We recently moved some of the flags around so that we can support plugins better: https://github.com/jaegertracing/jaeger/pull/625. With latest, you instead have to set the env variable SPAN_STORAGE=elasticsearch to get the elasticsearch flags. I'd recommend that you use 1.1, since this change will be a part of 1.2 and will be documented at that time.

nbettiol commented 6 years ago

Thanks for the reply. Yes, I was using the latest version. I will use 1.1.

fzakaria commented 6 years ago

I would love to see a SQL option (whichever ANSI SQL subset carries the least vendor lock-in). Setting up Cassandra / ElasticSearch might be too ambitious for projects that want distributed tracing but honestly don't have the TPS to warrant a distributed datastore.

ringerc commented 6 years ago

Since I work with PostgreSQL, I sure wouldn't complain. But honestly I'm not sure a SQL db is an optimal store for largely free-form metrics of this nature. PostgreSQL at least offers the jsonb type for indexable free-form data. If you're trying to do this in a vendor neutral way you'll land up with your own json blobs, or doing EAV, and both of those are terrible. ANSI SQL is a poor fit for variable-structured or key/value form data and you'll need some vendor extensions to get usable performance.

But you inevitably land up with someone putting an ORM on top to "abstract" the DB. Then the ORM performs terribly, gobbles memory and everyone says "the SQL backend is slow, use instead".
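As a rough illustration of the two vendor-neutral layouts mentioned above, the following Go sketch turns the free-form tags of one span into either entity-attribute-value (EAV) rows or a single JSON blob. The `Span` and `EAVRow` types are hypothetical stand-ins, not Jaeger's model, and no database is involved.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Span is a deliberately tiny, hypothetical stand-in for a trace span.
type Span struct {
	ID   string
	Tags map[string]string
}

// EAVRow is one entity-attribute-value row: one row per tag per span.
type EAVRow struct {
	SpanID string
	Key    string
	Value  string
}

// toEAV flattens the tags into one row per tag, sorted by key so the
// output is deterministic.
func toEAV(s Span) []EAVRow {
	keys := make([]string, 0, len(s.Tags))
	for k := range s.Tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	rows := make([]EAVRow, 0, len(keys))
	for _, k := range keys {
		rows = append(rows, EAVRow{SpanID: s.ID, Key: k, Value: s.Tags[k]})
	}
	return rows
}

// toJSONBlob serializes all tags into a single opaque column value.
func toJSONBlob(s Span) (string, error) {
	b, err := json.Marshal(s.Tags)
	return string(b), err
}

func main() {
	s := Span{ID: "span-1", Tags: map[string]string{"http.method": "GET", "error": "true"}}
	fmt.Println(toEAV(s))
	blob, err := toJSONBlob(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(blob)
}
```

With EAV, a query on two tags needs a self-join on the rows table; with a blob, plain ANSI SQL cannot index inside the value, which is where a vendor extension such as jsonb comes in.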

pavolloffay commented 6 years ago

A related issue is https://github.com/jaegertracing/jaeger/issues/551. Upvote it if you are interested.

SwarnimRaj commented 6 years ago

New related issue (file storage): https://github.com/jaegertracing/jaeger/issues/894

wy100101 commented 5 years ago

We are looking at using BigQuery as a storage layer. Presumably this could work with a SQL storage option. SQL can be a generic way to deal with columnar data stores. I would complain about a BigQuery-specific solution, but I think there is a place for a generic SQL interface beyond RDBs.

yurishkuro commented 5 years ago

I assume that even if some database can be treated as SQL and accessed via the standard database/sql API, we still need to statically import the actual driver. Granted, this may be less maintenance than a dedicated SpanStorage implementation. However, now that the protobuf model has been merged, nothing is blocking us from moving on the storage plugin dev, e.g. using something like the HashiCorp gRPC plugin framework.
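The point about statically importing the driver can be sketched with the standard library alone. In Go, `sql.Open` only knows drivers that were registered (normally from the `init` function of a blank-imported driver package) in the same binary. The `stubDriver` type and the `openStore` helper below are hypothetical; the stub stands in for a real driver package and cannot open connections.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// stubDriver is a hypothetical stand-in for a real driver package.
// A real driver would be pulled in with a blank import and would
// register itself from its own init function.
type stubDriver struct{}

// Open satisfies driver.Driver; the stub cannot create connections.
func (stubDriver) Open(name string) (driver.Conn, error) {
	return nil, errors.New("stub driver: connections are not implemented")
}

func init() {
	// This is what "statically importing the driver" boils down to.
	sql.Register("stub", stubDriver{})
}

// openStore returns a handle for the named driver. sql.Open does not
// connect yet; it only checks that the driver name is registered.
func openStore(driverName, dsn string) (*sql.DB, error) {
	return sql.Open(driverName, dsn)
}

func main() {
	db, err := openStore("stub", "")
	if err != nil {
		panic(err)
	}
	defer db.Close()
	fmt.Println("stub driver is linked in")

	// A driver that was never compiled into the binary cannot be used.
	if _, err := openStore("not-linked-in", ""); err != nil {
		fmt.Println("error:", err)
	}
}
```

So a generic SQL backend would still ship with a fixed set of drivers chosen at build time, which is the maintenance cost weighed above against an out-of-process plugin.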

isaachier commented 5 years ago

Our model is sufficiently simple to warrant looking into using an ORM to support a large number of backends. I'll take a look at what's available. Reread above and understand what @yurishkuro means.

bruth commented 5 years ago

Giving my two cents: an ANSI SQL backend could work for small workloads, so it may be useful for lower-throughput applications that still want to benefit from this tool.

I will also throw out there that Timescale (a Postgres extension) may be a good fit for the required high write throughput.

mcarbonneaux commented 5 years ago

ClickHouse is a high-performance SQL store that is very efficient for log and trace storage and would be a perfect alternative to the original Cassandra backend... it is a true column-oriented DB... distributed... compressed...

It is close to CQL (an SQL-like query language)... it uses an SQL-like language too...

https://clickhouse.yandex/

chvck commented 5 years ago

I just thought that I'd drop something here to say that there is also support for using Couchbase as a storage backend (via the grpc plugin), currently at https://github.com/chvck/couchbase-jaeger-storage-plugin. Will likely move to the couchbase-labs organisation in time.

omerlh commented 4 years ago

Has someone started to work on Azure CosmosDB integration? It has support for Cassandra API, but I couldn't manage to make it work...

rleiwang commented 3 years ago

I just created an issue proposing Chronowave as storage backend. https://github.com/jaegertracing/jaeger/issues/2534

DjinNO commented 3 years ago

What about ClickHouse? Clickhouse is very cool

jpkrohling commented 3 years ago

> What about ClickHouse? Clickhouse is very cool

It's already linked in the issue's description, but here's the tracking issue for it: #1438

robross0606 commented 3 years ago

What about Apache Solr?

robross0606 commented 3 years ago

With the recent changes to ElasticSearch licensing, this just became SUPER important.

jpkrohling commented 3 years ago

The ES changes do look worrying, but I'm not sure this justifies supporting Solr. It does help the case for advancing with some other storage.

yurishkuro commented 3 years ago

AWS announced an Apache-2 licensed fork of ES, logz.io also said something similar (could be the same effort). So I don't think there's reason to panic.

I don't think Elastic changed the terms for the Go driver which we started using in OTEL-based collector, but we'd need to watch for that.

jpkrohling commented 3 years ago

> So I don't think there's reason to panic.

Absolutely, especially because they can't change the license for something that was released already. So, folks currently using ES don't have a reason to change immediately. If they need to update for some reason, like due to a security problem or general bug fix, then it might become problematic.

While I appreciate the work that logz.io is doing on this front, having multiple sources of ES doesn't help us in supporting our users. On day 1, they'll all be compatible with each other, but each fork will tend to follow its own path over time. Meaning: we'd need to decide which one we'll be "officially" supporting.

jkowall commented 3 years ago

We are working with several organizations including AWS on an open source version of ES and Kibana which will be Apache licensed and hopefully part of the ASF. I will know more in the coming days.

I've had a hard time getting RedHat engaged (they have canceled twice now) so if you can help with that @jpkrohling then we'd love it :)

Anyone interested in contributing or taking part in the community is welcome. I am collecting info here to start: https://docs.google.com/forms/d/e/1FAIpQLSfykAk4Bhc-dhjR0AXFP7T2oFmsLUxONbD6NwmgMz4usXSGkw/viewform?usp=sf_link

muhammadn commented 3 years ago

This is the only issue that is not closed. #2633 is closed, so I am announcing here my work to support S3 for Jaeger as a plugin. We needed a cheap way to store data, and Elasticsearch is expensive for us (in price when managed, in operational effort when self-managed).

I took the work done by the amazing team at Grafana on their Loki code and used it to write an S3 plugin. It's still a WIP: it stores indexes locally and syncs chunks of data to S3. (I am seeing indexes being synced as well.)

The data is stored in BoltDB format and shipped to S3 via Cortex. (I am leveraging Loki's codebase so that I don't redo other people's work.) Cortex is also used to query the data from S3 using PromQL (which I will implement in the plugin's reader).

A side effect of this work is support for Google's GCP, Amazon DynamoDB and Google's BigTable as well.

More info here: https://grafana.com/docs/loki/latest/operations/storage/boltdb-shipper/

jaeger-s3 code: https://github.com/muhammadn/jaeger-s3/tree/develop

Also, you will need my fork of Jaeger, which has a small change for the file flag, at https://github.com/muhammadn/jaeger-s3/tree/develop

Note: It currently writes to PostgreSQL and can now also upload to S3. I run the two in parallel so that I can debug in PostgreSQL and better understand the data, but the PostgreSQL code will be removed once I get both the writer and the reader working.

Contributions are welcome!

jkowall commented 3 years ago

Interesting idea, good luck with the work @muhammadn. Remember, you'll have to figure out how to support querying for the data. My understanding of loki is that you can't do a free text search easily since the data is not all indexed. How will you support the querying of data with this new storage support?

muhammadn commented 3 years ago

Thanks @jkowall! Conversations like this are what motivate me. 🙌

I've managed to pull the data out of S3 through Cortex by querying it with PromQL (basically LogQL, which is based on PromQL). You're right that only the time and the labels are indexed, so I can search using the labels. Labels are the only thing you can search on, which may not be as flexible as SQL or ES.
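The trade-off discussed here can be sketched in a few lines of Go. This is only an illustration with hypothetical types and helpers, not how Loki, Cortex or jaeger-s3 are implemented: selecting by label consults only the indexed labels, while a free-text search has no index to use and must read every chunk.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a hypothetical stored record: indexed labels plus an
// opaque, unindexed chunk of span data.
type Entry struct {
	Labels map[string]string
	Chunk  string
}

// hasAll reports whether labels contains every key/value pair in want.
func hasAll(labels, want map[string]string) bool {
	for k, v := range want {
		got, ok := labels[k]
		if !ok || got != v {
			return false
		}
	}
	return true
}

// selectByLabels returns the entries matching all wanted labels.
// Only Labels are consulted; Chunk contents are never inspected.
func selectByLabels(entries []Entry, want map[string]string) []Entry {
	var out []Entry
	for _, e := range entries {
		if hasAll(e.Labels, want) {
			out = append(out, e)
		}
	}
	return out
}

// scanChunks is the fallback for free-text search: with no index over
// the chunk contents, every chunk has to be read.
func scanChunks(entries []Entry, text string) []Entry {
	var out []Entry
	for _, e := range entries {
		if strings.Contains(e.Chunk, text) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	entries := []Entry{
		{Labels: map[string]string{"service": "frontend"}, Chunk: "GET /cart timeout"},
		{Labels: map[string]string{"service": "billing"}, Chunk: "POST /pay ok"},
	}
	fmt.Println(len(selectByLabels(entries, map[string]string{"service": "frontend"})))
	fmt.Println(len(scanChunks(entries, "timeout")))
}
```

In an object-store setup, the second function is the expensive one, because every chunk it reads has to be fetched from the bucket first.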

[Screenshots: 2021-04-15 10:24:14 PM and 2021-04-15 10:16:21 PM]

But I am wondering whether it would be better to move to Grafana's Tempo, which I just found out about. That's a discussion I would have in the Tempo project, though.

jkowall commented 3 years ago

The creator of Tempo @joe-elliott is a maintainer here at Jaeger and is also an engineer with Grafana Labs. We had this discussion, but right now Tempo cannot support the types of queries the Jaeger UI does today. I know that may change, at which point Tempo would be a good Jaeger backend, but today it cannot do everything necessary.

This is the compromise with something that doesn't do full-text indexing versus selective queries or allowing queries based on a specific trace or short timeframe. The cost benefit of using non-indexing technologies is there, but the flexibility is not the same.

muhammadn commented 3 years ago

@jkowall I've stabilised the code, it no longer crashes, and I can query/see the data from the Jaeger UI. 🎉 Moving forward, I will test it in production (using k8s) and see how it goes from there. Thanks @jkowall for your answers! 🙌 Also thanks to @yurishkuro for adding plugin support; without it, I could not have written the code for S3 support.

I've watched @joe-elliott's video. Tempo is limited to searching by TraceID, whereas with jaeger-s3 searching is limited to labels, and there are many label variables that I can use to construct the query.

Discussion of the jaeger-s3 plugin will take place on my repo, thanks all!

jkowall commented 3 years ago

Cool! Nice job @muhammadn. I'm curious how the performance is with a search using S3 in that manner. Are you planning on deploying or using this setup?

yurishkuro commented 3 years ago

@muhammadn I re-opened #2633 and linked your repo at the top. Suggest moving discussions/updates there.

em135 commented 3 years ago

@Xitric and I are working on a storage backend for Humio using the grpc plugin. The repository can currently be found at https://github.com/em135/humio-jaeger-plugin. I have opened an issue for this: #3005

galan commented 3 years ago

If S3 is supported, I would suggest supporting GCS as well, which is the equivalent object storage for Google Cloud users. For our use case that would be tremendously helpful!

muhammadn commented 3 years ago

@galan It does, actually. Despite the name jaeger-s3, I added support for GCS (and Azure Storage) quite some time ago, but I have not tested it.

Maybe you can go to https://github.com/muhammadn/jaeger-s3 to try it out.

Related code to GCS: https://github.com/muhammadn/jaeger-s3/blob/main/config/config.go#L26 https://github.com/muhammadn/jaeger-s3/blob/main/s3store/store.go#L50

All you need is to modify the configuration for jaeger-s3 to this:

https://grafana.com/docs/loki/latest/operations/storage/boltdb-shipper/#example-configuration

which has the GCS config.

I will update the jaeger-s3 documentation to include GCS and Azure as well (and probably change the project name entirely).

Do tell me if you need help. But I think we can move this discussion to #2633

@galan Update: I have updated the documentation - https://github.com/muhammadn/jaeger-s3/blob/main/README.md

Also, a question for the community: should I rename this project to something more generic than jaeger-s3, since the plugin will support GCS and Azure as well?

qiansheng91 commented 2 years ago

@jpkrohling The Alibaba Cloud Log Service now supports Jaeger; here are GIFs of the plugin. Link: https://github.com/qiansheng91/jaeger-sls#quick-start

evilrussian commented 2 years ago

Can I use another Jaeger collector as the backend for my Jaeger collector?

pavolloffay commented 2 years ago

A Jaeger collector cannot send data (e.g. over gRPC) to another Jaeger collector. However, this capability is supported by the OTEL collector.

nitinsaprumaersk commented 2 years ago

@yurishkuro I would highly recommend adding Azure Table Storage as a backend storage option as well.

arajkumar commented 1 year ago

@yurishkuro Could you please add PostgreSQL with Promscale to the list? Promscale is now Jaeger storage compliant too :) Thanks.

nicolastakashi commented 1 year ago

@yurishkuro Could you please add RediSearch to the list? I've worked on a gRPC plugin to store traces in RediSearch, and it is close to its first release; I'm just adding a few performance tests.

https://github.com/nicolastakashi/jaeger-redisearch

coverthesea commented 1 year ago

Zinc https://github.com/zinclabs/zinc

vemula-anu commented 1 year ago

I use Jaeger with TimescaleDB in Kubernetes.

{"level":"info","ts":1677823818.9793034,"caller":"querysvc/query_service.go:137","msg":"Archive storage not created","reason":"archive storage not supported"}

I use this configuration: spec: containers:

yurishkuro commented 1 year ago

@vemula-anu please do not post support questions to this issue, create a new question in Discussions.

diondew commented 1 year ago

Could you please add Yugabyte to the list? https://github.com/jaegertracing/jaeger/issues/4354

jkowall commented 1 year ago

> Could you please add Yugabyte to the list? #4354

Added, sorry for the delay.

paulgrav commented 8 months ago

> The creator of Tempo @joe-elliott is a maintainer here at Jaeger and is also an engineer with Grafana Labs. We had this discussion, but right now Tempo cannot support the types of queries the Jaeger UI does today. I know that may change, at which point Tempo would be a good Jaeger backend, but today it cannot do everything necessary.

I know Tempo today is a lot more capable in terms of its search compared to back in 2021. Is it now a potentially good Jaeger backend or are there still gaps?

joe-elliott commented 7 months ago

Tempo has a broader set of search capabilities (via TraceQL) than Jaeger search. So, generally, Tempo could be used to back Jaeger. There are two gaps I'm aware of.

First, Jaeger search currently returns the entire trace to the frontend which then renders the search results pane you see in the UI. Tempo, however, only returns metadata to the frontend. I don't believe this metadata is enough to render every element of the Jaeger search results. For instance, I don't think you could list the services in the trace.

Second, we currently only retrieve auto complete tags from recent traces. So if you were searching a time range from yesterday the auto complete would still be based on the traces received in the last half hour or so. We are working to address this.

muhammadn commented 4 months ago

@paulgrav

Just as @joe-elliott explained, but I want to add that this has already been done. I have completely overhauled jaeger-objectstorage to use Tempo as a backend to store traces in AWS S3/GCS/AzureBlob, since I believe Tempo is now mature enough to support multiple cloud storage providers. Back in the early days of Tempo there wasn't support for GCS and AzureBlob, so we used Loki as the interface to store trace data.

I've posted in the forums to show what it looks like, so you can take a look.

The code is already published (both the Tempo fork and jaeger-objectstorage), but the documentation needs more polishing.

jiajiayang commented 1 month ago

Is it possible to use loki as a back-end store, which can be a good correlation between logging and tracing?

jkowall commented 3 weeks ago

> Is it possible to use loki as a back-end store, which can be a good correlation between logging and tracing?

Unfortunately no, @jiajiayang: Loki is a logging system. Tempo is the tracing backend, but it doesn't support full-text search, which is how Jaeger does the querying you see in the search dialog box.

Even in the Grafana stack you still run multiple backends.