jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

SURVEY: Who is using Jaeger #207

badiib opened this issue 7 years ago (status: open)

badiib commented 7 years ago

Hi, you are in a group of individuals who have created or commented on issues in the Jaeger repository, and we are doing a simple informal survey about Jaeger usage. If you could answer the following questions, it would be very valuable in gauging interest in the project:

Also consider adding your organization to ADOPTERS.md.

@jkandasa @sunfaces @jbdalido @princeop @pavolloffay @mabn @jpkrohling @nlamirault @JodeZer @prestonprice57 @jrbury @objectiser @sloev @hwinkel @Madhu1512 @yuekui2 @valichek @dianvaltodorov @ZhouZiHe @LoungeFlyZ @jeluard @diegofernandes @d-ulyanov @jyothepro @yqf3139 @tomersimis @ruinanchen @szdavid92 @anuptalwalkar @hekike @sul4bh @Strandedpirate @julianste @awhiteside @nklmish @sweatybridge @kevinearls @felixbarny @hzariv @nlamirault @longXboy @drzero42 @xdralex @philipgian @bharat-p

JodeZer commented 7 years ago
yqf3139 commented 7 years ago

If applicable, what company/organization do you represent?

I am a contributor to fission, which is a FaaS solution on top of Kubernetes.

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.

We need to integrate a distributed tracer for two use cases:

Currently I am doing some experiments on the integration.

If you are not using Jaeger, why not?

I will find some time to try Jaeger. Jaeger seems to have better client library support.

How many services (or microservices) exist in your system layout?

Around 10 microservices. Excluding user functions, which are also services evolving over time.

nlamirault commented 7 years ago

I work for a subsidiary company of Orange.

We are experimenting with OpenTracing in a future API Gateway service.

We deploy Jaeger on Kubernetes.

Around 10 services.

Cassandra.

codefromthecrypt commented 7 years ago

Interesting to hear folks say Jaeger has a better client library, especially as Jaeger is OpenTracing, which is supposed to make that point moot between systems. If anyone cares to elaborate on which library (or at least which language) is being compared and what they like better, that'd be interesting feedback, too.

jbdalido commented 7 years ago
JodeZer commented 7 years ago

@jbdalido glad to see ScyllaDB!

bharat-p commented 7 years ago
hzariv commented 7 years ago

Integration with service mesh proxies such as Envoy or Linkerd is also important to us.

xdralex commented 7 years ago

If applicable, what company/organization do you represent? Stitch Fix

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc. Considering/experimenting

If you are not using Jaeger, why not? The environment for which we are considering Jaeger is mostly Python 3, so waiting either for this pull request to be merged or an alternative implementation :)

mabn commented 7 years ago

If applicable, what company/organization do you represent? Base CRM

How are you using Jaeger? Experimenting in production: there's a process that listens to our custom traces on Kafka, converts them, and publishes them to Jaeger.

If you are not using Jaeger, why not? Traces with ~1M spans make Jaeger hard to use; we have to deal with that first. As for instrumenting services with OpenTracing, this will take time: only 1 service has it so far.

How many services (or microservices) exist in your system layout? 100+

Storage: We're using AWS managed Elasticsearch, mainly because it's managed, but also because we have experience with ES and not with Cassandra. I'm still trying to make it work properly, though. Right now (2017-09-22) it performs poorly and drops a lot of spans because indexing does not use the bulk API, indices are created without index.translog.durability=async, and AWS ES requires signing of each indexing request, so there's an additional proxy to go through.
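For context on the bulk-API point: Elasticsearch's `_bulk` endpoint takes newline-delimited JSON in which every document is preceded by an action line, so many spans can be indexed in one request instead of one request per span. Below is a minimal sketch that only builds such a payload. It assumes daily `jaeger-span-YYYY-MM-DD` index names, and the span dicts and helper names are hypothetical; sending the request and the AWS request signing mentioned above are left out.

```python
from __future__ import annotations

import json
from datetime import date


def span_index(day: date) -> str:
    # Assumed daily index naming, e.g. "jaeger-span-2017-09-22".
    return f"jaeger-span-{day.isoformat()}"


# The index setting mentioned above: async translog trades some durability
# for higher indexing throughput. Would be sent when creating the index.
INDEX_SETTINGS = {"settings": {"index.translog.durability": "async"}}


def bulk_payload(spans: list[dict], day: date) -> str:
    """Build an NDJSON body for POST /_bulk: one action line plus one source line per span."""
    if not spans:
        return ""
    lines = []
    for span in spans:
        lines.append(json.dumps({"index": {"_index": span_index(day)}}))
        lines.append(json.dumps(span))
    # The bulk API requires the body to end with a trailing newline.
    return "\n".join(lines) + "\n"
```

Batching like this amortizes per-request overhead (including per-request signing) across many spans. Clusters of the 5.x/6.x era also expected a document `_type` in each action line.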

hekike commented 7 years ago

If applicable, what company/organization do you represent? RisingStack

How are you using Jaeger? Experimenting with automatic instrumentation for Node.js: https://github.com/RisingStack/jaeger-node

If you are not using Jaeger, why not? Node.js async_hooks is still in the experimental phase. Currently, our own tracing is more feature-complete: http://trace.risingstack.com

How many services (or microservices) exist in your system layout? 50+ (our product's backend)

pvlugter commented 6 years ago

Lightbend has OpenTracing integration for Akka (and this is being extended to more Lightbend technologies, such as Akka HTTP, Play, and Lagom). Many of our customers are interested in tracing for distributed systems or microservices. The Jaeger client is used as the default OpenTracing client to report to Jaeger or Zipkin, giving our customers the option of using Jaeger.

Dieterbe commented 6 years ago

If applicable, what company/organization do you represent?

GrafanaLabs

How are you using Jaeger?

Currently prototyping an implementation for our TSDB, with the goal of validating performance and suitability and then taking it to production. We may potentially add OpenTracing to our other software (like Grafana) as well. Our most urgent need was just getting rich, context-specific distributed logging in place so we can diagnose performance trouble, and Jaeger looks like a good fit. In particular, compared to "just distributed logging" systems like ELK/crate or oklog, we realized we want tracing, not just logging.

How many services (or microservices) exist in your system layout?

We have about 20 different projects that we run, but we run many of them multiple times (many of our customers have dedicated single-tenant deployments in Kubernetes).

UPDATE Sept 22: we're now using Jaeger in prod for 2 different projects (each running hundreds of times due to multi-tenancy), and we're also working on adding Jaeger support to Grafana itself.

Backend: Cassandra

frankgreco commented 6 years ago

If applicable, what company/organization do you represent?

Northwestern Mutual

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.

I developed Kanali, which we use to proxy all production traffic in our Kubernetes clusters. Kanali integrates with OpenTracing to provide end-to-end distributed tracing. I love the Jaeger project, as it has the most robust and clean UI for OpenTracing, IMHO.

How many services (or microservices) exist in your system layout?

We currently use Jaeger to visualize tracing for 100s of microservices. These traces are used by 1000s of developers every day.

otisg commented 6 years ago

Interesting to hear folks say jaeger has a better client library, especially as Jaeger is OpenTracing which is supposed to make that point moot between systems.

@adriancole I think people say this because OpenZipkin doesn't seem to have an OpenTracing-compatible Python or Node tracer, only Java and Go; or, if it has one, it's not immediately obvious.

ejwood79 commented 6 years ago

If applicable, what company/organization do you represent?

Under Armour

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.

Limited production deployment, expanding.

How many services (or microservices) exist in your system layout?

100s.

jnewmano commented 6 years ago

396 @black-adder

  1. If applicable, what company/organization do you represent?

     Weave

  2. How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.

     Full production deployment across both Kubernetes and virtual machines. Using OpenTracing+Jaeger with Cassandra for storage.

  3. How many services (or microservices) exist in your system layout?

     100s of microservices

Dieterbe commented 6 years ago

Am I the only one who finds "How many services (or microservices) exist in your system layout?" an ambiguous question? I don't understand whether this means the number of unique software projects or the number of daemons running (where you count all copies of the same service).

frankgreco commented 6 years ago

@Dieterbe I take service to be a unique microservice. A good analogy would be a Kubernetes service.

pavolloffay commented 6 years ago

Hi all, @jnewmano @ejwood79 @otisg @frankgreco @Dieterbe @pvlugter @hekike @mabn @xdralex @hzariv @bharat-p @jbdalido @nlamirault @yqf3139 @JodeZer

Could you also please mention which storage you are using, Cassandra or Elasticsearch? Edit your comment or just comment below.

Thanks

ejwood79 commented 6 years ago

We’re using Cassandra.


otisg commented 6 years ago

Elasticsearch here at Sematext

bigkraig commented 6 years ago

We're in the experimentation phase at Ticketmaster. We have hundreds of microservices that will need to be instrumented, but since a few teams started tracing, interest has been growing.

B0go commented 6 years ago

If applicable, what company/organization do you represent?

https://github.com/ContaAzul | http://contaazul.com

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc.

We just deployed it to production in our Kubernetes cluster, saving data to Elasticsearch on AWS (Amazon Elasticsearch Service).

How many services (or microservices) exist in your system layout?

~100 instances of ~50 services

benjigoldberg commented 6 years ago
golonzovsky commented 6 years ago
malkia commented 5 years ago

Q: How are you using Jaeger? A: I'm evaluating several OpenTracing frameworks for C++. I had success with both OpenTracing-cpp and OpenCensus-cpp. I still haven't evaluated Jaeger's C++ client (todo). While doing this, I realized I needed a viewer and started with Zipkin's UI for the first few hours, but found some limitations, or maybe I'm putting too much information into the app (several thousand traces). At first I was avoiding Jaeger's UI, since I thought it was specific to Jaeger itself (I had to do an evaluation over a day and would continue through the week), only to find that it supports Zipkin mode. I was already excited after seeing the screenshots: the nice timeline, the folding/unfolding, and visually it was a pleasure to use (Zipkin's UI also looks nice, and maybe there are things Zipkin's UI can do that Jaeger's can't). At any rate, I'll continue using it and keep looking until I finalize my choice.

Q: How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc. A: As I said, not yet; still evaluating. The plan is to have metrics in our desktop app, which talks to a few servers, then eventually have these servers, and whatever they are fronted with, monitored as well, and find out what collection scheme/mechanism would be appropriate. It's an in-house game level editor used by hundreds of people from a few different studios. We already collect logs (Elastic), but perf metrics are gathered on demand, by asking users to run XPerf and then analyzing the results in WPA (Microsoft tools). Additionally, we collect crash dumps, but we don't use exceptions yet (C++). So we're looking for something that unifies this or provides alternative information (and also exports aggregated metrics to Prometheus/Grafana, which OpenCensus can do, and maybe Jaeger too; I need to start looking into it soon). All in all, I'm just trying to get an idea of what's available right now.

Q: If you are not using Jaeger, why not? A: Still evaluating.

Q: How many services (or microservices) exist in your system layout? A: Our app has to talk to one or more (edge) Perforce servers and a custom caching solution, spawn SN-DBS fxc.exe (shader compiler), and eventually talk to a local Windows Service serving/processing assets, etc. But we also have a heavily multi-threaded case using ConcRT (in a way, Microsoft's "lite" version of Intel's TBB), plus std::thread, and even WinAPI-style CreateThread() calls. I'm looking for ways to safely hook into this (propagate my context across), and there are some gotchas, like green threads/coroutines, and possibly using thread-local storage in a push/pop style to keep the "active" thread. How easily I can achieve this may dictate which of the APIs I use (I've also noticed that OpenCensus may have some extra locks and hidden allocations per span creation, though this should not be a big deal and seems fixable). So I'm very excited to go ahead, evaluate Jaeger, and report back.

--- I worked for Google for some time and had to use Dapper, and occasionally looked at rpcz, tracez, etc. during my on-call duties (I wasn't a regular SRE; I was on a medium-sized Java team with mixed responsibilities). Since then I've loved the approach and the whole idea of distributed tracing, and I'm trying to see whether it will benefit us. I'm glad the industry is moving in the right direction, though the information is a bit sparse, and I don't know all the players yet :)

isaachier commented 5 years ago

@malkia:

If applicable, what company/organization do you represent?

malkia commented 5 years ago

I speak only for my team, I don't know whether it's used in other teams/projects across the company, but my team is part of Activision's Central Tech.

isaachier commented 5 years ago

@malkia very cool. Thanks for answering that.

trondhindenes commented 5 years ago

If applicable, what company/organization do you represent? RiksTV, Norwegian broadcast distributor

How are you using Jaeger? Early days: we're using Jaeger in some backend Python and .NET Core apps.

If you are not using Jaeger, why not? The majority of our code is still on "legacy .NET", which is apparently difficult to Jaeger-enable. Usage will broaden as we transition to .NET Core.

How many services (or microservices) exist in your system layout? 60+

Storage: Self-managed Elasticsearch running in AWS.

caniszczyk commented 5 years ago

@trondhindenes thanks, added you here: https://github.com/jaegertracing/jaeger/pull/1121

zdicesare commented 5 years ago

If applicable, what company/organization do you represent? Vistar Media

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or I am not using Jaeger etc. We are using Jaeger in an AWS-based stack for performance analysis and debugging in all envs. We annotate traces with business logic metadata as well.

We have the Jaeger infrastructure running in ECS and deployed via CloudFormation; the agents are deployed both in ECS and alongside Elastic Beanstalk applications.

How many services (or microservices) exist in your system layout? Fewer than 10, but this is increasing. We trace some services that are isolated, and we are also experimenting with tracing our builds (we use Bazel).

Storage: AWS-hosted Elasticsearch

Puneeth-n commented 5 years ago

If applicable, what company/organization do you represent? @Comtravo

How are you using Jaeger? Production on a subset of microservices.

If you are not using Jaeger, why not? We are currently using Jaeger but are considering OpenCensus as it matures, because what we really miss is good auto-instrumentation support for Node.js. We forked the auto-instrumentation from RisingStack and fixed some small issues.

Datadog ships its own opentracing-api-compatible tracer along with auto-instrumentation, which is cool.

How many services (or microservices) exist in your system layout? 26

Storage: AWS ES

ThomWright commented 5 years ago

If applicable, what company/organization do you represent?

Candide @candide-eu

How are you using Jaeger?

Full production.

Running on GKE with an Elasticsearch backend hosted on Elastic Cloud.

We have the client library integrated into our NodeJS service shell library to automatically trace inter-service requests.

How many services (or microservices) exist in your system layout?

>30 k8s services in our prod environment, most of them Jaeger-enabled.

clyang82 commented 5 years ago

Elasticsearch in IBM Cloud Private with TLS enabled.

EaconTang commented 4 years ago

Q: If applicable, what company/organization do you represent? How many software engineers? A: Tencent TEG Infosec Department, about 300+ engineers.

Q: How are you using Jaeger? A: Full production deployment.

Q: How long have you been using Jaeger? A: Since May 2019, so about 4 months.

Q: If you are not using Jaeger but chose another tracing system, what were the reasons? A: We are using Jaeger.

Q: How many services (or microservices) exist in your system layout? A: At least 100 services.

Q: How many of them are traced? A: At least 10 services are traced, and this number should reach 80+ by the end of this year.

Q: Can you describe your tracing setup and volumes? I.e. which storage you use, how many traces/spans you store, etc. A: Kafka + ES, currently about 600 million spans per day.

Q: What types of problems are you solving with tracing? A: We use Jaeger for monitoring the health of RPC servers, root-cause analysis, and drawing the service topology.

d-ulyanov commented 4 years ago

Q: If applicable, what company/organization do you represent? How many software engineers? A: Ozon (e-commerce, marketplace), about 500 engineers.

Q: How are you using Jaeger? A: Full production deployment (for both Kubernetes and legacy non-Kubernetes services). Our Jaeger setup is heavily modified, and most of the components have been rewritten (except for the UI; see details below).

Q: How long have you been using Jaeger? A: ~1 year

Q: If you are not using Jaeger but chose another tracing system, what were the reasons? A: After several months of using Jaeger, our developers asked us to add more advanced sampling policies to get more insight: priority sampling for traces with errors, long traces, etc. Probabilistic sampling was fine at the start, but it offers too few possibilities when you're troubleshooting in production. There was also the question of logs: how to use span logs but avoid writing logs to 2 places. In the end, we replaced the Jaeger agent and collector with our own implementation. Main features: tail-based sampling (traces with errors, traces with anomalously high latency, etc.), keeping ALL traces in memory for 30 minutes (searchable from the Jaeger UI), a Jaeger UI backend integrated with our logging system (it attaches logs to spans on the fly, so we're not writing span logs to Jaeger's Elasticsearch), and a near-real-time dependency graph with RPS/RT for each edge.
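To make the contrast with probabilistic (head-based) sampling concrete, here is a minimal sketch of a tail-based sampling decision: spans are buffered per trace, and the keep/drop decision is made only once the whole trace is available. The `Span` and `TailSampler` names, the fields, and the fixed latency threshold are hypothetical; this is not Ozon's implementation or a Jaeger API.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical minimal span record; not the Jaeger data model.
    trace_id: str
    duration_ms: float
    error: bool = False


@dataclass
class TailSampler:
    """Hypothetical tail-based sampler: buffer spans per trace, decide at the end."""

    # Assumed fixed stand-in for "anomalously high latency"; a real system
    # would more likely derive it per service/operation (e.g. a percentile).
    latency_threshold_ms: float
    buffer: dict = field(default_factory=dict)

    def add(self, span: Span) -> None:
        # Spans stay in memory until the trace is considered complete.
        self.buffer.setdefault(span.trace_id, []).append(span)

    def decide(self, trace_id: str) -> bool:
        """Remove the trace from the buffer; return True if it should be stored."""
        spans = self.buffer.pop(trace_id, [])
        if not spans:
            return False
        has_error = any(s.error for s in spans)
        # Approximate trace latency by the longest span (usually the root).
        slowest = max(s.duration_ms for s in spans)
        return has_error or slowest >= self.latency_threshold_ms
```

A head-based sampler decides when the trace starts, so it cannot know yet whether the trace will fail or be slow. The tail-based approach never loses those traces, at the cost of holding every in-flight trace in memory until its decision is made.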

Q: How many services (or microservices) exist in your system layout? A: >500 services.

Q: How many of them are traced? A: We've built a "scratch" framework as the foundation of every microservice, instrumented with metrics and tracing out of the box, so most services are well instrumented (~95% of services are covered).

Q: Can you describe your tracing setup and volumes? I.e. which storage you use, how many traces/spans you store, etc. A: Setup:

Stats:

Q: What types of problems are you solving with tracing? A: We use tracing in 2 main areas:

Thanks for Jaeger! And ask me if you're interested in any details :)

linjmeyer commented 4 years ago

If applicable, what company/organization do you represent? How many software engineers?

Redbox; ~50 Software Engineers, ~5 DevOps/Delivery Engineers

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or "I am not using Jaeger" etc.

We are using Jaeger in production for all of our applications on Kubernetes, as well as a select set of non-Kubernetes cloud applications. All services are ASP.NET Core (C#). We use a managed Elasticsearch cluster, with collectors across our cloud infrastructure, to ensure we can trace requests end to end across multiple regions/cloud providers. For Kubernetes, we use the Jaeger Operator and Istio as a service mesh. All traced services use the Jaeger C# Client with our own wrapper library, which adds features such as logging the JaegerSpanId and exposing the internal Jaeger metrics as Prometheus metrics. Most services use the remote sampling configuration from the collector.

How long have you been using Jaeger?

Around 6 months, 3 months in production.

How many services (or microservices) exist in your system layout?

70+ Services/Microservices using various cloud providers and k8s.

How many of them are traced?

Around 30 services in both Kubernetes and non-Kubernetes cloud environments.

Can you describe your tracing setup and volumes? I.e. which storage you use, how many traces/spans you store, etc.

What types of problems are you solving with tracing?

We use Jaeger to observe and troubleshoot performance issues and to understand what service-to-service dependencies we have.

Betula-L commented 3 years ago

If applicable, what company/organization do you represent? How many software engineers?

bilibili

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or "I am not using Jaeger" etc.

We are using Jaeger in production for most of our applications on Kubernetes, as well as a few applications deployed directly on machines. We use Jaeger Agent and Jaeger Collector with few modifications; those two provide enough features in production.

However, we completely rewrote the Jaeger SDK and Jaeger Job. In our experience, almost all of our Go applications can use Jaeger for tracing easily, but others, e.g. Java and Python, cannot. The SkyWalking agent may be a better choice for trace collection, because applications can import a jar more easily than an SDK. Maintaining a tracing SDK for thousands of applications in different languages is a really painful job, especially for Python. We hope to find a painless way to manage that in the future.

How long have you been using Jaeger?

Around 1 year in production.

How many of them are traced?

1000+ Services/Microservices using various cloud providers and k8s.

Can you describe your tracing setup and volumes? I.e. which storage you use, how many traces/spans you store, etc.

We use ClickHouse now, but used ScyllaDB before: Elasticsearch scales poorly, and Cassandra/ScyllaDB make complex queries hard in many situations.

We ingest 1 million spans/s and retain them for 7 days, for troubleshooting performance issues and maintaining dynamic service-to-service dependencies.

zdyj3170101136 commented 1 year ago

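If applicable, what company/organization do you represent? How many software engineers?

mihoyo

How are you using Jaeger? E.g. full production deployment, considering, experimenting, or "I am not using Jaeger" etc.

We use agent -> collector -> Kafka -> Flink and ingester -> ClickHouse.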

We reimplemented jaeger-agent: (1) it uses WebSocket to forward the raw []byte payloads directly from the client to the collector; (2) it uses a Unix domain socket instead of UDP.

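How long have you been using Jaeger?

I have been in charge of it for half a year. We have been using it for at least 3 years.

If you are not using Jaeger but chose another tracing system, what were the reasons?

How many services (or microservices) exist in your system layout?

Thousands.

How many of them are traced?

ALL.

Can you describe your tracing setup and volumes? I.e. which storage you use, how many traces/spans you store, etc.

We use ClickHouse to store at least millions of spans per second for 30 days.

What types of problems are you solving with tracing?

1. Service dependency graph. Using Google's pprof approach, we can display the relationships between thousands of services, and it looks very good.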

Search by service: show only that service and its upstream and downstream services. Search by group: show only the group's services. Connect the service dependency graph with metrics: a service node in the graph shows not only its name but also its average latency, span count, and error percentage over the time range.

And just like Google's pprof, our UI also has:

2. Full sampling. After reimplementing the Jaeger agent and replacing the agent's Thrift marshal/unmarshal protocol with a more efficient one, we can sample all traces.

3. High-accuracy histograms. We use ClickHouse as the metric store, which makes it possible to store a histogram with a hundred time buckets for each service/operation; with Prometheus, this would cost hundreds of GB of memory.

4. Critical path: shows each span's true execution time. We have two different UIs to display it:

5. Connecting traces with runtime/pprof. We can link a trace with runtime pprof data to show a request's flame graph, i.e. which functions consumed CPU for that request.

6. Tail-based sampling: sampling spans with p99 latency or an error tag.

7. Package instrumentation: Elasticsearch, Kafka, net/http/httptrace, MongoDB, Redis, gRPC, SQL.

8. Explore: a UI that lets us:

lunkan93 commented 1 month ago

If applicable, what company/organization do you represent? Elastisys - Creators and maintainers of Compliant Kubernetes

How are you using Jaeger? Since early 2023, we have been offering Jaeger as a managed service to our customers for their distributed tracing needs, operated and maintained by us.

Which storage do you use? We deploy Jaeger with a dedicated OpenSearch cluster. This choice was based on a couple of reasons:

What types of problems are you solving with tracing? We saw an increased interest in a managed distributed tracing solution from our customers, as they wanted to gain deeper insight into their applications, or just a more complete observability stack in general.