jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

AI/ML platform for Jaeger #1639

Open pavolloffay opened 5 years ago

pavolloffay commented 5 years ago

Summary

At the moment, doing ML/AI analysis with Jaeger is hard. There is no direct integration with ML/AI platforms, and we do not know much about which models we could build.

Proposal

Placeholder issue for any discussion related to ML/AI integration with Jaeger. In recent Jaeger bi-weekly meetings we have talked about doing ML/AI on tracing data (also in combination with other telemetry data such as metrics and logs).

For completeness, I will list the existing ML/post-processing integrations:

cc @annanay25

pavolloffay commented 5 years ago

As a first step, we should gather people interested in this to drive the right decisions. cc @jaegertracing/jaeger-maintainers

Second, we should start working on an integration that makes it easy to start writing models. There seem to be two main web-based platforms, Jupyter and Zeppelin, and both have pros and cons:

Jupyter

Zeppelin

pavolloffay commented 5 years ago

If you are interested, comment on this issue or send me a PM on Gitter and I will add you to https://github.com/orgs/jaegertracing/teams/data-analytics

pavolloffay commented 5 years ago

Our next step could be to try Jupyter with a Java/Scala kernel and connect it to our DB/Kafka.

annanay25 commented 5 years ago

After a discussion with Pavol, we have decided to work top-down by first compiling a list of high-level objectives that we want to achieve using AI/ML analysis. We could then gauge interest in the community in the most helpful features and prioritise accordingly.

Once our targets are clear, they will not only help us define a clear path for development but also encourage contributions from folks with more knowledge of building data analysis models.

cc @jaegertracing/data-analytics

pavolloffay commented 5 years ago

Anybody is welcome to propose or upvote any feature that would help with this initiative.

The objectives from the initial comments still hold. First, we would like to build a community of people who want to contribute (models, integrations) and validate models. Second, we would like to provide AI/ML integration as part of the upstream project. This should ultimately result in new features added to the main Jaeger distributions and the UI.

To start working on models, we need an environment to do it in. Specifically, I mean a Jupyter notebook integration with Jaeger: a notebook with Spark/Flink connected to Jaeger's data storages.

pavolloffay commented 5 years ago

@yurishkuro also proposed creating a graph query language (similar to Canopy's capabilities) that would allow defining graph-related queries.

[Screenshot: Jaeger Project Bi-Weekly Call notes (Google Docs)]

https://research.fb.com/publications/canopy-end-to-end-performance-tracing-at-scale
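To make the idea of a graph query a little more concrete, here is a small hand-rolled Python sketch of one Canopy-style question: which spans matching one condition run underneath a span matching another (for example, which MySQL calls happen inside a checkout request). It is not Canopy's query language and not a Jaeger API. The `Span` tuple, `has_ancestor` and `query` are hypothetical names, and each span is assumed to carry a single parent pointer.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Set


class Span(NamedTuple):
    # Hypothetical minimal span shape: one parent pointer per span.
    span_id: str
    parent_id: Optional[str]
    service: str
    operation: str


Predicate = Callable[[Span], bool]


def has_ancestor(span: Span, by_id: Dict[str, Span], pred: Predicate) -> bool:
    """Walk parent pointers upwards and report whether any ancestor matches pred."""
    seen: Set[str] = set()
    parent_id = span.parent_id
    # Stop at a root, at a parent missing from the trace, or on a reference cycle.
    while parent_id is not None and parent_id in by_id and parent_id not in seen:
        seen.add(parent_id)
        parent = by_id[parent_id]
        if pred(parent):
            return True
        parent_id = parent.parent_id
    return False


def query(spans: List[Span], ancestor: Predicate, target: Predicate) -> List[Span]:
    """Return spans matching `target` that run somewhere below a span matching `ancestor`."""
    by_id = {s.span_id: s for s in spans}
    return [s for s in spans if target(s) and has_ancestor(s, by_id, ancestor)]


spans = [
    Span("1", None, "frontend", "GET /checkout"),
    Span("2", "1", "orders", "CreateOrder"),
    Span("3", "2", "mysql", "INSERT orders"),
    Span("4", None, "frontend", "GET /home"),
    Span("5", "4", "mysql", "SELECT products"),
]

# "Which MySQL calls happen inside a checkout request?"
hits = query(
    spans,
    ancestor=lambda s: s.operation == "GET /checkout",
    target=lambda s: s.service == "mysql",
)
print([s.operation for s in hits])  # ['INSERT orders']
```

A real query layer would need more operators, such as path patterns and aggregation across many traces. The sketch only shows that this kind of question comes down to a traversal over the span graph.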

pavolloffay commented 5 years ago

I would like to hear the @jaegertracing/data-analytics team's opinion on which language they would like to use for data analytics with Jaeger. Would it be Java or Python?

yurishkuro commented 5 years ago

I would start with Python, it is the de-facto DS/ML language. We can later extend it to Java if necessary.

Talina06 commented 5 years ago

> I would like to hear the @jaegertracing/data-analytics team's opinion on which language they would like to use for data analytics with Jaeger. Would it be Java or Python?

I prefer Python because it's easier to get off the ground and, as @yurishkuro mentioned, it has good libraries for DS/ML.

> To start working on models, we need an environment to do it in. Specifically, I mean a Jupyter notebook integration with Jaeger: a notebook with Spark/Flink connected to Jaeger's data storages.

I would be interested in starting with the Jupyter integration for Jaeger. Once we have this in place, gathering requirements for building models could be easier. What do you suggest, @pavolloffay?

pavolloffay commented 5 years ago

Ack for Python, so let's start this :).

@Talina06 this is great. I will try to summarize requirements I can think of:

  1. run Jupyter/JupyterLab as a Docker container

  2. provide a notebook with a basic connector to the storage (e.g. Elasticsearch) or for streaming from Kafka.

  3. the connector might depend on the framework we choose: we could start with Spark or Flink (both support Python). I would also like to hear what people prefer here.
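Before settling on Spark or Flink, a basic notebook connector could simply pull one trace over HTTP and flatten it into one row per span. This is a hedged starting point, not a storage connector. It assumes the query service's `/api/traces/<traceID>` JSON endpoint (the one the Jaeger UI calls, on the default port 16686), which is an internal API rather than a stable contract. The helper names `trace_url`, `parse_traces` and `fetch_trace` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: a Jaeger query service on its default UI/HTTP port. The /api/traces/<id>
# endpoint is the internal JSON API used by the Jaeger UI, not a stable public contract.
QUERY_URL = "http://localhost:16686"


def trace_url(trace_id: str, base_url: str = QUERY_URL) -> str:
    """Build the query-service URL for a single trace."""
    return f"{base_url.rstrip('/')}/api/traces/{trace_id}"


def parse_traces(payload: str) -> List[Dict[str, Any]]:
    """Flatten a {"data": [trace, ...]} response into one dict per span."""
    rows: List[Dict[str, Any]] = []
    for trace in json.loads(payload).get("data", []):
        processes = trace.get("processes", {})
        for span in trace.get("spans", []):
            process = processes.get(span.get("processID"), {})
            rows.append({
                "trace_id": span["traceID"],
                "span_id": span["spanID"],
                "service": process.get("serviceName"),
                "operation": span["operationName"],
                "start_us": span["startTime"],    # microseconds since the epoch
                "duration_us": span["duration"],  # microseconds
            })
    return rows


def fetch_trace(trace_id: str, base_url: str = QUERY_URL) -> List[Dict[str, Any]]:
    """Download one trace from a running query service and flatten it."""
    with urllib.request.urlopen(trace_url(trace_id, base_url), timeout=10) as resp:
        return parse_traces(resp.read().decode("utf-8"))
```

In a notebook, `pandas.DataFrame(fetch_trace("<trace id>"))` would turn the result into a table. An Elasticsearch, Kafka, Spark or Flink connector would replace only the fetching step and could keep the same row shape.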

Talina06 commented 5 years ago

> Ack for Python, so let's start this :).
>
> @Talina06 this is great. I will try to summarize requirements I can think of:
>
>   1. run Jupyter/JupyterLab as a Docker container
>   2. provide a notebook with a basic connector to the storage (e.g. Elasticsearch) or for streaming from Kafka.
>   3. the connector might depend on the framework we choose: we could start with Spark or Flink (both support Python). I would also like to hear what people prefer here.

Sounds good. Let me get started with Spark in the meantime and share an update here.

VishvendraRana commented 5 years ago

> 3. the connector might depend on the framework we choose: we could start with Spark or Flink (both support Python). I would also like to hear what people prefer here.

@pavolloffay IMO, we should start working on a connector for Spark. Spark has a larger community and is a more widely used tool for data analysis.

yurishkuro commented 5 years ago

I would say we should start with the library/DSL for writing queries and analyses, not with Spark integration. A library is useful on its own, since in Jupyter it could work with many different sources of traces, such as a file or the query-service.

pavolloffay commented 5 years ago

We can start on both simultaneously; each could be useful for different use cases.

I have created a separate issue for DSL https://github.com/jaegertracing/jaeger/issues/1811.

Issue for the Jupyter notebook: https://github.com/jaegertracing/jaeger/issues/1813 cc @Talina06

yurishkuro commented 5 years ago

I replied in https://github.com/jaegertracing/jaeger/issues/1811#issuecomment-534848073

We don't need a full-blown DSL, just a data model of a trace as a graph. Once we have that, people can start writing Jupyter scripts.
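Here is a minimal sketch of what "a data model of a trace as a graph" could look like in Python. It assumes spans arrive in the JSON shape the Jaeger UI exports: `spanID`, `operationName`, `duration`, and a `references` list whose entries carry `refType` and `spanID`. `Span`, `TraceGraph` and `parent_of` are hypothetical names for illustration, not an existing Jaeger API. Giving each span a single parent is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Span:
    span_id: str
    operation: str
    duration_us: int
    parent_id: Optional[str] = None
    children: List["Span"] = field(default_factory=list)


def parent_of(raw: dict) -> Optional[str]:
    """Pick one parent from Jaeger-style references, preferring CHILD_OF over FOLLOWS_FROM."""
    refs = raw.get("references", [])
    for ref_type in ("CHILD_OF", "FOLLOWS_FROM"):
        for ref in refs:
            if ref.get("refType") == ref_type:
                return ref["spanID"]
    return None


class TraceGraph:
    """One trace as a forest of spans (hypothetical helper, not a Jaeger library class).

    Assumption: raw spans follow the Jaeger UI JSON export shape.
    """

    def __init__(self, raw_spans: List[dict]) -> None:
        self.spans: Dict[str, Span] = {
            raw["spanID"]: Span(raw["spanID"], raw["operationName"], raw["duration"], parent_of(raw))
            for raw in raw_spans
        }
        self.roots: List[Span] = []
        for span in self.spans.values():
            parent = self.spans.get(span.parent_id) if span.parent_id else None
            if parent is None:
                # True roots, plus orphans whose parent span is missing from the data.
                self.roots.append(span)
            else:
                parent.children.append(span)

    def walk(self) -> Iterator[Tuple[int, Span]]:
        """Depth-first, pre-order traversal yielding (depth, span); roots have depth 0."""
        stack = [(0, root) for root in reversed(self.roots)]
        while stack:
            depth, span = stack.pop()
            yield depth, span
            stack.extend((depth + 1, child) for child in reversed(span.children))

    def max_depth(self) -> int:
        """Depth of the deepest span, or -1 for an empty trace."""
        return max((depth for depth, _ in self.walk()), default=-1)


spans = [
    {"spanID": "a", "operationName": "HTTP GET /dispatch", "duration": 700},
    {"spanID": "b", "operationName": "FindDriverIDs", "duration": 200,
     "references": [{"refType": "CHILD_OF", "spanID": "a"}]},
    {"spanID": "c", "operationName": "GetDriver", "duration": 50,
     "references": [{"refType": "CHILD_OF", "spanID": "b"}]},
]
graph = TraceGraph(spans)
for depth, span in graph.walk():
    print("  " * depth + span.operation)
```

Keeping the model this small means the same object works whether spans come from a file, the query-service or a Kafka consumer. Analyses such as critical path or per-operation latency then become ordinary traversals over `roots` and `children`.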

pavolloffay commented 5 years ago

We should move our protos to the IDL repository and allow building them for other languages: https://github.com/jaegertracing/jaeger/issues/1213. That will be required to consume data from Kafka.

pavolloffay commented 5 years ago

It would also help if the compiled model classes were published as artifacts. Currently, anyone who wants to consume data from Kafka has to do a lot of additional work.

pavolloffay commented 4 years ago

I have moved my POC of a trace DSL using Gremlin, packaged in a Jupyter notebook, to https://github.com/jaegertracing/jaeger-analytics-java.

sergioarmgpl commented 4 years ago

I think I can work on a MongoDB backend and apply some AI to the collected data with scikit-learn or TensorFlow.