samsara / trackit

TRACKit! A developer-friendly wrapper for Yammer's Metrics library in Clojure.
Apache License 2.0

Overlap and differences between trackit and ulog #6

Closed zcaudate closed 4 years ago

zcaudate commented 4 years ago

I've just started playing around with ulog and then noticed that you put up https://github.com/BrunoBonacci/my-projects with a disclaimer for trackit - "If you thinking to use this, maybe you should check μ/log out first!"

What are the overlaps between these projects and if/how they are typically used together?

BrunoBonacci commented 4 years ago

Hi Chris, it has to do with the difference between events and metrics. samsara/trackit is built on top of the Codahale Metrics library, which in turn was built at a time when Graphite was the best you could hope for to visualize application metrics. Metrics are good, but they are not the full story. The metrics approach (as in the Metrics library) is to keep a small and efficient collection of counters, gauges and timers within a system, perform some local aggregation, and send the partially aggregated data across. Here is my problem: any type of aggregation you do locally (in a single VM) involves a loss of information (and resolution).

For example, let's say that you want to track some metrics about HTTP requests reaching your system; you could probably write a Ring wrapper as follows:

(require '[samsara.trackit :refer [track-rate]])

(defn metrics-wrapper
  [handler]
  (fn [r]
    ;; count requests per second under the given metric name
    (track-rate "myapp.api.requests")
    (handler r)))

This will generate counters, compute averages within the single VM, and then send these across to a centralized monitoring system. This is especially problematic with timers: you compute averages and percentiles locally to your system and then send them to a centralized monitoring system, which then needs to aggregate the already pre-aggregated averages with more averages. Now, what do you get when you average a bunch of averages??? Answer: a big mess! (https://math.stackexchange.com/questions/95909/why-is-an-average-of-an-average-usually-incorrect) That's just one side.
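
To make this concrete, here is a toy example (all numbers invented) of how the average of two per-VM averages diverges from the true average across both VMs:

(let [a {:count 10  :mean 1000.0}   ; VM A: 10 requests averaging 1000 ms
      b {:count 990 :mean 10.0}]    ; VM B: 990 requests averaging 10 ms
  {:average-of-averages             ;; => 505.0 ms (misleading)
   (/ (+ (:mean a) (:mean b)) 2)
   :true-average                    ;; => 19.9 ms
   (/ (+ (* (:count a) (:mean a))
         (* (:count b) (:mean b)))
      (+ (:count a) (:count b)))})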

The biggest issue is that you have lost the possibility to ask more questions about your requests. You have lost the ability to slice the pool of requests by the other available dimensions.

For example, if you wanted to track not only the rate of overall requests but also the rate of successful and error requests, you would have to add another rate-tracker. What if you want to track the requests by response code? It seems a reasonable request. What about tracking the rate of requests by response code by endpoint (URI)? By now you get the gist: every time you aggregate the data locally and discard the context of the data-point you want to track, you lose valuable information. The sketch below shows where this leads.
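
Here is a hypothetical sketch (the tracker names are invented) of where the metrics approach takes you: each extra dimension needs its own explicitly named tracker, and the names multiply combinatorially.

(defn metrics-wrapper
  [handler]
  (fn [r]
    (track-rate "myapp.api.requests")
    (let [resp (handler r)]
      ;; one extra tracker per status code...
      (track-rate (str "myapp.api.requests.status." (:status resp)))
      ;; ...another per endpoint per status code, and so on
      (track-rate (str "myapp.api." (:uri r) ".status." (:status resp)))
      resp)))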

In contrast, with μ/log you would write something as follows:

(require '[com.brunobonacci.mulog :as μ])

(defn events-wrapper
  [handler]
  (fn [req]
    (let [resp (handler req)]
      (μ/log ::http-request
             :uri (get req :uri)
             :request-method (get req :request-method)
             :content-type (get-in req [:headers "content-type"])
             :content-encoding (get-in req [:headers "content-encoding"])
             :http-status (get resp :status))
      ;; return the response; μ/log is fire-and-forget
      resp)))

This will send records which look like this:

{:mulog/event-name :my-app.core/http-request,
 :mulog/timestamp 1573056913624,
 :mulog/namespace "my-app.core",
 :content-encoding "gzip",
 :content-type "application/json",
 :http-status 400,
 :request-method :post,
 :uri "/v1/events/",
 :puid "1ix2qq195ulno17drsohsk0l2t",
 :app-name "my-app",
 :env "PRD",
 :version "0.1.0"}

The last 4 properties can be injected with μ/set-global-context!
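
For example (the :puid value below is an assumption; any process-unique identifier generated at startup would do):

(μ/set-global-context!
 {:app-name "my-app"
  :version  "0.1.0"
  :env      "PRD"
  ;; assumed: a process-unique id generated once at startup
  :puid     (str (java.util.UUID/randomUUID))})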

The point is, with just this little bit of information you can ask a huge number of questions of your system. For example: what is the rate of requests per endpoint? What is the error rate by status code? Are the 400s coming from one particular content-type or one particular URI?

The possibility to add more dimensions (properties) and use them to narrow your queries is such a powerful capability that it cannot easily be replaced.

Pretty much every company on AWS nowadays has an ElasticSearch cluster for their logs. We spend so much effort and time extracting useful information out of log strings when we could just send data; an entire industry was born for this purpose. My question is: why did we encode the data as a string message in the first place? ElasticSearch and Lucene are incredible pieces of software. I think that our industry doesn't fully appreciate the power of these tools.

If you pipe the μ/log events into ElasticSearch (instead of sending boring messages) you now have a powerful, flexible and incredibly fast query engine which allows you to extract all the information you need out of your data. You can visualize these events, set alarms, integrate with PagerDuty-like systems, and when you need to you can dig deeper to check whether the errors you are seeing in your API are due to a particular misuse of the API or a systemic issue.

If ElasticSearch is not to your liking, you can send the same data into Kafka and use streaming technology like KSQL or Riemann to aggregate the data in real time, or dump everything into an S3 bucket and use analytics systems to analyze the data.
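
As a sketch of the plumbing, assuming the corresponding μ/log publisher modules (mulog-elasticsearch, mulog-kafka) are on your classpath, and with placeholder URLs:

;; publish every μ/log event to ElasticSearch
(μ/start-publisher! {:type :elasticsearch :url "http://localhost:9200/"})

;; or publish the same events to a Kafka topic
(μ/start-publisher! {:type :kafka :kafka {:bootstrap.servers "localhost:9092"}})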

To come to your question:

"What are the overlaps between these projects and if/how they are typically used together?"

The answer is simple: with μ/log you can get the same data that you get with samsara/trackit, but not the other way round!!!

When you have the raw data, the possibilities to slice, dice and aggregate it in different ways are limitless.
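
As a toy illustration (events here is assumed to be a sequence of μ/log event maps you have already retrieved), every new aggregation is just another query over the same raw data:

(defn rate-by
  "Counts events per distinct value of the given dimension."
  [dimension events]
  (frequencies (map dimension events)))

(rate-by :http-status events)              ;; e.g. {200 9500, 400 450, 500 50}
(rate-by (juxt :uri :http-status) events)  ;; per endpoint per status code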

I'm planning a lot of work on μ/log; among the things I will add is μ/trace.

I hope that I managed to convince you of the power of μ/log and its approach. I'll be happy to hear your thoughts on this.

Regards, Bruno

zcaudate commented 4 years ago

Hi Bruno,

Thanks for the fantastic explanation. I get it now and will concentrate on learning and using u/log. To be honest, I don't have much experience with logging or designing logging solutions, as most stacks I've worked on have been predetermined, so it's a good opportunity to really get to know what exactly is happening.

u/trace sounds fantastic as well, and on first read I really like ring-boost because it solves a critical problem for me: I currently want an easy way to benchmark endpoints and progressively swap out the slower ones. I'm working with reitit at the moment and it doesn't have a metrics-type module, so I started looking into it. You've convinced me that a general logging/eventing system is easier to maintain if used well.

Chris