samstarling / finagle-prometheus

Provides a bridge between Finagle and Prometheus metrics
https://samstarling.co.uk/projects/finagle-prometheus/
MIT License
30 stars 18 forks source link

Switch PrometheusStatsReceiver default timer to JavaTimer #28

Closed Fluxx closed 6 years ago

Fluxx commented 6 years ago

When using PrometheusStatsReceiver in resources/META-INF/services/com.twitter.finagle.stats.StatsReceiver, we were getting a null pointer exception:

SEVERE: LoadService: failed to instantiate 'com.samstarling.prometheusfinagle.PrometheusStatsReceiver' for the requested service 'com.twitter.finagle.stats.StatsReceiver'
java.lang.NullPointerException
    at com.twitter.finagle.util.DefaultTimer$.toString(DefaultTimer.scala:94)
    at java.lang.String.valueOf(String.java:2994)
    at java.io.PrintStream.println(PrintStream.java:821)
    at scala.Console$.println(Console.scala:267)
    at scala.Predef$.println(Predef.scala:393)
    at com.samstarling.prometheusfinagle.PrometheusStatsReceiver.<init>(PrometheusStatsReceiver.scala:28)
    at com.samstarling.prometheusfinagle.PrometheusStatsReceiver.<init>(PrometheusStatsReceiver.scala:13)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

I did some debugging, and it turns out that at the time the argument-less new PrometheusStatsReceiver() constructor is called, both DefaultTimer is null. I'm not super familiar with how/when the Java service discovery stuff works, so I'm not exactly sure why it's null.

In any event, switching the timer to a new JavaTimer instance seems to fix it. I'm also not sure on if JavaTimer is the best timer to use -- it's the one I'm most familiar with and have used in various projects with success. But let me know if you have a different idea.

Also, I'm not sure how to test this behavior. In running the test suite, the service loader logic does run and the NullPointerException was raised, but no test is really able to exercise that behavior I think? I'd like to add a test, so if you know a way to do that, let me know.

Fluxx commented 6 years ago

I also believe this may fix #24 -- or at least some of the similar issues I and others mentioned in that ticket.

samstarling commented 6 years ago

Hey @Fluxx – thanks for this! Someone else was indeed having the same problem, so you’re not alone. I don’t have immediate answers to your questions, but let me take a look at this tomorrow and get back to you.

Fluxx commented 6 years ago

Sound good. Thanks for the quick reply!

samstarling commented 6 years ago

The Timer comes in through service loading. I'm wondering if there isn't an appropriate timer on your classpath, but that sounds odd, so I've also raised it in the Finagle Gitter channel.

Maybe you could try running LoadService[ServiceLoadedTimer]() in your application to see whether it returns an empty list. Here's how I do it in the tests. What's strange is that Finagle seems to fall back to a new JavaTimer(), so I'm rather puzzled as to why this is happening...

Fluxx commented 6 years ago

I tried calling LoadService[ServiceLoadedTimer]() inside of the PrometheusStatsReceiver constructor, and got this error:

.../finagle-prometheus/target/scala-2.12/classes...
[error] /Users/jeffpollard/Code/finagle-prometheus/src/main/scala/com/samstarling/prometheusfinagle/PrometheusStatsReceiver.scala:25: trait ServiceLoadedTimer in package util cannot be accessed in package com.twitter.finagle.util
[error]   println(LoadService[ServiceLoadedTimer]())
[error]                       ^
[error] one error found
[error] (core/compile:compileIncremental) Compilation failed
[error] Total time: 13 s, completed Jul 12, 2018 10:51:47 AM

To be clear the NullPointerException is happening not in my own application code, but if I run the test suite for finagle-prometheus.

Fluxx commented 6 years ago

Just checking in on this.

samstarling commented 6 years ago

Sorry: I was on holiday over the weekend and didn't take my laptop with me. I'll try and take a look later. Just to confirm, which test is failing for you with the NPE – anything related to PrometheusStatsReceiver?

Fluxx commented 6 years ago

No test is failing, the entire suite is passing actually. Using the no-argument constructor of the PrometheusStatsReceiver seems to work just fine once the environment is loaded and the test suite runs.

The NPE occurs for me on startup when the service loader attempts to load the PrometheusStatsReceiver. This happens when I add the PrometheusStatsReceiver as a loaded StatsReceiver in my application, as well as when I run the test suite.

samstarling commented 6 years ago

OK – that makes more sense now. I really don't know why the ServiceLoadedTimer isn't behaving well, and I'd have to defer to those who know more about that (and Finagle). What I can do though, is try and detect when the DefaultTimer.getInstance fails, and fall back to a Java timer.

samstarling commented 6 years ago

I've actually just swapped this out for a com.twitter.finagle.util.HashedWheelTimer, so I'm going to close this PR, and release a new version.