openzipkin / brave

Java distributed tracing implementation compatible with Zipkin backend services.
Apache License 2.0
2.36k stars 714 forks source link

consider datasource-proxy as an alternative or replacement to p6spy #824

Open codefromthecrypt opened 6 years ago

codefromthecrypt commented 6 years ago

We currently have JDBC instrumentation via https://github.com/p6spy/p6spy. This has been helpful, but we have also received feedback it is harder to configure, so some don't use it. Given limited resources, we should think through carefully what to continue supporting and not.

There's an alternate library available that we didn't know at the time https://github.com/ttddyy/datasource-proxy It's 1.x codeline is compatibly with JRE 1.6 which means it could be used with all versions of code Brave supports. It also has a 2.x codeline and external adapters in progress, for example, https://github.com/ttddyy/datasource-proxy-r2dbc @ttddyy is active as well in these communities.

I'd like to get feedback from those who've tried either so we can chart a path forward. cc'ing some folks but feedback open to all:

Here are some users: @shakuzen @llinder @hypnoce @ewhauser @NegatioN @Sef135 @regardfs @Logic-32 @jorgheymans @nklmish @dmitry-at-genestack @anuraaga @koji @theopengroup @tangrui @gcnote @whx4J8 @dhwajad @elventear @suls @embs @mmanciop @Jaware

And here are some folks who write alternatives to brave and often give feedback: @felixbarny @tylerbenson @nicmunroe @wu-sheng @ivantopo

codefromthecrypt commented 6 years ago

Also, while the title includes a decision about whether to also carry-forward p6spy, interest in datasource-proxy is independent.

If you just want to express you want datasource-proxy support, put a thumbs up on the PR description screen shot 2018-11-08 at 9 03 33 am

ttddyy commented 6 years ago

Thanks @adriancole

I'd be very happy to hear about feedback.

For datasource-proxy 1.x (current release version), it's a bit aged programming style since it's been there for long time. I'm also writing one for r2dbc, and based on that, I'm putting the experience back to 2.x branch(java8 baseline).

Based on the feedback, I'll update datasource-proxy as well :)

codefromthecrypt commented 6 years ago

I've added a feature request to allow your 2.x codeline to be 1.6 bytecode. We can discuss there, and crazy as it sounds, we still have folks running old runtimes at least Java 1.7 but sometimes even 1.6. https://github.com/ttddyy/datasource-proxy/issues/55

zeagord commented 6 years ago

This would be super helpful. Currently we also faced some difficulties with p6spy.

wu-sheng commented 6 years ago

If we have already known datasource-proxy is better. You should encourage users to choose it. Also, I think JDK 1.6 compatible is important if it is still possible.

jorgheymans commented 6 years ago

Just wondering specifically which issues people were having with the p6spy integration ?

ttddyy commented 6 years ago

FYI:

@gavlyukovskiy has a project : spring-boot-data-source-decorator On the bottom of README, there are some picture with tracing.

In that project, there is a datasource-proxy listener implementation that creates spans. TracingQueryExecutionListener.java

gavlyukovskiy commented 6 years ago

More than that I even have version that works with brave - TracingQueryExecutionListener It's more or less the same as brave's implementation, but also creates spans for Connection lifetime and for time between fetching first row and closing ResultSet. I would be happy to move this listener to brave and just leave spring auto-configuration on my side.

codefromthecrypt commented 6 years ago

So there was a question about feedback and things leading to this issue. I won't mention routine glitches and things. I will answer other comments independently.

I'm not sure I can find all the feedback on gitter, but I sensed configuration has been tricky. I placed some of this below.

I think one of the major things that "made me look" was that @gavlyukovskiy implemented datasource-proxy alongside p6spy https://github.com/gavlyukovskiy/spring-boot-data-source-decorator. I noticed the api style differences, where datasource-proxy seemed to be configured with java types moreso than p6spy. I could follow it easier, and this seems more natural in java based DI tools.

This is subjective, and I'm not a user either!, but I have to say that p6spy and also mysql style configuration feels foreign as they tend to nest details in their own configuration such as separate files or embedded strings. It feels nicer to allow frameworks like spring boot or dropwizard to participate directly cc also @jplock

I could dig up more on various gitter channels, but there have been some interesting feedback on p6spy that isn't about routine api change etc. We don't know how well it would apply to datasource-proxy either! I haven't collected all of it, just looked on one channel and grabbed a few.

@jorgheymans Oct 30 2017 20:58

so for my tracing needs i was looking to filter out some of the spans p6spy is generating by configuring it's executionThreshold property. Turns out this is not applied to custom JdbcEventListeners (which Brave uses), bummer. logged p6spy/p6spy#418 in case anyone is interested i'm fully aware that even if they would expose their filter logic, most notably the executionThreshold feature would be not trivial for brave to implement. But at least we would have some options

@hypnoce Sep 11 2017 07:26

Hi @adriancole , I was trying to have multiple remoteServiceName for the p6spy module factory. When no name is defined, it takes the connection catalog. I have many different DB connection for the same service, so either I have a pretty functionally irrelevant db catalog name or the same remote service name for all DBs Do you think it's possible to add a property (zipkinRemoteServiceName) to the url connection ? jdbc:mysql://localhost:5555/mydatabase?user=user& zipkinRemoteServiceName=authenticationDB let me know if you have any thoughts on that. Thanks !

codefromthecrypt commented 6 years ago

@gavlyukovskiy and others.. one thing I would like to have is a clear path for folks to decorate our default database policy. For example, sometimes people want more data than is cheap to add. We have a lot of sites who want very tight client spans. I feel like datasource-proxy has some layering that would work nicely in spring boot, for example a list of interceptors.

This isn't to say p6spy doesn't, but it is definitely the case I'd like something simple that works for modern things like boot and also older apps like spring 3 XML, and use exactly the same code in both places.

Right now, we usually only have mysql examples as it is simpler to explain than p6spy (no offence, but it would at least always be simpler because of the lack of abstraction).

ttddyy commented 6 years ago

Regarding above ProxyDataSourceBuilderConfigurer.java, on datasource-proxy 2.x version I'm planning to remove logger specific APIs. (logQueryBySlf4j(), logQueryByJUL(), logQueryByCommons(), logQueryToSysOut())

Instead, allowing user to simply define custom behavior via callback.

ExecutionInfoFormatter formatter = ExecutionInfoFormatter.showAll();  // formatter ,converter
Logger logger = ...  // create own logger

DataSource proxyDs = new ProxyDataSourceBuilder(ds)
  .afterQuery(execInfo -> {
    String log = formatter.format(execInfo);  // convert ExecutionInfo to String
    logger.info(log);  // write out to own logger, sysout, etc
    // you can define more behavior
  })
  .build();

This was one of the my experience from datasource-proxy-r2dbc

Actually, this is already possible in 1.x branch(doc), just formatter(ExecutionInfo -> String) is not there.

ewhauser commented 6 years ago

It's generally annoying that JDBC has failed to expose any hooks for observability unlike database runtimes in other languages. I tried a number of datasource proxies when I wrote the p6spy integration but I can't really remember whether I tried datasource-proxy or not. In general, both seem like reasonable solutions.

It sounds like the main issues are:

1) The documentation should be improved (i.e. here's how you can configure this without a properties file) 2) An example of configuring this in Spring Boot 3) An example of configuring this with Spring XML

I can certainly address these three things if that's what is needed. Other suggestion would be to have people upon tickets when they have a feature request; I don't watch Gitter so didn't realize there was even any interest in improvements.

I would keep in mind that not everyone uses Spring (we certainly didn't when I wrote this) and some people will want tracing to be enabled on JDBC connections that they don't get to instantiate in their code. The documentation I wrote revolved around configuration via the JDBC string because that is what we needed to do.

All that being said, I'm in no way religious about the approach and if there is something better out there than great.

codefromthecrypt commented 6 years ago

@ewhauser thanks for piping in, especially highlighting the "no dependency injection" story as I presume that's what you meant by not spring (ex we do sometimes get feedback on dagger by folks or guice via DW). If you meant DI, but not spring, let us know so that we can create the right example. For example, there's a brave-webmvc-example which I plan to do JDBC examples in regardless.

On the other point, I think quite a large number of issues were raised in p6spy and many of them addressed. Well said, though.. some issues could be tracked better also here. I think I was probably too late coming to the realisation that people walking around things by using other tools was symptomatic of either us needing to change our approach, support another tool, or switch tools.

codefromthecrypt commented 6 years ago

One feedback I would give (ironically from a project that is wordplay on the word dapper :P) is that p6spy is a terrible name. No one knows it has anything to do with databases, and this causes support questions possibly leaving people to look elsewhere. We can't realistically do this experiment, but it would be interesting to know how many would have struggled if the library was named datasource-proxy :P

codefromthecrypt commented 6 years ago

to pre-empt the question, we use the same names of the libraries in our artifact names, so brave-instrumentation-datasource-proxy or brave-instrumentation-p6spy. We do this because the primary concern is instrumenting the library, that's also where all the classpath "bless" is around etc. I think we considered breaking this convention only for p6spy as it is so hard for people to know what it is.

ewhauser commented 6 years ago

Well, it was originally called brave-jdbc which was pretty descriptive but a certain someone made me change it =)

Logic-32 commented 6 years ago

brave-jdbc sounds like an awesome name to me simply because it's what we named our custom variant of this problem in-house.

Basically, my two cents on the problem as a whole:

  1. We rolled our own datasource-proxy-like implementation in house because we didn't realize it or p6spy existed at the time
  2. We avoid Spring like the plague but use Guice/HK2/Dagger for DI frameworks (depending on which team you're on)
  3. Instrumenting the DataSource directly, and returning a DataSource is great because they we can use Hikari/Hibernate/plain-ol-JDBC and get instrumentation no matter what

Briefly looking at datasource-proxy, it looks pretty cool. I don't have the time to dig in but our in-house requirements for it to be a viable replacement would be:

We will probably evaluate datasource-proxy in more detail at a (much) later date but until then, if you feel like it can meet those goals then I'm happy with it :)

felixbarny commented 6 years ago

//cc @typekpb @quintonm

codefromthecrypt commented 6 years ago

FYI I've heard several people want to name something brave-instrumentation-jdbc (not just originally @ewhauser which I admit to resisting). If we do formalize something with such a name we are essentially "picking a winner" in so far as it would almost certainly be incompatible to use whatever that code is with another driver. That's because like mentioned before JDBC has no hooks.

I'm highly resistent because whichever library it is, it will eventually have an incompatible change. Usually we can get around incompatible change by refering to the library version that is now changed. If we say brave-instrumentation-jdbc IOTW, yes, it does make things more visible, but OTOH, it locks us into an awkward different explanation for how to treat this differently from everything else. This will hurt eventually.

That all said, if folks knowing this want this to be the case, I am ok defending questions about it later. If so, upvote the comment with thumbsup or reply with alternate suggestion.

Meanwhile, what is that library to implement brave JDBC is still under debate. We've had a history of folks using p6spy, but not enough advocates for it speaking up yet. We should hear whoever they are, as currently people seem in favor of datasource-proxy, if anything, and that might just be due to who is replying.

devinsba commented 6 years ago

I'm a -1 on the name brave-instrumentation-jdbc, if there is going to be a module that is named that I think it should be an abstraction module like brave-instrumentation-http

I do not have skin in the game on the library choice front, but this is my opinion from a code organization point of view

typekpb commented 6 years ago

I won't comment on the library choice as I'm one of the p6spy maintainers. Anyone willing to raise separate issues on p6spy for the problems identified? Regardless on your choice we could implement what our users miss.

On Tue, Nov 13, 2018, 17:07 Brian Devins-Suresh <notifications@github.com wrote:

I'm a -1 on the name brave-instrumentation-jdbc, if there is going to be a module that is named that I think it should be an abstraction module like brave-instrumentation-http

I do not have skin in the game on the library choice front, but this is my opinion from a code organization point of view

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openzipkin/brave/issues/824#issuecomment-438322260, or mute the thread https://github.com/notifications/unsubscribe-auth/AAegbF2G0oYZrbPSE9uJDUsOoxMb-WeRks5uuu4zgaJpZM4YTyvp .

ewhauser commented 6 years ago

If the main problem to to solve on naming is discoverability, I'd probably opt for brave-instrumentation-jdbc-library. Although more verbose, it has the nice qualities of being discoverable and not picking a winner.

codefromthecrypt commented 6 years ago

If the main problem to to solve on naming is discoverability, I'd probably opt for brave-instrumentation-jdbc-library. Although more verbose, it has the nice qualities of being discoverable and not picking a winner.

I think the winner problem is that this would be implemented by something right? In such a way it would be p6spy or datasource-proxy unless we do what devinsba said which is to make an abstraction like we have for http (which has no dependency except jdbc and is implemented by one of the libraries)

Make sense?

codefromthecrypt commented 6 years ago

I have possibly a terrible idea to solve the discovery problem.

we make brave-instrumentation-jdbc we publish it to maven central. inside that jar is a copy of a README that says we have no abstraction yet. the current supported tracing things are p6spy and datasource-proxy

no winner choosing. no premature abstractions to haunt us. little maintenance. yes some maven central abuse

codefromthecrypt commented 6 years ago

PS we will eventually need brave-instrumentation-jdbc just like we currently need brave-instrumentation-servlet. For example, frameworks that layer on servlet apis, like spark and spring mvc need that jar. We will eventually need to create brave-instrumentation-jdbc as a jar even if only for support reasons.

That's why I don't think doing brave-instrumentation-jdbc with a README is long term debt, rather a placeholder until we need something which we will almost certainly need.

gavlyukovskiy commented 6 years ago

As a user of both I would say that p6spy has better api to create listeners and I'd chose p6spy if I need to create custom ones, but configuring it is not an easy part. Though now with JdbcEventListenerFactory you can programmatically register listeners in the same way as you do in datasource-proxy. I would like to see brave supporting both, maybe using common listener interface as I did, and give users a choice. Having brave-instrumentation-jdbc as umbrella for two implementations with explanation how to configure p6spy or datasource-proxy sounds right for me.

codefromthecrypt commented 5 years ago

lovely! https://github.com/ttddyy/datasource-proxy-r2dbc#distributed-tracing