No way to adjust priority of pooled data source creators

DarioAmiri commented 9 years ago

Please see lines 36-63 of https://github.com/spring-cloud/spring-cloud-connectors/blob/master/spring-cloud-spring-service-connector/src/main/java/org/springframework/cloud/service/relational/DataSourceCreator.java

The logic in this code is problematic because it seems to pick the first pooled data source creator it finds on the classpath. Therefore the code makes it impossible for me to prefer TomcatHighPerformancePooledDataSourceCreator over BasicDbcpPooledDataSourceCreator.

Also there seems no way for me to set pooledDataSourceCreators since it is a private variable with no getter or setter. So it seems there is no way for me to ensure that my app uses TomcatHighPerformancePooledDataSourceCreator. Am I missing something?

It would be great if I could somehow explicitly configure what pooled data source creator I want to use or at least set a preference.

scottfrederick commented 9 years ago

@amiri-ge You are correct, that logic currently assumes that you only have one of those pooling implementations on the classpath. This seems to work for most people, as you have to explicitly add these libraries as a dependency in your application for them to be on the classpath. Can you elaborate on the use case where you have a pooling library on the classpath that you don't want Connectors to detect and use?

DarioAmiri commented 9 years ago

@scottfrederick CloudFoundry UAA is an example of a project that has more than one pooling library. It supports multiple databases and has a dependency on MariaDB which is pulling in commons-dbcp. It also is pulling in tomcat-jdbc. Thus, I am in a bad spot because I don't control the dependent libraries and so I don't control the pool that I use.

What advise would you give me to fix this problem?

fhanik commented 9 years ago

MariaDB is not using commons-dbcp in the UAA. the tomcat-jdbc pool holds a reference to a org.mariadb.jdbc.MySQLConnection and that holds a reference to MySQLProtocol which holds a reference to the socket

DarioAmiri commented 9 years ago

@fhanik That's not what I meant what I meant is that the if you execute:

gradle dependencies

it shows the commons-dbcp library coming from the maria db dependency. Since spring cloud relies on the presence of classes in the classpath to decide what connection pool to use, it ends up using the commons-dbcp connection pool.

ramnivas commented 9 years ago

I remember thinking about this problem and while I didn't implement it at that time, I was thinking along the lines of testing for an environment variable, say, DB_CONNECTION_POOL_CLASS. If such a variable exists, then spring-cloud-connectors short circuits the pool selection logic and simply uses the class specified by the environment variable.

DarioAmiri commented 9 years ago

@ramnivas That would be very helpful indeed.

DarioAmiri commented 9 years ago

Is there sufficient understanding of this issue to warrant a new feature? I believe that being able to explicitly set the DBCP implementation is a must have for production environments.

Setting it through an environment variable would work but even setting it through bean properties or spring cloud configuration would be fine too since it's a setting that would not change often.

Alternatively, simply rearranging the priority for what DBCP to use would be helpful. It seems to me that you would want to prefer the implementations with high performance capabilities, which, unfortunately, is currently not the case.

scottfrederick commented 9 years ago

Is there sufficient understanding of this issue to warrant a new feature? I believe that being able to explicitly set the DBCP implementation is a must have for production environments.

Yes, I think we should provide this as a feature.

setting it through bean properties or spring cloud configuration would be fine too since it's a setting that would not change often.

I like the idea of setting this as a configuration option via the XML or Java configuration. The DataSourceConfig class would be a natural place to put a field to specify which pooling implementation should be used, and this bean is already accessible to the code that selects the pooling implementation.

Alternatively, simply rearranging the priority for what DBCP to use would be helpful. It seems to me that you would want to prefer the implementations with high performance capabilities, which, unfortunately, is currently not the case.

The question of which pooling implementations have better performance, as well as other capabilities and features that may be preferred, will always be subjective. It would be better to let developers pick the implementation if they know that multiple could be on the classpath.

scottfrederick commented 9 years ago

Related to #6.

scottfrederick commented 9 years ago

Fixed via https://github.com/spring-cloud/spring-cloud-connectors/commit/eaa65c2bf190900b57e7168c07ed0ea26615fc2f.

With this commit, you can specify a list of DataSourceCreator implementations to consider for detection when creating a DataSource. You can specify DataSourceCreators by full class name, or by strings contained in the class name. Either of these examples would modify the order of detection in the same way:

@Bean
public DataSource dataSource() {
    List<String> dataSourceNames = Arrays.asList("TomcatJdbcPooledDataSourceCreator", "HikariCpPooledDataSourceCreator", "BasicDbcpPooledDataSourceCreator");
    DataSourceConfig dbConfig = new DataSourceConfig(dataSourceNames);
    return connectionFactory().dataSource("mydb", dbConfig);
}

@Bean
public DataSource dataSource() {
    List<String> dataSourceNames = Arrays.asList("TomcatJdbc", "HikariCp", "BasicDbcp");
    DataSourceConfig dbConfig = new DataSourceConfig(dataSourceNames);
    return connectionFactory().dataSource("mydb", dbConfig);
}

XML configuration is also supported:

    <cloud:data-source id="customer-db" service-name="mydb">
        <cloud:pool-data-sources>
            <value>TomcatJdbc</value>
            <value>HikariCp</value>
            <value>BasicDbcp</value>
        </cloud:pool-data-sources>
    </cloud:data-source>

scottfrederick commented 9 years ago

@amiri-ge Regarding https://github.com/spring-cloud/spring-cloud-connectors/issues/6#issuecomment-128514663 (and getting all this discussion back on one thread):

I feel that if Spring Cloud is going to hijack the construction of established data sources, then it's configuration should come from the environment rather than from the application implementation or configuration. Thoughts?

Spring Cloud Connectors does not hijack the construction of data sources. The Cloud Foundry Java buildpack will conditionally pull in an auto-reconfiguration library that will reconfigure database connections under certain conditions. Auto-reconfiguration is intended for simple use cases, where only a single database service is bound to an application running on CF and default connection properties are acceptable. If defaults are not acceptable, then an application should use Spring Cloud Connectors explicitly to configure the connection.

All Spring Cloud Connectors connection configuration is done via Java or XML configuration. I wouldn't want to make just this one option (order of pooling implementations) configurable via environment variable without taking a more holistic look at making all options configurable that way.

scottfrederick commented 9 years ago

@amiri-ge Can you provide any feedback on this solution before I close this issue and we prepare a 1.2.1 release?

DarioAmiri commented 9 years ago

@scottfrederick Hi Scott, I appreciate your patience on this matter and I am still a bit confused by it. If I am interpreting what you are saying correctly, the auto-reconfiguration library, which is brought in by the CF java buildpack is reconfiguring my data source. However, in order to disable the auto-reconfiguration library I have to declare an spring cloud bean that creates a data source according to my desired configuration.

I suppose that is an acceptable solution in some ways; unfortunately, the stewards of the Cloud Foundry UAA project, are rejecting any solution that involves adding Spring Cloud as a dependency to their project and all the solutions you have suggested require either the Spring Cloud namespace or Spring Cloud classes. Am I missing something?

So essentially, I'm stuck having to maintain a custom patch for the UAA I deploy to CF. I wonder if there is anyway that your team and the UAA team could get together and work out a solution that makes everyone happy, or at least a solution that will leave everyone equally unhappy but that I can contribute back to the UAA source.

scottfrederick commented 9 years ago

@amiri-ge You've summarized the situation well. Any further changes for your specific use case will need to be made in UAA and/or buildpack auto-reconfig. We can continue that on the UAA issue thread, and I'll close this one.

fhanik commented 9 years ago

Seems like an odd resolution. Since this library modifies a perfectly correct Spring XML config it would be up to this library to ensure that it continues to function as the developer intended.

If this requires the UAA, and other applications, to pull in the Spring Cloud library into its dependencies for the data source to be correctly configured, what happens when the version of Spring Cloud in the UAA gets out of sync with the one in the Java build pack. This is the main reason we rejected adding Spring Cloud as a dependency in the UAA, as the UAA in no way actual depends on this library, and it will be impossible to stay in sync.

@amiri-ge - What we can do in the UAA to mitigate this behavior is to go with the original idea. Specify the "cloud" profile in such a way that the system never looks for the library unless that profile is active. It will take some effort to create a test case around this, but it is doable.

scottfrederick commented 9 years ago

Since this library modifies a perfectly correct Spring XML config it would be up to this library to ensure that it continues to function as the developer intended.

The library that modifies the Spring config is the Java buildpack auto-reconfiguration support, not Spring Cloud Connectors. Auto-reconfig happens to use Connectors, but Connectors doesn't do any reconfig on its own. Using Connectors directly in an app is one way of disabling the auto-reconfig behavior (but there are other ways).

If this requires the UAA, and other applications, to pull in the Spring Cloud library into its dependencies for the data source to be correctly configured

I don't think it is a matter of correctness, but of preferences (connection pooling impl, pool sizes, etc). If someone is pushing UAA to CF on their own, they need a way for that UAA to connect to its UAADB. Their options are:

Modify the Spring XML config for the database connection directly and disable Java buildpack auto-reconfig.
Rely on Java buildpack auto-reconfig to swap the Spring config out to use a bound database service instead, and accept the defaults provided by that connection.
Change the UAA code to pull the credentials for a bound database service and create a custom connection, using Spring Cloud Connectors or any other method.

Connectors just makes option 3 easier to do.

fhanik commented 9 years ago

@scottfrederick Thanks for the clarification. It makes sense now. Two different libraries, one pulling in the other. I do have some questions though

1) Modify the Spring XML config for the database connection directly and disable Java buildpack auto-reconfig.

How would they disable auto-reconfig here? When deploying the UAA as a Java app, you can send up a Yaml config file along with it or set environment properties. This does everything they need.

For option 3, isn't that what we already do, except auto reconfig garbles it, and thus, we have this issue :)

@amiri-ge looks like we have a few options

Figure out how to disable auto reconfig - and then the system should work out of the box OR
Figure out how to play nice with auto reconfig align with it

DarioAmiri commented 9 years ago

@fhanik auto-reconfiguration is a cool feature and we would like to use it, so long as we can control what DBCP we use.

However, it does seem odd to me that the buildpack includes auto-reconfig and the only way to disable it is to modify UAA. My gut tells me that at the very least there should be an environment variable for disabling auto-reconfig.

If the latter is not an option then the only way I see out of this is to add a specialized profile to UAA. Perhaps we can use the "cloud" profile or maybe even a new "cloud-foundry" profile. Either way, I feel like we've exhausted the usefulness of having this conversation over git. Is there a way we can setup a CC to discuss this and come to some kind of understanding?

scottfrederick commented 9 years ago

it does seem odd to me that the buildpack includes auto-reconfig and the only way to disable it is to modify UAA. My gut tells me that at the very least there should be an environment variable for disabling auto-reconfig.

With Java buildpack 3.0 and greater, you can disable auto-reconfig by setting an environment variable on the application:

cf set-env app-name JBP_CONFIG_SPRING_AUTO_RECONFIGURATION 'enabled: false'

see:

auto-reconfiguration is a cool feature and we would like to use it, so long as we can control what DBCP we use.

Auto-reconfiguration is all-or-nothing. Either you let auto-reconfig run and accept all of its defaults, or you create the connection yourself (i.e. using options 1 or 3 above).

DarioAmiri commented 9 years ago

@scottfrederick If that is the case, then as a user, I would expect an all-or-nothing feature to: 1) Use the best available DBCP not the worst one. 2) Set usable defaults for the connection pool. Perhaps based on some resource detection logic. I'm not sure who decided to set max-active as 4 by default. In what production environment would that make sense?

Not trying to beat up on anyone. Just stating the facts.

scottfrederick commented 9 years ago

If that is the case, then as a user, I would expect an all-or-nothing feature to: 1) Use the best available DBCP not the worst one.

Point taken. That list has grown slowly over time from just commons basic-dbcp and tomcat-dbcp to include the more high-perf options. We could just re-order those defaults, but we'd have to think about who might get surprised by a different default being chosen.

2) Set usable defaults for the connection pool. Perhaps based on some resource detection logic. I'm not sure who decided to set max-active as 4 by default. In what production environment would that make sense?

Auto-reconfig wasn't really intended to be used in production environments. It is a convenience to make the first-time "look, you don't even need to change your code" cf push experience as smooth as possible, and to retain compatibility with CF v1. It has constraints that limit its usefulness in many production scenarios.

The connection pool defaults were specifically chosen for compatibility with free tiers of database services on public platforms like Pivotal Web Services, IBM BlueMix, and Heroku. These free service tiers usually have very low connection limits, so defaulting to anything higher is likely to break that first-push auto-reconfig experience.

Not trying to beat up on anyone. Just stating the facts.

No worries, healthy debate is always welcomed.

fhanik commented 9 years ago

So to summarize

Auto reconfig is not meant to be used in production.
There is a way to disable it

Do we still have an issue at this point? Do we still need to add a cloud profile in the UAA?

DarioAmiri commented 9 years ago

@fhanik I can't be certain, but it does look like we may not need it. I'm going to chat with some folks on my end and experiment with the environment variable for disabling auto-reconfig. Hopefully, that's a solution that will work for us.

@scottfrederick Thank you for all the information you have provided. One small suggestion I would make, to whatever party is responsible, is to better document the availability of the "JBP_CONFIG_SPRING_AUTO_RECONFIGURATION" environment variable. Perhaps it was in the documentation already and I missed it, but in all the blogs we were told to read and all the nice people who tried to help us out, this is the first time I can recall someone mentioning such a simple solution to the problem. I did a google search on this environment variable and it only pulled one result.

Perhaps I'm assuming too much, but I would say that the env var should be the front and center way to disable auto-reconfig not the current ways that are suggested in the blogosphere.

scottfrederick commented 9 years ago

@amiri-ge I'll pass that feedback along to the Java buildpack and docs teams.

fhanik commented 9 years ago

@scottfrederick Thanks for all your help, and for clearing up the confusion and help us understand where the auto reconfig originates

spring-cloud / spring-cloud-connectors

No way to adjust priority of pooled data source creators #131