r-dbi / odbc

Connect to ODBC databases (using the DBI interface)
https://odbc.r-dbi.org/
Other
392 stars 107 forks source link

snowflake: runtime driver config checks #857

Open detule opened 3 weeks ago

detule commented 3 weeks ago

Hi @simonpcouch:

Following up on a conversation we had a few weeks ago where I noticed that the simba snowflake drivers on macOS suffer from similar issues as the ones you ran into with databricks. In particular, and this is one that bit me as well, the simba configuration file (.ini) needs to specify DriverManagerEncoding=UTF-16. The OEM SNOWFLAKE driver, for example, looks to be configured for iODBC, and has a different value that doesn't work for us.

In this PR I expanded some of the checks you had written for databricks to snowflake as well, with the following changes:

The last bullet in particular, I think is why the diff ended up being (much) larger than I originally would have anticipated.

Rather than duplicating methods, tried to make some of the methods "simba" generic ( tried to keep those in utils.R ), while others that were specific one of the two back ends I moved to driver-*.R. We are close to perhaps needing to make some of those specific methods S3 or S4, but at this point IMO, that seems like a bit of an overkill.

Cheers.

TODO:

hadley commented 3 weeks ago

@atheriel can you take a look at this too please?

detule commented 1 week ago

A first pass through! I'm really happy we're aiming to resolve this issue for Snowflake users, too.

I said this in PM as well but wanted to reiterate once more here in case other folks support (feel free to ignore and I won't bug you any longer😜): I think the same issue should lead to the same outcome across drivers, and I'd argue that we ought to just edit the file as we do with Databricks. Especially given that this PR introduces odbc.no_config_override (on board!), I'm fine with the more invasive approach given that it allows for this connection to "just work" for users. No longer needing to manage the different behavior for difference DBMS', I think, would greatly simplify this PR as well.

Thanks Simon - appreciate you taking a look. Apologies for misinterpreting your earlier note on whether to modify in place or just warn. I changed snowflake so that the behavior is the same as databricks ( modify ).

I did leave the warn code in the method, in case we want to pursue that option in the future. However, happy to remove if you have concerns.

detule commented 1 week ago

I think efforts to make drivers "just work" are worthwhile for sure, so in that sense I'm supportive. However, it's a little unclear to me from the code whether this only applies to macOS users or if it's designed to help Linux users, too.

If the former, I think the code should be more explicit about that. If the latter, I think the Linux paths are missing from spark_simba_config() and snowflake_simba_config().

Hey @atheriel - appreciate the feedback.

It's the former - at least for now. Most of these issues are coming from the fact that vendors seem to be distributing MacOS drivers that, out-of-the-box, are configured to work when paired with iODBC, rather than unixODBC. Our package, can't switch between unixODBC and iODBC at run-time; rather, that's determined at build-time ( and is configured for unixODBC). I haven't seen these issues with the OEM drivers for Linux.

At some point, wouldn't mind doing some investigative work to determine what it would take to be able to adjust our posture relative to the driver managers more dynamically.

if (!is_macos()) {
  return(invisible())
}

Let me know if you had something else in mind. Thanks again