markhoerth / dremio_client

Apache License 2.0
31 stars 25 forks source link

ODBC Issues, Downgrading from Flight --> ODBC --> REST #66

Open be-rock opened 4 years ago

be-rock commented 4 years ago

Description

Hi. I'm trying to connect over ODBC with dremio_client. Always see the following when trying to execute a query.:

WARNING:root:Unable to run query as flight, downgrading to odbc
WARNING:root:Unable to run query as odbc, downgrading to rest

The flight downgrade is not surprising and acceptable for now but the ODBC downgrade is a bit surprising. I think that is tied to this: https://github.com/rymurr/dremio_client/blob/f892563f64ed07c00b4a1314172d3ea9d324d41c/dremio_client/odbc.py#L31

What I Did

When trying to connect over ODBC via pyodbc driver name: Dremio ODBC Driver i see the following. Note, this is external to dremio_client for the sake of this test...

pyodbc.Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'Dremio ODBC Driver' : file not found (0) (SQLDriverConnect)")

However, changing the driver name to the absolute path of the dylib resolves the issue such as: /Library/Dremio/ODBC/lib/libdrillodbc_sbu.dylib. I believe this is why dremio_client is downgrading from odbc --> rest. Is there a way to override the default driver constant defined as _OSX_DRIVER here: https://github.com/rymurr/dremio_client/blob/f892563f64ed07c00b4a1314172d3ea9d324d41c/dremio_client/odbc.py#L31

rymurr commented 4 years ago

Hey @be-rock, the --version option will be added in the next release. Thanks for the suggestion.

regarding the odbc preoblem I haven't tested on OSX before (obviously). But it looks to me like pyodbc isn't recognising your odbcinst.ini. Have you installed pyodbc and the odbc driver in the same environment? I have seen this before on linux if the odbc driver is installed via the rpm installer and pyodbc is in a conda environment.

So this indicates to me that the likely issue is that pyodbc isn't configured as dremio_client would expect. We can try a few things: a) debug the pyodbc problem. This might help: https://github.com/mkleehammer/pyodbc/wiki/Connecting-to-SQL-Server-from-Mac-OSX b) monkey patch _DRIVER for your local environment, are you using the command line client or direct from python.

be-rock commented 4 years ago

Hi @rymurr thanks for the feedback. I downloaded the driver (dmg) directly from https://www.dremio.com/drivers/ and then did indeed install pyodbc in a conda environment.

I am using directly from python.

Would it be easier to allow for override of Driver via ~/.config/dremio_client/config.yaml?

be-rock commented 4 years ago

I did try to monkey patch odbc._DRIVER but still saw the warning message that it was downgrading from flight -> odbc -> rest

[ins] In [1]: from dremio_client import init, odbc
         ...: odbc._DRIVER = "/Library/Dremio/ODBC/lib/libdrillodbc_sbu.dylib"
         ...: client = init()
         ...: results = client.query("SELECT * FROM sys.reflections")
         ...:
WARNING:root:Unable to run query as flight, downgrading to odbc
WARNING:root:Unable to run query as odbc, downgrading to rest
rymurr commented 4 years ago

Hey @be-rock

I will try to find some time to add the driver as a config item. Will not be soon as I am rather busy atm.

In the meantime: 1) ensure you have pandas installed 2) what is the output of python -m "import pyodbc;print(pyodbc.drivers())" 3) if its empty can you run the following:

odbcinst -i -d -n "Dremio ODBC Driver" -f <dremio_odbc_install_dir>/odbcinst.ini
odbcinst -i -l -s -n "Dremio ODBC Driver" -f <dremio_odbc_install_dir>/odbc.ini

4) try 2) again

odbcinst should have been installed when you installed pyodbc and its how conda informs itself of available drivers.

be-rock commented 4 years ago

Thanks @rymurr

I think you mean python -c. If so, this is the result after executing the above odbcinst commands. Note that Dremio ODBC Driver was not in the list prior to the odbcinst.

$ python -c "import pyodbc;print(pyodbc.drivers())"
['ODBC Driver 17 for SQL Server', 'Dremio ODBC Driver', 'ODBC Drivers', 'Cloudera ODBC Driver for Apache Hive', 'Cloudera ODBC Driver for Impala']
$ python                                                                                                                                                                                                                         (py38)
Python 3.8.2 | packaged by conda-forge | (default, Mar  5 2020, 16:54:44)
[Clang 9.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from dremio_client import init, odbc
>>> client = init()
>>> results = client.query("SELECT * FROM sys.reflections")
WARNING:root:Unable to run query as flight, downgrading to odbc
WARNING:root:Unable to run query as odbc, downgrading to rest

However I am still seeing the flight -> odbc -> rest downgrade.

rymurr commented 4 years ago

Thanks for that @be-rock , and you are right -m was the right flag. Apologies!

Do you have pandas installed? If the driver is available the only other place odbc could fail like that is if it doesn't have pandas available to create a DataFrame.

be-rock commented 4 years ago

Yes, pandas is in that environment.

$ python -c "import pyodbc, pandas;print(pyodbc.drivers(), pandas.__version__)"
['ODBC Driver 17 for SQL Server', 'Dremio ODBC Driver', 'ODBC Drivers', 'Cloudera ODBC Driver for Apache Hive', 'Cloudera ODBC Driver for Impala'] 1.0.3
rymurr commented 4 years ago

Hey @be-rock very strange...

can you try the following in python:

import pyodbc
import pandas
client = pyodbc.connect(
            "Driver={};ConnectionType=Direct;HOST={};PORT={};AuthenticationType=Plain;UID={};PWD={}".format(
                "Dremio ODBC Driver", hostname, port, username, password
            ),
            autocommit=True,
        )
pandas.read_sql(sql, client)

This is the code that dremio_client would execute and if you have all the dependencies and pyodbc can see your driver than everything should work fine. So hopefully this will show us where the issue lies. I will also try and find an osx machine to reproduce on. It is possible the answer here may help too: https://stackoverflow.com/questions/56552377/pyodbc-works-in-non-conda-python-but-fails-to-load-drivers-in-conda it seems like this is a somewhat common problem for odbc/conda/osx

be-rock commented 4 years ago

Thank Ryan.

Actually, I spotted a config issue in my ~/.config/dremio_client/config.yaml file that had an incorrect port (i changed the port to 31010). So i can connect fine as follows:

         ...: import pyodbc
         ...: import pandas
         ...: client = pyodbc.connect(
         ...:             "Driver={};ConnectionType=Direct;HOST={};PORT={};AuthenticationType=Plain;UID={};PWD={}".format(
         ...:                 "Dremio ODBC Driver", hostname, port, username, password
         ...:             ),
         ...:             autocommit=True,
         ...:         )
         ...: pandas.read_sql("select count(*) from sys.reflections", client)
         ...:
         ...:
Out[5]:
   EXPR$0
0       5

But now seeing an issue when:

from dremio_client import init
client = init()
ProtocolError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
<ipython-input-7-d373cb84b4d5> in <module>
      1 from dremio_client import init
----> 2 client = init()

~/miniconda3/envs/py38/lib/python3.8/site-packages/dremio_client/__init__.py in init(config_dir, simple_client, config_dict)
     65         config_dict = dict()
     66     config = get_config(config_dir, args=config_dict)
---> 67     return _connect(config, simple_client)
     68 
     69 

~/miniconda3/envs/py38/lib/python3.8/site-packages/dremio_client/__init__.py in _connect(config, simple)
     71     if simple:
     72         return SimpleClient(config)
---> 73     return DremioClient(config)
     74 
     75 

~/miniconda3/envs/py38/lib/python3.8/site-packages/dremio_client/dremio_client.py in __init__(self, config)
     68         self._username = config["auth"]["username"].get()
     69         self._password = config["auth"]["password"].get()
---> 70         self._token = auth(self._base_url, config)
     71         self._ssl_verify = config["verify"].get(bool)
     72         self._catalog = catalog(self._token, self._base_url, self.query, self._ssl_verify)

~/miniconda3/envs/py38/lib/python3.8/site-packages/dremio_client/auth/__init__.py in auth(base_url, config_dict)
     42     except KeyError:
     43         if auth_type == "basic":
---> 44             token = config_auth(base_url, config_dict)
     45             _write_token(token, config_dict)
     46             return token

~/miniconda3/envs/py38/lib/python3.8/site-packages/dremio_client/auth/config.py in login(base_url, config_dict)
     50     except NotFoundError:
     51         verify = 10
---> 52     return _login(base_url, username, password, timeout, verify)

~/miniconda3/envs/py38/lib/python3.8/site-packages/dremio_client/auth/basic.py in login(base_url, username, password, timeout, verify)
     38     url = base_url + "/apiv2/login"
     39 
---> 40     r = requests.post(url, json={"userName": username, "password": password}, timeout=timeout, verify=verify)
     41     r.raise_for_status()
     42     return r.json()["token"]

~/miniconda3/envs/py38/lib/python3.8/site-packages/requests/api.py in post(url, data, json, **kwargs)
    114     """
    115 
--> 116     return request('post', url, data=data, json=json, **kwargs)
    117 
    118 
rymurr commented 4 years ago

Hey @be-rock

That is a new one! Some searching online seems to suggest that is an OSX specific error. Can you ensure you have the correct libraries on your conda env: conda install pyopenssl. There are some hints online that this is an ssl related problem. Do you have ssl enabled?

Also glad you got odbc working!

be-rock commented 4 years ago

Hi Ryan,

ssl is not currently enabled.

$ grep ssl ~/.config/dremio_client/config.yaml
ssl: false

I do have pyopenssl installed but not using it currently as noted above.

I'll try to dig into this a bit more on my side.

Thanks again.

be-rock commented 4 years ago

Hi @rymurr

I dug a little further on this but just cannot seem to successfully connect to odbc via dremio_client. ODBC connectivity through to Dremio directly through pyodbc continues to work fine outside of the client so i'm led to believe its not an ODBC setup issue.

this is the current yaml config

auth:
    type: basic #  currently only basic is supported
    username: myusername
    password: mypswd
    timeout: 10
hostname: myhostname
port: 7183
ssl: false
rymurr commented 4 years ago

Hey @be-rock apologies for the delay. I will try and set up an OSX VM to test it out. I am out of ideas.

3runkenzie commented 4 years ago

@rymurr @be-rock I debugged the code and it looks like with the current way the dremio-client is set up (when trying to use ODBC method [port: 31010]), it's trying to execute basic.login() with http://**HOST**:31010/apiv2/login and that is why we are receiving that exception. Obviously with ODBC we don't need/want to make a REST POST /apiv2/login call. I can probably make a PR to fix this.

rymurr commented 4 years ago

@3runkenzie nice one! Somehow I totally missed that. A PR would be most welcome!

be-rock commented 4 years ago

96