SNOW-173284: 'fetch_pandas_all' results in error 'pyarrow package is missing' #336

Closed: leifericf closed this issue 4 years ago.

leifericf commented 4 years ago

Python version: 3.7.6

Operating system and processor architecture: Darwin-19.4.0-x86_64-i386-64bit

Component versions in the environment:

To reproduce the error, call the fetch_pandas_all() function, like so:

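import snowflake.connector

def execute_query(query_string):
    cursor = snowflake.connector.connect(…).cursor()  # connection parameters elided
    cursor.execute(query_string)
    result = cursor.fetch_pandas_all()  # convert the result set to a pandas DataFrame
    return result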

query_string = 'select a.col1, a.col2 from my_database.my_schema.my_table as a limit 100;'

data = execute_query(query_string)

That will result in this pyarrow-related error:

Exception has occurred: ProgrammingError
255002: pyarrow package is missing. Install using pip if the platform is supported.
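The message says the connector could not import pyarrow in the running interpreter. A minimal sketch for checking which of the optional packages are importable before calling fetch_pandas_all() follows. It uses only the standard library; the helper name missing_modules is hypothetical and not part of the connector.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names from `names` that cannot be found in this interpreter.

    importlib.util.find_spec() locates a top-level module without importing it,
    so the check has no import side effects for top-level names.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # fetch_pandas_all() needs both pandas and pyarrow to be importable.
    missing = missing_modules(["pandas", "pyarrow"])
    if missing:
        print("Missing optional packages:", ", ".join(missing))
    else:
        print("pandas and pyarrow are both importable")
```

Running it in the same environment (same virtualenv or interpreter) as the failing script is what matters, since a package installed for a different Python will not be found.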

I would expect the snowflake-connector-python package to install its own dependencies as needed.

Note that using the fetchall() function works fine:

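def execute_query(query_string):
    cursor = get_connection().cursor()  # get_connection() is the reporter's own helper
    cursor.execute(query_string)
    result = cursor.fetchall()  # list of row tuples; no pandas or pyarrow involved
    return result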

The issue seems to be related to converting the SQL query result to a Pandas dataframe.
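Since fetchall() works without pyarrow, one stopgap is to build the DataFrame yourself from the rows plus the column names in cursor.description, which every DB-API 2.0 (PEP 249) cursor exposes. This is a sketch under that assumption: the helper names are hypothetical, sqlite3 stands in for a Snowflake connection so it runs anywhere, and unlike fetch_pandas_all() it does not go through Arrow, so it may be slower and use more memory on large results.

```python
import sqlite3
from typing import Any, List, Sequence, Tuple


def fetch_rows_and_columns(cursor: Any) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Read all remaining rows plus column names from a DB-API 2.0 cursor.

    Each entry of cursor.description is a 7-item sequence whose first item
    is the column name.
    """
    columns = [desc[0] for desc in cursor.description]
    rows = [tuple(row) for row in cursor.fetchall()]
    return columns, rows


def to_dataframe(columns: Sequence[str], rows: Sequence[Tuple[Any, ...]]):
    """Build a pandas DataFrame; pandas is imported only when this is called."""
    import pandas as pd

    return pd.DataFrame(list(rows), columns=list(columns))


def demo_with_sqlite() -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Run the helper against an in-memory SQLite table (stand-in data)."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE my_table (col1 INTEGER, col2 TEXT)")
        cur.executemany("INSERT INTO my_table VALUES (?, ?)", [(1, "a"), (2, "b")])
        cur.execute("SELECT col1, col2 FROM my_table ORDER BY col1")
        return fetch_rows_and_columns(cur)
    finally:
        conn.close()


if __name__ == "__main__":
    columns, rows = demo_with_sqlite()
    print(columns, rows)  # ['col1', 'col2'] [(1, 'a'), (2, 'b')]
```

With pandas installed, to_dataframe(*fetch_rows_and_columns(cursor)) gives a DataFrame from any DB-API cursor, including the Snowflake one in the snippet above.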

Detailed execution log for debugging:

2020-07-07 11:48:16,365 - MainThread - cursor() - DEBUG - cursor
2020-07-07 11:48:16,365 - MainThread - execute() - DEBUG - executing SQL/command
2020-07-07 11:48:16,365 - MainThread - execute() - DEBUG - binding: [select a.col1, a.col2 from my_database.my_schema.my_table as t limit 1...] with input=[None], processed=[{}]
2020-07-07 11:48:16,365 - MainThread - execute() - INFO - query: [select a.col1, a.col2 from my_database.my_schema.my_table as t limit 1...]
2020-07-07 11:48:16,365 - MainThread - _next_sequence_counter() - DEBUG - sequence counter: 1
2020-07-07 11:48:16,366 - MainThread - _execute_helper() - DEBUG - running query [select a.col1, a.col2 from my_database.my_schema.my_table as t limit 1...]
2020-07-07 11:48:16,378 - MainThread - _execute_helper() - DEBUG - is_file_transfer: False
2020-07-07 11:48:16,379 - MainThread - cmd_query() - DEBUG - _cmd_query
2020-07-07 11:48:16,379 - MainThread - cmd_query() - DEBUG - sql=[select a.col1, a.col2 from my_database.my_schema.my_table as t limit 1...], sequence_id=[1], is_file_transfer=[None]
2020-07-07 11:48:16,379 - MainThread - _use_requests_session() - DEBUG - Active requests sessions: 1, idle: 0
2020-07-07 11:48:16,379 - MainThread - _request_exec_wrapper() - DEBUG - remaining request timeout: None, retry cnt: 1
2020-07-07 11:48:16,379 - MainThread - _request_exec() - DEBUG - socket timeout: 60
2020-07-07 11:48:16,531 - MainThread - _request_exec() - DEBUG - SUCCESS
2020-07-07 11:48:16,531 - MainThread - _use_requests_session() - DEBUG - Active requests sessions: 0, idle: 1
2020-07-07 11:48:16,531 - MainThread - _post_request() - DEBUG - ret[code] = None, after post request
2020-07-07 11:48:16,532 - MainThread - execute() - DEBUG - sfqid: my_sfqid
2020-07-07 11:48:16,532 - MainThread - execute() - INFO - query execution done
2020-07-07 11:48:16,532 - MainThread - execute() - DEBUG - SUCCESS
2020-07-07 11:48:16,532 - MainThread - execute() - DEBUG - PUT OR GET: None
2020-07-07 11:48:16,532 - MainThread - _init_result_and_meta() - DEBUG - Query result format: arrow
2020-07-07 11:48:16,533 - MainThread - _init_result_and_meta() - DEBUG - Batches read: 1
sfc-gh-mkeller commented 4 years ago

Hi @IRLeif, the connector knows which dependencies to install if you tell it that you will need pandas (and pyarrow). We have an optional dependency group called pandas. Install the connector like this: pip install snowflake-connector-python[pandas]. Documentation is here:
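In Python packaging terms an optional dependency group is an "extra": the package metadata lists its requirements with an `extra == "..."` marker, and pip installs those only when the extra is named in brackets. The sketch below groups an installed distribution's requirements by extra so you can see what something like [pandas] would pull in. It assumes Python 3.8+ for importlib.metadata (on 3.7, as in this report, the importlib_metadata backport offers equivalent functions); the helper names are hypothetical, and the requirement strings in the demo are made-up examples, not the connector's actual pins.

```python
import re
from importlib import metadata
from typing import Dict, List

# Matches the extra name in a PEP 508 marker such as: extra == "pandas"
_EXTRA_MARKER = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def group_requirements_by_extra(requires: List[str]) -> Dict[str, List[str]]:
    """Group requirement strings by extra; "" holds those tied to no extra."""
    groups: Dict[str, List[str]] = {}
    for requirement in requires:
        spec, _, marker = requirement.partition(";")
        match = _EXTRA_MARKER.search(marker)
        extra = match.group(1) if match else ""
        groups.setdefault(extra, []).append(spec.strip())
    return groups


def installed_requirements_by_extra(dist_name: str) -> Dict[str, List[str]]:
    """Inspect an installed distribution; returns {} if it is not installed."""
    try:
        requires = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return {}
    return group_requirements_by_extra(requires)


if __name__ == "__main__":
    # Made-up requirement strings, shaped like the entries pip reads from metadata.
    example = [
        "requests>=2.0",
        'pandas>=1.0; extra == "pandas"',
        'pyarrow; extra == "pandas"',
    ]
    print(group_requirements_by_extra(example))
    print(installed_requirements_by_extra("snowflake-connector-python"))
```

The regex only pulls out the extra name; other marker conditions (for example a python_version check) are ignored here, so this is a reading aid rather than a full marker evaluator.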

leifericf commented 4 years ago


Aha! It was my mistake. I was unfamiliar with the concept of optional dependency groups in general, and I had missed that part of the Snowflake documentation in particular. After adding [pandas] to my pip install command, everything is now working smoothly. Thank you for taking the time to comment.