markhoerth / dremio_client

Apache License 2.0
31 stars 26 forks source link

REST API to Dataframe #190

Closed dremio-brock closed 3 years ago

dremio-brock commented 3 years ago

Description

When using the REST API, the dataframe returned is a list of json objects instead of the expected dataframe.

What I Did

from dremio_client import init
import pandas as pd

query = 'select * from information_schema."tables"'
client = init(simple_client=True)
results = client.query(query, asynchronous=False)
df = pd.DataFrame(results)

Results

 rowCount  ...                                               rows
0       636  ...  [{'TABLE_CATALOG': 'DREMIO', 'TABLE_SCHEMA': '...
1       636  ...  [{'TABLE_CATALOG': 'DREMIO', 'TABLE_SCHEMA': '...
2       636  ...  [{'TABLE_CATALOG': 'DREMIO', 'TABLE_SCHEMA': '...
3       636  ...  [{'TABLE_CATALOG': 'DREMIO', 'TABLE_SCHEMA': '...
4       636  ...  [{'TABLE_CATALOG': 'DREMIO', 'TABLE_SCHEMA': '...
5       636  ...  [{'TABLE_CATALOG': 'DREMIO', 'TABLE_SCHEMA': '...
6       636  ...  [{'TABLE_CATALOG': 'DREMIO', 'TABLE_SCHEMA': '...

Fix

Adding the following code would convert the results from the REST API to a proper dataframe.

new_df = pd.concat([pd.DataFrame(pd.json_normalize(x)) for x in df['rows']], ignore_index=True)

Proper results

TABLE_CATALOG  ... TABLE_TYPE
0          DREMIO  ...      TABLE
1          DREMIO  ...      TABLE
2          DREMIO  ...      TABLE
3          DREMIO  ...      TABLE
4          DREMIO  ...       VIEW
..            ...  ...        ...
631        DREMIO  ...      TABLE
632        DREMIO  ...      TABLE
633        DREMIO  ...      TABLE
634        DREMIO  ...      TABLE
635        DREMIO  ...      TABLE
rymurr commented 3 years ago

in query.py one could likely do pd.DataFrame(pd.json_normalize(x) for x in results). I haven't had time to check it yet though so will leave this open till I get around to fixing it.