heavyai / pymapd

Python client for OmniSci GPU-accelerated SQL engine and analytics platform
https://pymapd.readthedocs.io/en/latest/
Apache License 2.0

Errors with int8 #169

Closed sglyon closed 5 years ago

sglyon commented 5 years ago

When calling load_table with a DataFrame that contains int8 columns, two things go wrong:

  1. The routine creates a table with the correct column types (int8), but the upload fails (see below)
  2. The int8 column is converted to int16 after the CREATE TABLE statement is issued, but before the data is uploaded

Note that this also mutates the DataFrame itself in place...

In [1]: import pymapd

In [2]: oconn = pymapd.connect("XXXX")

In [3]: import pandas as pd

In [4]: df = pd.DataFrame(dict(x=[1, 2, 3], y=[4, 5, 6]))

In [5]: df["x"] = df["x"].astype("int8")

In [6]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
x    3 non-null int8
y    3 non-null int64
dtypes: int64(1), int8(1)
memory usage: 107.0 bytes

In [7]: oconn.load_table("temp_table", df)
---------------------------------------------------------------------------
TMapDException                            Traceback (most recent call last)
<ipython-input-7-4968649d5580> in <module>
----> 1 oconn.load_table("temp_table", df)

~/anaconda3/lib/python3.6/site-packages/pymapd/connection.py in load_table(self, table_name, data, method, preserve_index, create)
    476         if method == 'infer':
    477             if (isinstance(data, pd.DataFrame) or _is_arrow(data)):
--> 478                 return self.load_table_arrow(table_name, data)
    479
    480             elif (isinstance(data, pd.DataFrame)):

~/anaconda3/lib/python3.6/site-packages/pymapd/connection.py in load_table_arrow(self, table_name, data, preserve_index)
    621                                            preserve_index=preserve_index)
    622         self._client.load_table_binary_arrow(self._session, table_name,
--> 623                                              payload.to_pybytes())
    624
    625     def render_vega(self, vega, compression_level=1):

~/anaconda3/lib/python3.6/site-packages/mapd/MapD.py in load_table_binary_arrow(self, session, table_name, arrow_stream)
   2438         """
   2439         self.send_load_table_binary_arrow(session, table_name, arrow_stream)
-> 2440         self.recv_load_table_binary_arrow()
   2441
   2442     def send_load_table_binary_arrow(self, session, table_name, arrow_stream):

~/anaconda3/lib/python3.6/site-packages/mapd/MapD.py in recv_load_table_binary_arrow(self)
   2462         iprot.readMessageEnd()
   2463         if result.e is not None:
-> 2464             raise result.e
   2465         return
   2466

TMapDException: TMapDException(error_msg='Exception: Expected int8 type')

In [8]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
x    3 non-null int16
y    3 non-null int64
dtypes: int16(1), int64(1)
memory usage: 110.0 bytes
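
The in-place mutation shown in the two df.info() outputs can be caught with a small check that snapshots the dtypes before a call and compares them afterwards. This is only a minimal sketch: assert_dtypes_unchanged and fake_loader are hypothetical names, and fake_loader merely imitates the reported int8-to-int16 cast; it is not pymapd code.

```python
import pandas as pd


def assert_dtypes_unchanged(df, func, *args, **kwargs):
    """Call func(df, ...) and raise if it changed any column dtype of df."""
    before = df.dtypes.copy()
    result = func(df, *args, **kwargs)
    changed = {
        col: (str(before[col]), str(df[col].dtype))
        for col in df.columns
        if before[col] != df[col].dtype
    }
    if changed:
        raise AssertionError(f"dtypes mutated in place: {changed}")
    return result


def fake_loader(df):
    # Hypothetical stand-in that imitates the reported behaviour:
    # int8 columns are widened to int16 on the caller's frame.
    for col in df.select_dtypes(include=["int8"]).columns:
        df[col] = df[col].astype("int16")


df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
df["x"] = df["x"].astype("int8")

try:
    assert_dtypes_unchanged(df, fake_loader)
    mutated = False
except AssertionError as exc:
    mutated = True
    print(exc)
```

Against a real connection, the same helper could presumably wrap a call such as lambda d: oconn.load_table("temp_table", d), though that has not been tried here.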

sglyon commented 5 years ago

I did a quick scan through the codebase and I believe the offending line is here:

https://github.com/omnisci/pymapd/blob/eb262b39a9c2057beef181f32ebeee2358359624/pymapd/_pandas_loaders.py#L161

randyzwitch commented 5 years ago

Thanks for reporting @sglyon. I'll take a look at the git blame for this and see why it might have been added. Mutating the data frame is also pretty undesirable.

randyzwitch commented 5 years ago

Turns out, commenting out those lines makes it work, so I'll clean this up into a PR and the fix will be in our next version (pymapd 0.10).

sglyon commented 5 years ago

Great, thanks.

I also deleted the lines in my installs after posting the issue and haven't seen any problems.
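
For anyone pinned to a release without the fix, one possible workaround is to hand load_table a widened copy, so the caller's frame stays untouched and the column types seen at table creation match the uploaded data. This is an untested sketch: widen_int8 is a hypothetical helper, and it assumes the cast inside pymapd has nothing left to do once no int8 columns remain.

```python
import pandas as pd


def widen_int8(df):
    """Return a copy of df with int8 columns cast to int16; df is not modified."""
    out = df.copy()
    for col in out.select_dtypes(include=["int8"]).columns:
        out[col] = out[col].astype("int16")
    return out


df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
df["x"] = df["x"].astype("int8")

safe = widen_int8(df)
# Assumed usage, not run here; oconn is the connection from the original report:
# oconn.load_table("temp_table", safe)
```

The trade-off is that the column is stored wider than int8, so deleting the offending lines or upgrading remains the cleaner route.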