heavyai / pymapd

Python client for OmniSci GPU-accelerated SQL engine and analytics platform
https://pymapd.readthedocs.io/en/latest/
Apache License 2.0

pymapd example doesn't work under pip or conda-forge #79

Closed randyzwitch closed 6 years ago

randyzwitch commented 6 years ago

While trying to run the example here, a community user gets an error after installing via pip:

In [1]: import pandas as pd  
   ...: import sys  
   ...: from pymapd import connect
   ...: 
   ...: con = connect(user="mapd", password="HyperInteractive", host="localhost", dbname="mapd") 
   ...: 
   ...: 

In [2]: df = con.select_ipc("""select CAST(nppes_provider_zip5 as INT) as zipcode,
   ...: sum(total_claim_count) as total_claims,
   ...: sum(opioid_claim_count) as opioid_claims from cms_prescriber 
   ...: group by 1 order by opioid_claims desc limit 100""")
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-b23320627e1e> in <module>()
      2 sum(total_claim_count) as total_claims,
      3 sum(opioid_claim_count) as opioid_claims from cms_prescriber
----> 4 group by 1 order by opioid_claims desc limit 100""")

~/miniconda3/lib/python3.6/site-packages/pymapd/connection.py in select_ipc(self, operation, parameters, first_n)
    296             raise ImportError("pandas is required for `select_ipc`")
    297 
--> 298         from .shm import load_buffer
    299 
    300         if parameters is not None:

ModuleNotFoundError: No module named 'pymapd.shm'

Unfortunately, installing pymapd via conda-forge gives a different error (using a separate conda environment for each install method):

(condainstall) mapdadmin@MapDCE:~$ ipython
Python 3.6.5 | packaged by conda-forge | (default, Apr  6 2018, 13:39:56) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.3.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas as pd  
   ...: import sys  
   ...: from pymapd import connect
   ...: 
   ...: con = connect(user="mapd", password="HyperInteractive", host="localhost", dbname="mapd")  
   ...: 
   ...: 

In [2]: import pandas as pd  
   ...: import sys  
   ...: from pymapd import connect
   ...: 
   ...: con = connect(user="mapd", password="HyperInteractive", host="localhost", dbname="mapd")  
   ...: 
   ...: 

In [3]: prescriber_df = pd.read_csv("data/PartD_Prescriber_PUF_NPI_15.txt", sep='\t', low_memory=False)  
   ...: 

In [4]: str_cols = prescriber_df.columns[prescriber_df.dtypes==object]  
   ...: prescriber_df[str_cols] = prescriber_df[str_cols].fillna('NA')  
   ...: prescriber_df.fillna(0,inplace=True)
   ...: 
   ...: 

In [5]: con.execute('drop table if exists cms_prescriber')  
   ...: con.create_table("cms_prescriber",prescriber_df, preserve_index=False)  
   ...: %time con.load_table("cms_prescriber", prescriber_df, preserve_index=False)
   ...: 
   ...: 
---------------------------------------------------------------------------
TMapDException                            Traceback (most recent call last)
<timed eval> in <module>()

~/miniconda3/envs/condainstall/lib/python3.6/site-packages/pymapd/connection.py in load_table(self, table_name, data, method, preserve_index, create)
    418         if method == 'infer':
    419             if (_is_pandas(data) or _is_arrow(data)) and _HAS_ARROW:
--> 420                 return self.load_table_arrow(table_name, data)
    421 
    422             elif _is_pandas(data):

~/miniconda3/envs/condainstall/lib/python3.6/site-packages/pymapd/connection.py in load_table_arrow(self, table_name, data, preserve_index)
    520                                            preserve_index=preserve_index)
    521         self._client.load_table_binary_arrow(self._session, table_name,
--> 522                                              payload.to_pybytes())
    523 
    524 

~/miniconda3/envs/condainstall/lib/python3.6/site-packages/mapd/MapD.py in load_table_binary_arrow(self, session, table_name, arrow_stream)
   1614         """
   1615         self.send_load_table_binary_arrow(session, table_name, arrow_stream)
-> 1616         self.recv_load_table_binary_arrow()
   1617 
   1618     def send_load_table_binary_arrow(self, session, table_name, arrow_stream):

~/miniconda3/envs/condainstall/lib/python3.6/site-packages/mapd/MapD.py in recv_load_table_binary_arrow(self)
   1638         iprot.readMessageEnd()
   1639         if result.e is not None:
-> 1640             raise result.e
   1641         return
   1642 

TMapDException: TMapDException(error_msg='Expected a single Arrow record batch. Import aborted')
randyzwitch commented 6 years ago

https://community.mapd.com/t/https-www-mapd-com-blog-mapd-pandas-arrow/1198

wamsiv commented 6 years ago

For 1): if you are installing via pip, you need pyximport, which is part of Cython, to load the .shm files. It comes by default in conda builds. Then import it alongside pymapd: import pyximport; pyximport.install()
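A minimal sketch of that suggestion, assuming Cython is installed in the same environment as pymapd. The helper name enable_pyximport is hypothetical; pyximport.install() is Cython's own import hook. The helper returns False instead of raising when pyximport cannot be imported, so it can run before pymapd is imported.

```python
def enable_pyximport():
    """Install Cython's pyximport hook if Cython is available.

    Hypothetical helper: returns True when the hook was installed and
    False when pyximport (shipped inside Cython) cannot be imported.
    """
    try:
        import pyximport  # part of Cython, not a separate PyPI package
    except ImportError:
        return False
    pyximport.install()  # later imports may compile .pyx sources on the fly
    return True


if enable_pyximport():
    print("pyximport hook installed; import pymapd after this point")
else:
    print("pyximport not found; install Cython with: pip install cython")
```

Whether this is enough to make pymapd.shm load depends on how pymapd was installed; the hook only helps with Cython sources, not with a compiled extension that failed to link.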

For 2): I need to dig into the issue. For now, you can use the row-wise loader: con.load_table("cms_prescriber", prescriber_df.itertuples(index=False))
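The row-wise workaround can be wrapped so the default (Arrow) path is tried first and the row-wise path is used only if it fails. This is a sketch: load_with_row_fallback is a hypothetical helper, and catching Exception broadly is an assumption made because the concrete error class (TMapDException in the traceback above) lives in the Thrift-generated mapd module rather than in pymapd itself.

```python
def load_with_row_fallback(con, table_name, df):
    """Try the default load, then fall back to row-wise loading.

    Hypothetical helper. `con` is expected to be a pymapd Connection and
    `df` a pandas DataFrame. Returns the name of the path that succeeded.
    """
    try:
        # With pyarrow installed, load_table's method='infer' picks Arrow.
        con.load_table(table_name, df, preserve_index=False)
        return "arrow"
    except Exception as exc:  # assumption: broad catch, see lead-in
        print(f"Arrow load failed ({exc}); retrying row by row")
    # itertuples(index=False) yields one plain tuple per row, without the index.
    con.load_table(table_name, df.itertuples(index=False))
    return "rows"
```

Usage would be load_with_row_fallback(con, "cms_prescriber", prescriber_df). Expect the row-wise path to be noticeably slower than Arrow on a file the size of the CMS prescriber data.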

randyzwitch commented 6 years ago

@wamsiv I can't install pyximport:

(comm_20180502) mapdadmin@MapDCE:~$ pip install pyximport pymapd
Collecting pyximport
  Could not find a version that satisfies the requirement pyximport (from versions: )
No matching distribution found for pyximport
wamsiv commented 6 years ago

pyximport is part of the Cython library. Install Cython with: pip install cython

randyzwitch commented 6 years ago

It is already installed:

(comm_20180502) mapdadmin@MapDCE:~$ pip install cython
Requirement already satisfied: cython in ./miniconda3/lib/python3.6/site-packages (0.28.2)
TomAugspurger commented 6 years ago

Do you have a log of the pip install, and do you know which platform this is on?

ModuleNotFoundError: No module named 'pymapd.shm'

This suggests that the package wasn't installed properly. PyPI only has the source distribution (no wheels), so using any of the IPC functionality requires compilers.

Also, pip install pymapd doesn't install every optional dependency. For IPC, you'll need pip install pymapd[arrow].
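One way to check both points from inside the affected environment is to ask whether pyarrow and the compiled pymapd.shm extension can be located at all. A sketch using the standard library's importlib.util.find_spec; the helper name missing_modules is hypothetical. For a dotted name, find_spec imports the parent package (here pymapd) first, so a parent that fails to import is reported as missing too. Being found does not guarantee the extension will load, for example if a shared library it links against is absent.

```python
import importlib.util


def missing_modules(names=("pyarrow", "pymapd.shm")):
    """Return the subset of `names` that cannot be located.

    Hypothetical diagnostic helper. For dotted names, find_spec imports
    the parent package first; an ImportError there counts as missing.
    """
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:  # includes ModuleNotFoundError for the parent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_modules()
    if gaps:
        print("Not found (needed for IPC / Arrow support):", ", ".join(gaps))
    else:
        print("pyarrow and pymapd.shm were both found")
```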

TomAugspurger commented 6 years ago

For the conda environment, what version of arrow do you have? It seems like we test against 0.7.1, but the conda-forge recipe requires >= 0.5.0: https://github.com/conda-forge/pymapd-feedstock/blob/master/recipe/meta.yaml#L30
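A small check along these lines compares the installed pyarrow with the 0.7.1 release mentioned above. This is a sketch: the helper names are hypothetical, the parser keeps only the leading digits of each dotted part (so 1.3.0.post becomes (1, 3, 0)), and importlib.metadata needs Python 3.8 or newer; on the Python 3.6 environments in this thread, pyarrow.__version__ is an equivalent source of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version

TESTED_PYARROW = "0.7.1"  # release pymapd is tested against, per this thread


def parse_version(text):
    """Turn '0.9.0' into (0, 9, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def pyarrow_matches_tested(tested=TESTED_PYARROW):
    """Return (installed_version_or_None, matches_tested_release)."""
    try:
        installed = version("pyarrow")
    except PackageNotFoundError:
        return None, False
    return installed, parse_version(installed) == parse_version(tested)


installed, ok = pyarrow_matches_tested()
if installed is None:
    print("pyarrow is not installed")
elif not ok:
    print(f"pyarrow {installed} differs from tested {TESTED_PYARROW}")
```

An exact match mirrors pinning to 0.7.1; a ">= 0.5.0" style check, as in the recipe, would accept 0.9.0, which is the combination that fails above.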

randyzwitch commented 6 years ago

Here's my pip list; this is on Ubuntu 16.04 LTS Server:

(comm_20180502) mapdadmin@MapDCE:~$ pip list
Package          Version  
---------------- ---------
asn1crypto       0.24.0   
backcall         0.1.0    
certifi          2018.4.16
cffi             1.11.5   
chardet          3.0.4    
conda            4.5.1    
cryptography     2.2.2    
Cython           0.28.2   
decorator        4.3.0    
idna             2.6      
ipython          6.3.1    
ipython-genutils 0.2.0    
jedi             0.12.0   
numpy            1.14.3   
pandas           0.22.0   
parso            0.2.0    
pexpect          4.5.0    
pickleshare      0.7.4    
pip              10.0.1   
prompt-toolkit   1.0.15   
ptyprocess       0.5.2    
pyarrow          0.9.0    
pycosat          0.6.3    
pycparser        2.18     
Pygments         2.2.0    
pyOpenSSL        17.5.0   
PySocks          1.6.8    
python-dateutil  2.7.2    
pytz             2018.4   
requests         2.18.4   
ruamel-yaml      0.15.35  
setuptools       39.0.1   
simplegeneric    0.8.1    
six              1.11.0   
SQLAlchemy       1.2.7    
thrift           0.10.0   
traitlets        4.3.2    
urllib3          1.22     
wcwidth          0.1.7    
wheel            0.31.0   

Here's the conda environment:

mapdadmin@MapDCE:~$ source activate condainstall
(condainstall) mapdadmin@MapDCE:~$ conda list
# packages in environment at /home/mapdadmin/miniconda3/envs/condainstall:
#
# Name                    Version                   Build  Channel
arrow-cpp                 0.9.0                    py36_7    conda-forge
backcall                  0.1.0                    py36_0  
boost-cpp                 1.66.0                        1    conda-forge
bzip2                     1.0.6                         1    conda-forge
ca-certificates           2018.03.07                    0  
certifi                   2018.4.16                py36_0  
decorator                 4.3.0                    py36_0  
icu                       58.2                          0    conda-forge
intel-openmp              2018.0.0                      8  
ipython                   6.3.1                    py36_0  
ipython_genutils          0.2.0            py36hb52b0d5_0  
jedi                      0.12.0                   py36_1  
libgcc-ng                 7.2.0                hdf63c60_3  
libgfortran-ng            7.2.0                hdf63c60_3  
mkl                       2018.0.2                      1  
mkl_fft                   1.0.2                    py36_0    conda-forge
mkl_random                1.0.1                    py36_0    conda-forge
ncurses                   5.9                          10    conda-forge
numpy                     1.14.2           py36hdbf6ddf_1  
openssl                   1.0.2o               h20670df_0  
pandas                    0.22.0                   py36_1    conda-forge
parquet-cpp               1.4.0                         0    conda-forge
parso                     0.2.0                    py36_0  
pexpect                   4.5.0                    py36_0  
pickleshare               0.7.4            py36h63277f8_0  
pip                       9.0.3                    py36_0    conda-forge
prompt_toolkit            1.0.15           py36h17d85b1_0  
ptyprocess                0.5.2            py36h69acd42_0  
pyarrow                   0.9.0                    py36_1    conda-forge
pygments                  2.2.0            py36h0d3125c_0  
pymapd                    0.3.2                    py36_0    conda-forge
python                    3.6.5                         1    conda-forge
python-dateutil           2.7.2                      py_0    conda-forge
pytz                      2018.4                     py_0    conda-forge
readline                  7.0                           0    conda-forge
setuptools                39.1.0                   py36_0    conda-forge
simplegeneric             0.8.1                    py36_2  
six                       1.11.0                   py36_1    conda-forge
sqlalchemy                1.2.7            py36h65ede16_0    conda-forge
sqlite                    3.20.1                        2    conda-forge
thrift                    0.10.0                   py36_0    conda-forge
tk                        8.6.7                         0    conda-forge
traitlets                 4.3.2            py36h674d592_0  
wcwidth                   0.1.7            py36hdf4376a_0  
wheel                     0.31.0                   py36_0    conda-forge
xz                        5.2.3                         0    conda-forge
zlib                      1.2.11                        0    conda-forge
TomAugspurger commented 6 years ago

A simple pip install pymapd[arrow] failed for me.

building 'pymapd.shm' extension
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/numpy/numpy/core/include -I/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/pyarrow/include -I/usr/local/include -I/usr/local/opt/openssl/include -I/usr/local/opt/sqlite/include -I/Users/taugspurger/Envs/pandas-dev/include -I/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/include/python3.6m -c pymapd/shm.cpp -o build/temp.macosx-10.13-x86_64-3.6/pymapd/shm.o -std=c++11
In file included from pymapd/shm.cpp:611:
In file included from /Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/numpy/numpy/core/include/numpy/arrayobject.h:4:
In file included from /Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/numpy/numpy/core/include/numpy/ndarrayobject.h:18:
In file included from /Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/numpy/numpy/core/include/numpy/ndarraytypes.h:1821:
/Users/taugspurger/Envs/pandas-dev/lib/python3.6/site-packages/numpy/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning:
      "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
 ^
1 warning generated.
clang++ -bundle -undefined dynamic_lookup build/temp.macosx-10.13-x86_64-3.6/pymapd/shm.o -L/usr/local/lib -L/usr/local/opt/openssl/lib -L/usr/local/opt/sqlite/lib -larrow -larrow_python -o build/lib.macosx-10.13-x86_64-3.6/pymapd/shm.cpython-36m-darwin.so -std=c++11
ld: library not found for -larrow
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command 'clang++' failed with exit status 1

It seems something is going wrong with locating the Arrow libraries. I won't have time to look closely until later.
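For context on the linker error: the pymapd.shm extension is linked with -larrow -larrow_python, so the linker must be pointed at the directory that holds those libraries. Newer pyarrow releases expose helpers for this: pyarrow.get_include(), pyarrow.get_libraries() and pyarrow.get_library_dirs(). They are not necessarily present in the 0.7/0.9 releases discussed here, so this sketch (with a hypothetical helper name) returns None when they are unavailable; it illustrates the general approach, not a confirmed fix for this build.

```python
def arrow_extension_flags():
    """Collect compiler/linker settings for building against pyarrow.

    Hypothetical helper. Returns None if pyarrow is missing or too old to
    provide these functions (an assumption about older releases).
    """
    try:
        import pyarrow as pa
        include_dir = pa.get_include()       # Arrow C++ headers
        libraries = pa.get_libraries()       # names to pass as -l flags
        library_dirs = pa.get_library_dirs() # directories to pass as -L flags
    except (ImportError, AttributeError):
        return None
    # Note: pip wheels of recent pyarrow ship versioned library files; the
    # pyarrow docs describe pyarrow.create_library_symlinks() for adding the
    # unversioned names that -larrow looks for. It is not called here.
    return {
        "include_dirs": [include_dir],
        "libraries": list(libraries),
        "library_dirs": list(library_dirs),
    }
```

The returned dict could be passed as keyword arguments to setuptools.Extension (which accepts include_dirs, libraries and library_dirs), together with numpy.get_include(); that wiring is an assumption and not pymapd's actual setup.py.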

TomAugspurger commented 6 years ago

For the conda environment, things will hopefully work if you try pyarrow==0.7.1.

randyzwitch commented 6 years ago

Here's what happened when I downgraded pyarrow:

(condainstall) mapdadmin@MapDCE:~$ conda install -c conda-forge pyarrow==0.7.1
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.5.1
  latest version: 4.5.2

Please update conda by running

    $ conda update -n base conda

## Package Plan ##

  environment location: /home/mapdadmin/miniconda3/envs/condainstall

  added / updated specs: 
    - pyarrow==0.7.1

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2018.4.16          |           py36_0         142 KB  conda-forge
    arrow-cpp-0.7.1            |           py36_2         2.5 MB  conda-forge
    parquet-cpp-1.3.0.post     |                2         1.3 MB  conda-forge
    pyarrow-0.7.1              |           py36_1         885 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.8 MB

The following packages will be UPDATED:

    ca-certificates: 2018.03.07-0                  --> 2018.4.16-0      conda-forge
    certifi:         2018.4.16-py36_0              --> 2018.4.16-py36_0 conda-forge
    openssl:         1.0.2o-h20670df_0             --> 1.0.2o-0         conda-forge

The following packages will be DOWNGRADED:

    arrow-cpp:       0.9.0-py36_7      conda-forge --> 0.7.1-py36_2     conda-forge
    parquet-cpp:     1.4.0-0           conda-forge --> 1.3.0.post-2     conda-forge
    pyarrow:         0.9.0-py36_1      conda-forge --> 0.7.1-py36_1     conda-forge

Proceed ([y]/n)? y

Is this the same issue as https://github.com/mapd/pymapd/pull/67?

randyzwitch commented 6 years ago

@TomAugspurger is correct; downgrading pyarrow did the trick:


Downloading and Extracting Packages
certifi 2018.4.16########################################################################################## | 100% 
arrow-cpp 0.7.1############################################################################################ | 100% 
parquet-cpp 1.3.0.post##################################################################################### | 100% 
pyarrow 0.7.1############################################################################################## | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(condainstall) mapdadmin@MapDCE:~$ ipython
Python 3.6.5 | packaged by conda-forge | (default, Apr  6 2018, 13:39:56) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.3.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas as pd  
   ...: import sys  
   ...: from pymapd import connect
   ...: 
   ...: con = connect(user="mapd", password="HyperInteractive", host="localhost", dbname="mapd")
   ...: 
   ...: 

In [2]: prescriber_df = pd.read_csv("data/PartD_Prescriber_PUF_NPI_15.txt", sep='\t', low_memory=False)  
   ...: 

In [3]: str_cols = prescriber_df.columns[prescriber_df.dtypes==object]  
   ...: prescriber_df[str_cols] = prescriber_df[str_cols].fillna('NA')  
   ...: prescriber_df.fillna(0,inplace=True)
   ...: 
   ...: 

In [4]: con.execute('drop table if exists cms_prescriber')  
   ...: con.create_table("cms_prescriber",prescriber_df, preserve_index=False)
   ...: 
   ...: 

In [5]: con.load_table("cms_prescriber", prescriber_df, preserve_index=False)
TomAugspurger commented 6 years ago

For the conda-forge issue, yes, this would be the same as #67. It would need a new conda-forge release that pins the pyarrow dependency to 0.7.1.

The pip failure seems to be different, but I haven't looked closely.

randyzwitch commented 6 years ago

This appears to have been resolved, based on a fresh install into a conda Python 3.6 environment on Ubuntu 16.04.