vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License
8.25k stars, 590 forks

[BUG-REPORT]"Unknown variables or column: ' while using jit #2183

Closed GMfatcat closed 2 years ago

GMfatcat commented 2 years ago

Description

Hi, I got this "Unknown variables or column: ' error while using jit_numba() / jit_cuda(). I am just trying a simplified version of your JIT tutorial guide, with the code below:

import numpy as np
import vaex

df = vaex.example()

def arc_distance(theta_1, phi_1, theta_2, phi_2):
    """
    Calculates the pairwise arc distance
    between all points in vector a and b.
    """
    temp = (np.sin((theta_2-2-theta_1)/2)**2
           + np.cos(theta_1)*np.cos(theta_2) * np.sin((phi_2-phi_1)/2)**2)
    distance_matrix = 2 * np.arctan2(np.sqrt(temp), np.sqrt(1-temp))
    return distance_matrix

# without jit
df['arc_distance'] = arc_distance(df.x * np.pi/180,
                                    df.y * np.pi/180,
                                    df.z * np.pi/180,
                                    df.vx * np.pi/180)

df.mean(df.arc_distance)  # works fine here

df['arc_distance_cuda'] = df.arc_distance.jit_numba()  # error raised here
df.mean(df.arc_distance_cuda)

Software information

I didn't encounter this problem in another env of mine (Python 3.8), and I don't think the package versions there differ much from the current env. Although @jit acceleration isn't a must-have for me (at least for now), I would still like to know how to avoid this mistake.
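When the same code behaves differently in two environments, listing the relevant versions side by side makes the comparison concrete. Below is a minimal sketch using only the standard library. The package list is an assumption about what matters for vaex's JIT path, and `collect_versions` and `format_report` are hypothetical helper names, not part of vaex:

```python
from importlib import metadata
import platform
import sys

# Assumed to be the packages relevant to the JIT path; adjust as needed.
PACKAGES = ["vaex", "vaex-core", "numba", "llvmlite", "numpy", "pyarrow"]


def collect_versions(packages=PACKAGES):
    """Return {name: version or None} for each distribution name."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def format_report(versions):
    """Render the versions as text that can be pasted into a bug report."""
    lines = [f"python {platform.python_version()} on {sys.platform}"]
    for name, version in sorted(versions.items()):
        lines.append(f"{name}=={version}" if version else f"{name} (not installed)")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(collect_versions()))
```

Running it in both the working and the failing environment and diffing the two outputs should show whether numba, llvmlite, or numpy differ between them.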

JovanVeljanoski commented 2 years ago

Hi,

I can't reproduce your issue. Things are working fine on current master (under Linux; I don't have access to a Windows machine).

Can you check whether there are any other bugs in your code? For example, I see you named one variable arc_distance_cuda, but the JIT is done via numba. Maybe there is something there (just guessing though, it could also be fine).

Also, please use code formatting next time; the code is harder to read like this.

I also tried running your code in a Kaggle notebook, and it works there too.

GMfatcat commented 2 years ago

Thanks for your reply. What I meant by "error while using jit_numba() / jit_cuda()" is that I got the same error with either numba or CUDA; sorry that I only gave the numba example in the code. Maybe there is a dependency conflict between pip packages and conda packages. I'll create a new env for vaex only and try again.
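One way to look for a pip/conda mix-up before rebuilding the environment is to check which installer each distribution records in its metadata. This is a rough sketch under stated assumptions: it relies on the `INSTALLER` file that pip writes and that many (not all) conda packages also carry, and `group_by_installer` and `installed_by` are hypothetical helper names:

```python
from collections import defaultdict
from importlib import metadata


def group_by_installer():
    """Map installer name ('pip', 'conda', 'unknown', ...) to sorted package names."""
    groups = defaultdict(set)
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip distributions with broken or partial metadata
        raw = dist.read_text("INSTALLER") or ""  # None when the file is absent
        installer = raw.strip().lower() or "unknown"
        groups[installer].add(name.lower())
    return {installer: sorted(names) for installer, names in groups.items()}


def installed_by(packages, groups):
    """Return {package: [installers]}; more than one installer hints at a conflict."""
    result = {}
    for package in packages:
        key = package.lower()
        result[package] = sorted(i for i, names in groups.items() if key in names)
    return result


if __name__ == "__main__":
    groups = group_by_installer()
    for package, installers in installed_by(["numba", "llvmlite", "numpy"], groups).items():
        print(package, installers)
```

A package listed under both `pip` and `conda` has two copies on the import path, which is worth checking. An empty list only means the package is not installed in this environment.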

GMfatcat commented 2 years ago

It works with the setup below (including vaex and geopandas). I'll leave it here in case some Windows user encounters this; copy it into an env.yaml and create an env from it:

name: vaex_geo
channels:
  - conda-forge
  - defaults
dependencies:
  - anyio=3.6.1=pyhd8ed1ab_1
  - argon2-cffi=21.3.0=pyhd8ed1ab_0
  - argon2-cffi-bindings=21.2.0=py38h294d835_2
  - asttokens=2.0.8=pyhd8ed1ab_0
  - attrs=22.1.0=pyh71513ae_1
  - babel=2.10.3=pyhd8ed1ab_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - beautifulsoup4=4.11.1=pyha770c72_0
  - bleach=5.0.1=pyhd8ed1ab_0
  - blosc=1.21.1=h74325e0_3
  - boost-cpp=1.74.0=h9f4b32c_8
  - branca=0.5.0=pyhd8ed1ab_0
  - brotli=1.0.9=h8ffe710_7
  - brotli-bin=1.0.9=h8ffe710_7
  - brotlipy=0.7.0=py38h294d835_1004
  - bzip2=1.0.8=h8ffe710_4
  - ca-certificates=2022.6.15=h5b45459_0
  - cairo=1.16.0=h0ac17fb_1012
  - certifi=2022.6.15=py38haa244fe_0
  - cffi=1.15.1=py38hd8c33c5_0
  - cfitsio=4.1.0=h5a969a9_0
  - charset-normalizer=2.1.1=pyhd8ed1ab_0
  - click=8.1.3=py38haa244fe_0
  - click-plugins=1.1.1=py_0
  - cligj=0.7.2=pyhd8ed1ab_1
  - colorama=0.4.5=pyhd8ed1ab_0
  - console_shortcut=0.1.1=4
  - cryptography=37.0.1=py38h21b164f_0
  - curl=7.83.1=h789b8ee_0
  - cycler=0.11.0=pyhd8ed1ab_0
  - debugpy=1.6.3=py38h885f38d_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - defusedxml=0.7.1=pyhd8ed1ab_0
  - entrypoints=0.4=pyhd8ed1ab_0
  - executing=0.10.0=pyhd8ed1ab_0
  - expat=2.4.8=h39d44d4_0
  - fiona=1.8.21=py38h4ea64ce_2
  - flit-core=3.7.1=pyhd8ed1ab_0
  - folium=0.12.1.post1=pyhd8ed1ab_1
  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  - font-ttf-inconsolata=3.000=h77eed37_0
  - font-ttf-source-code-pro=2.038=h77eed37_0
  - font-ttf-ubuntu=0.83=hab24e00_0
  - fontconfig=2.14.0=hce3cb01_0
  - fonts-conda-ecosystem=1=0
  - fonts-conda-forge=1=0
  - fonttools=4.36.0=py38h294d835_0
  - freetype=2.12.1=h546665d_0
  - freexl=1.0.6=ha8e266a_0
  - gdal=3.5.1=py38h84437df_4
  - geopandas=0.11.1=pyhd8ed1ab_0
  - geopandas-base=0.11.1=pyha770c72_0
  - geos=3.11.0=h39d44d4_0
  - geotiff=1.7.1=h714bc5f_3
  - gettext=0.19.8.1=ha2e2712_1008
  - hdf4=4.2.15=h0e5069d_4
  - hdf5=1.12.2=nompi_h57737ce_100
  - icu=70.1=h0e60522_0
  - idna=3.3=pyhd8ed1ab_0
  - importlib-metadata=4.11.4=py38haa244fe_0
  - importlib_metadata=4.11.4=hd8ed1ab_0
  - importlib_resources=5.9.0=pyhd8ed1ab_0
  - intel-openmp=2022.1.0=h57928b3_3787
  - ipykernel=6.15.1=pyh025b116_0
  - ipython=8.4.0=py38haa244fe_0
  - ipython_genutils=0.2.0=py_1
  - jedi=0.18.1=pyhd8ed1ab_2
  - jinja2=3.1.2=pyhd8ed1ab_1
  - joblib=1.1.0=pyhd8ed1ab_0
  - jpeg=9e=h8ffe710_2
  - json5=0.9.5=pyh9f0ad1d_0
  - jsonschema=4.14.0=pyhd8ed1ab_0
  - jupyter_client=7.3.4=pyhd8ed1ab_0
  - jupyter_core=4.11.1=py38haa244fe_0
  - jupyter_server=1.18.1=pyhd8ed1ab_0
  - jupyterlab=3.4.5=pyhd8ed1ab_0
  - jupyterlab_pygments=0.2.2=pyhd8ed1ab_0
  - jupyterlab_server=2.15.0=pyhd8ed1ab_0
  - kealib=1.4.15=hdf81f3a_1
  - kiwisolver=1.4.4=py38hbd9d945_0
  - krb5=1.19.3=hc8ab02b_0
  - lcms2=2.12=h2a16943_0
  - lerc=4.0.0=h63175ca_0
  - libblas=3.9.0=16_win64_mkl
  - libbrotlicommon=1.0.9=h8ffe710_7
  - libbrotlidec=1.0.9=h8ffe710_7
  - libbrotlienc=1.0.9=h8ffe710_7
  - libcblas=3.9.0=16_win64_mkl
  - libcurl=7.83.1=h789b8ee_0
  - libdeflate=1.13=h8ffe710_0
  - libffi=3.4.2=h8ffe710_5
  - libgdal=3.5.1=h44c0759_4
  - libglib=2.72.1=h3be07f2_0
  - libiconv=1.16=he774522_0
  - libkml=1.3.0=h9859afa_1014
  - liblapack=3.9.0=16_win64_mkl
  - libnetcdf=4.8.1=nompi_h85765be_104
  - libpng=1.6.37=h1d00b33_4
  - libpq=14.5=h1ea2d34_0
  - librttopo=1.1.0=h2842628_11
  - libsodium=1.0.18=h8d14728_1
  - libspatialindex=1.9.3=h39d44d4_4
  - libspatialite=5.0.1=ha17912d_18
  - libsqlite=3.39.2=h8ffe710_1
  - libssh2=1.10.0=h9a1e1f7_3
  - libtiff=4.4.0=h92677e6_3
  - libwebp-base=1.2.4=h8ffe710_0
  - libxcb=1.13=hcd874cb_1004
  - libxml2=2.9.14=hf5bbc77_4
  - libxslt=1.1.35=h34f844d_0
  - libzip=1.9.2=h519de47_1
  - libzlib=1.2.12=h8ffe710_2
  - lxml=4.9.1=py38h294d835_0
  - lz4-c=1.9.3=h8ffe710_1
  - m2w64-gcc-libgfortran=5.3.0=6
  - m2w64-gcc-libs=5.3.0=7
  - m2w64-gcc-libs-core=5.3.0=7
  - m2w64-gmp=6.1.0=2
  - m2w64-libwinpthread-git=5.0.0.4634.697f757=2
  - mapclassify=2.4.3=pyhd8ed1ab_0
  - markupsafe=2.1.1=py38h294d835_1
  - matplotlib-base=3.5.3=py38he529843_1
  - matplotlib-inline=0.1.6=pyhd8ed1ab_0
  - mistune=2.0.4=pyhd8ed1ab_0
  - mkl=2022.1.0=h6a75c08_874
  - msys2-conda-epoch=20160418=1
  - munch=2.5.0=py_0
  - munkres=1.1.4=pyh9f0ad1d_0
  - nbclassic=0.4.3=pyhd8ed1ab_0
  - nbclient=0.6.7=pyhd8ed1ab_0
  - nbconvert=7.0.0=pyhd8ed1ab_0
  - nbconvert-core=7.0.0=pyhd8ed1ab_0
  - nbconvert-pandoc=7.0.0=pyhd8ed1ab_0
  - nbformat=5.4.0=pyhd8ed1ab_0
  - nest-asyncio=1.5.5=pyhd8ed1ab_0
  - networkx=2.8.6=pyhd8ed1ab_0
  - notebook=6.4.12=pyha770c72_0
  - notebook-shim=0.1.0=pyhd8ed1ab_0
  - openjpeg=2.4.0=hb211442_1
  - openssl=3.0.5=h8ffe710_1
  - packaging=21.3=pyhd8ed1ab_0
  - pandas=1.4.3=py38hcc40339_0
  - pandoc=2.19.2=h57928b3_0
  - pandocfilters=1.5.0=pyhd8ed1ab_0
  - parso=0.8.3=pyhd8ed1ab_0
  - patsy=0.5.2=pyhd8ed1ab_0
  - pcre=8.45=h0e60522_0
  - pickleshare=0.7.5=py_1003
  - pillow=9.2.0=py38hd8e0db4_1
  - pip=22.2.2=pyhd8ed1ab_0
  - pixman=0.40.0=h8ffe710_0
  - pkgutil-resolve-name=1.3.10=pyhd8ed1ab_0
  - poppler=22.04.0=h24fffdf_1
  - poppler-data=0.4.11=hd8ed1ab_0
  - postgresql=14.5=he353ca9_0
  - proj=9.0.1=h1cfcee9_1
  - prometheus_client=0.14.1=pyhd8ed1ab_0
  - prompt-toolkit=3.0.30=pyha770c72_0
  - psutil=5.9.1=py38h294d835_0
  - pthread-stubs=0.4=hcd874cb_1001
  - pure_eval=0.2.2=pyhd8ed1ab_0
  - pycparser=2.21=pyhd8ed1ab_0
  - pygments=2.13.0=pyhd8ed1ab_0
  - pyopenssl=22.0.0=pyhd8ed1ab_0
  - pyparsing=3.0.9=pyhd8ed1ab_0
  - pyproj=3.3.1=py38hf6b4ca6_1
  - pyrsistent=0.18.1=py38h294d835_1
  - pysocks=1.7.1=py38haa244fe_5
  - python=3.8.13=hcf16a7b_0_cpython
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python-fastjsonschema=2.16.1=pyhd8ed1ab_0
  - python_abi=3.8=2_cp38
  - pytz=2022.2.1=pyhd8ed1ab_0
  - pywin32=303=py38h294d835_0
  - pywinpty=2.0.7=py38hd3f51b4_0
  - pyzmq=23.2.1=py38h09162b1_0
  - requests=2.28.1=pyhd8ed1ab_0
  - rtree=1.0.0=py38h8b54edf_1
  - scikit-learn=1.1.2=py38hc27f28a_0
  - scipy=1.9.0=py38h91810f7_0
  - seaborn=0.11.2=hd8ed1ab_0
  - seaborn-base=0.11.2=pyhd8ed1ab_0
  - send2trash=1.8.0=pyhd8ed1ab_0
  - setuptools=65.2.0=py38haa244fe_0
  - shapely=1.8.4=py38h91759cc_0
  - six=1.16.0=pyh6c4a22f_0
  - snappy=1.1.9=h82413e6_1
  - sniffio=1.2.0=py38haa244fe_3
  - soupsieve=2.3.2.post1=pyhd8ed1ab_0
  - sqlite=3.39.2=h8ffe710_1
  - stack_data=0.4.0=pyhd8ed1ab_0
  - statsmodels=0.13.2=py38hbdcd294_0
  - tbb=2021.5.0=h2d74725_1
  - terminado=0.15.0=py38haa244fe_0
  - threadpoolctl=3.1.0=pyh8a188c0_0
  - tiledb=2.11.0=h3132609_1
  - tinycss2=1.1.1=pyhd8ed1ab_0
  - tk=8.6.12=h8ffe710_0
  - tornado=6.2=py38h294d835_0
  - traitlets=5.3.0=pyhd8ed1ab_0
  - typing_extensions=4.3.0=pyha770c72_0
  - ucrt=10.0.20348.0=h57928b3_0
  - unicodedata2=14.0.0=py38h294d835_1
  - urllib3=1.26.11=pyhd8ed1ab_0
  - vc=14.2=hb210afc_6
  - vs2015_runtime=14.29.30037=h902a5da_6
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - webencodings=0.5.1=py_1
  - websocket-client=1.3.3=pyhd8ed1ab_0
  - wheel=0.37.1=pyhd8ed1ab_0
  - win_inet_pton=1.1.0=py38haa244fe_4
  - winpty=0.4.3=4
  - xerces-c=3.2.3=h0e60522_5
  - xorg-libxau=1.0.9=hcd874cb_0
  - xorg-libxdmcp=1.1.3=hcd874cb_0
  - xyzservices=2022.6.0=pyhd8ed1ab_0
  - xz=5.2.6=h8d14728_0
  - zeromq=4.3.4=h0e60522_1
  - zipp=3.8.1=pyhd8ed1ab_0
  - zlib=1.2.12=h8ffe710_2
  - zstd=1.5.2=h7755175_4
  - pip:
    - aplus==0.11.0
    - astropy==5.1
    - blake3==0.3.1
    - bqplot==0.12.34
    - cachetools==5.2.0
    - cloudpickle==2.1.0
    - commonmark==0.9.1
    - dask==2022.8.1
    - fastapi==0.80.0
    - filelock==3.8.0
    - frozendict==2.3.4
    - fsspec==2022.7.1
    - future==0.18.2
    - h11==0.13.0
    - h5py==3.7.0
    - httptools==0.4.0
    - ipydatawidgets==4.3.1.post1
    - ipyleaflet==0.17.1
    - ipympl==0.9.2
    - ipyvolume==0.5.2
    - ipyvue==1.7.0
    - ipyvuetify==1.8.2
    - ipywebrtc==0.6.0
    - ipywidgets==8.0.1
    - jupyter-resource-usage==0.6.1
    - jupyterlab-widgets==3.0.2
    - llvmlite==0.39.0
    - locket==1.0.0
    - numba==0.56.0
    - numpy==1.22.4
    - partd==1.3.0
    - progressbar2==4.0.0
    - pyarrow==9.0.0
    - pydantic==1.9.2
    - pyerfa==2.0.0.1
    - python-dotenv==0.20.0
    - python-utils==3.3.3
    - pythreejs==2.3.0
    - pyyaml==6.0
    - rich==12.5.1
    - starlette==0.19.1
    - tabulate==0.8.10
    - toolz==0.12.0
    - traittypes==0.2.1
    - uvicorn==0.18.2
    - vaex==4.11.1
    - vaex-astro==0.9.1
    - vaex-core==4.11.1
    - vaex-hdf5==0.12.3
    - vaex-jupyter==0.8.0
    - vaex-ml==0.18.0
    - vaex-server==0.8.1
    - vaex-viz==0.5.2
    - watchfiles==0.16.1
    - websockets==10.3
    - widgetsnbextension==4.0.2
    - xarray==2022.6.0
prefix: C:\Users\user\anaconda3\envs\geo
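
The export above pins every conda package to a Windows build string (the third `=` field) and ends with a machine-specific `prefix:` path, so it is only likely to resolve on a similar Windows setup. The sketch below shows one way to loosen such a file by dropping the build strings and the prefix line. `strip_builds` is a hypothetical helper, and the line-based parsing assumes the `name=version=build` layout shown above rather than arbitrary YAML:

```python
import re

# Matches a conda pin of the form "  - name=version=build" (exactly two '=' fields).
# pip pins such as "    - numba==0.56.0" do not match and pass through unchanged.
CONDA_PIN = re.compile(r"^(\s*-\s+)([^=\s]+)=([^=\s]+)=([^=\s]+)\s*$")


def strip_builds(yaml_text):
    """Drop build strings and the machine-specific prefix line from an exported env file."""
    out = []
    for line in yaml_text.splitlines():
        if line.startswith("prefix:"):
            continue  # absolute path from the exporting machine
        match = CONDA_PIN.match(line)
        if match:
            indent, name, version, _build = match.groups()
            line = f"{indent}{name}={version}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    with open("env.yaml", encoding="utf-8") as fh:
        print(strip_builds(fh.read()), end="")
```

`conda env export --no-builds` produces a similar file directly, and either version can be installed with `conda env create -f env.yaml`. Windows-only packages in the list (for example `pywin32` or the `m2w64-*` entries) would still need to be removed by hand on other platforms.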

JovanVeljanoski commented 2 years ago

So the jit_cuda requires you to have a GPU and install the cuda dependencies (cudf) or so..

GMfatcat commented 2 years ago

> So the jit_cuda requires you to have a GPU and install the cuda dependencies (cudf) or so..

Yeah, I know that, but I don't want to set up CUDA on my laptop. Numba is good enough for it. Thanks for the reminder.