vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License
8.31k stars 591 forks

[BUG-REPORT] Jupyter Notebook kernel hanging when running basic data aggregations #1398

Open mtwichan opened 3 years ago

mtwichan commented 3 years ago

Description

I'm trying to run the Vaex tutorial in a Jupyter Notebook, and the notebook hangs/freezes any time I run an aggregation. I would really appreciate any help. I tested the code snippets below on two operating systems with the same results.

Code snippet (run in Jupyter Notebook) pulled from here:

import vaex
df = vaex.example()
df.x
df.x.values

import numpy as np
np.sqrt(df.x**2 + df.y**2 + df.z**2)

df['r'] = np.sqrt(df.x**2 + df.y**2 + df.z**2) # freezes here
df[['x', 'y', 'z', 'r']]

Another example of the notebook freezing (run in Jupyter Notebook):

import vaex
# 107 GB dataset
df = vaex.open('s3://vaex/taxi/yellow_taxi_2009_2015_f32.hdf5?anon=true')
mean = df.mean(df.passenger_count)

Software information

Additional information

Python Version: Python 3.9.2 (Windows) & Python 3.7.2 (macOS)

requirements.txt (relevant libraries):

ipydatawidgets==4.2.0
ipykernel==5.5.0
ipyleaflet==0.13.6
ipympl==0.7.0
ipython==7.20.0
ipython-genutils==0.2.0
ipyvolume==0.5.2
ipyvue==1.5.0
ipyvuetify==1.6.2
ipywebrtc==0.6.0
ipywidgets==7.6.3
...
jupyter==1.0.0
jupyter-client==6.1.11
jupyter-console==6.2.0
jupyter-core==4.7.1
jupyter-dash==0.4.0
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
...
vaex==4.2.0
vaex-astro==0.8.1
vaex-core==4.2.0
vaex-hdf5==0.7.0
vaex-jupyter==0.6.0
vaex-ml==0.12.0
vaex-server==0.4.1
vaex-viz==0.5.0
...

JovanVeljanoski commented 3 years ago

Hi,

Can you tell us which version of numpy you have? By the way, does your machine have an SSD or an HDD?

Also, are you sure that the datasets (the example and especially the taxi one) have been downloaded successfully? It might take a while just to download the data before any computations are done.

mtwichan commented 3 years ago

Hi @JovanVeljanoski,

My machine (Windows) has an SSD.

I'm using numpy==1.19.5.

With regard to downloading the data successfully, I'm fetching the data from S3 or from the sample data provided with the library, all within Jupyter Notebook. The cell appears to run successfully (see image below). Where should I look to confirm that the data has finished downloading?

jupyter_notebook
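
One way to approach the question above is to look at the file on disk: a download that finished should leave a file with a plausible, non-zero size. The sketch below only illustrates that idea with the standard library. The helper name describe_local_file and the example path are hypothetical, and the location where Vaex stores downloaded data is not stated in this thread, so the path has to be substituted.

```python
from pathlib import Path


def describe_local_file(path):
    """Return (exists, size_in_bytes) for a file path; size is 0 if the file is missing."""
    p = Path(path).expanduser()
    if not p.is_file():
        return False, 0
    return True, p.stat().st_size


# Hypothetical location: replace it with wherever the dataset is stored on your machine.
exists, size = describe_local_file("~/path/to/downloaded-example.hdf5")
print(f"exists={exists}, size={size} bytes")
```

A size that keeps growing between two runs would suggest the download is still in progress rather than finished.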

maartenbreddels commented 3 years ago

import vaex
df = vaex.example()
df.x
df.x.values

import numpy as np
np.sqrt(df.x**2 + df.y**2 + df.z**2)

df['r'] = np.sqrt(df.x**2 + df.y**2 + df.z**2) # freezes here

If it already freezes here, I don't think we should look at the next example... Can you also try it from a simple Python console?

mtwichan commented 3 years ago

@maartenbreddels it seems to be working when run as a Python script and in the Python console.

Python console example:

Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import vaex
>>> df = vaex.example()
>>> import numpy as np
>>> df['r'] = np.sqrt(df.x**2 + df.y**2 + df.z**2) # freezes here
>>> print(df[['x', 'y', 'z', 'r']])
#        x             y            z            r
0        1.2318684     -0.39692867  -0.59805775  1.4257367
1        -0.16370061   3.6542213    -0.25490645  3.6667573
2        -2.120256     3.3260527    1.7078403    4.298236
3        4.715589      4.585251     2.2515438    6.9520326
4        7.217187      11.994717    -1.0645622   14.039028
...      ...           ...          ...          ...
329,995  1.9938701     0.7892761    0.2220599    2.1558723
329,996  3.7180912     0.7213376    1.6415337    4.127852
329,997  0.36885077    13.029609    -3.6339347   13.531897
329,998  -0.112592645  1.4529126    2.1689527    2.6130419
329,999  20.79622      -3.3313878   12.188416    24.333895
>>>

Looks like it's a Jupyter Notebook issue, then? I've tried launching the notebook with both jupyter notebook and ipython notebook, with the same results; I'm not sure whether this makes a difference.

Thanks for the help by the way!

CC: @kully

mtwichan commented 3 years ago

@maartenbreddels I'm not sure why it does not run correctly on our machines, so I've switched to Google Colaboratory and it works great!

JovanVeljanoski commented 3 years ago

If you are running multiple environments, you can check whether the Jupyter notebook you are running comes from the same environment in which you installed vaex.

If it works in the console (ipython) like you've shown above, my bet is on something like that.

Vaex and jupyter are for sure compatible - I use that combination daily. Just make sure that you are in the same environment!
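
A minimal sketch of the environment check suggested above, using only the standard library. The idea is to run it once in a notebook cell and once in the console where Vaex works, then compare the output; if the interpreter paths differ, the notebook is using a different environment. The helper name where_is is made up for illustration.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


# Compare these three lines between the notebook kernel and the plain console.
print("interpreter:", sys.executable)
print("environment prefix:", sys.prefix)
print("vaex found at:", where_is("vaex"))
```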

mtwichan commented 3 years ago

Hi @JovanVeljanoski,

I spoke with @maartenbreddels about this bug via video call. If you folks need any more assistance from me, I'm happy to help!

maartenbreddels commented 3 years ago

Could you give us your full output of pip freeze? Cheers

(from mobile phone)
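
If it is easier to collect the package list from inside the notebook kernel itself, so that it reflects the interpreter the notebook really uses, something like the following could work. It is a sketch and not a replacement for pip freeze: it relies on importlib.metadata, which is in the standard library from Python 3.8 onward, and the helper name installed_packages is an assumption for illustration.

```python
from importlib import metadata  # standard library on Python 3.8+


def installed_packages():
    """Return sorted 'name==version' strings for distributions visible to this interpreter."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs with incomplete metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


# Print a pip-freeze-like listing for the interpreter running this code.
for line in installed_packages():
    print(line)
```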

On Tue, Jun 29, 2021, 21:58 Matthew @.***> wrote:

Hi @JovanVeljanoski,

I spoke with @maartenbreddels about this bug via video call. If you folks need any more assistance from me, I'm happy to help!


mtwichan commented 3 years ago

@maartenbreddels here you go!

adal==1.2.7
adlfs==0.7.7
aiobotocore==1.3.0
aiohttp==3.7.4.post0
aioitertools==0.7.1
alabaster==0.7.12
alembic==1.5.8
amqp==5.0.6
ansi2html==1.6.0
aplus==0.11.0
appdirs==1.4.4
arabic-reshaper==2.1.3
argon2-cffi==20.1.0
asgiref==3.3.4
aspy.refactor-imports==2.1.1
astropy==4.2.1
async-generator==1.10
async-timeout==3.0.1
atomicwrites==1.4.0
attrs==20.3.0
Authlib==0.15.3
awsebcli==3.19.3
azure-common==1.1.27
azure-core==1.15.0
azure-datalake-store==0.0.52
azure-identity==1.6.0
azure-mgmt-core==1.2.2
azure-mgmt-storage==18.0.0
azure-storage-blob==12.8.1
Babel==2.9.0
backcall==0.2.0
bandit==1.7.0
billiard==3.6.4.0
black==20.8b1
bleach==3.3.0
bokeh==2.3.2
Bootstrap-Flask==1.5.2
boto3==1.17.92
botocore==1.20.92
bqplot==0.12.27
branca==0.4.2
Brotli==1.0.9
cached-property==1.5.2
cachetools==4.2.2
cairocffi==1.2.0
CairoSVG==2.5.2
can-decoder==0.1.1
celery==5.0.5
cement==2.8.2
certifi==2020.12.5
cffi==1.14.5
cfgv==3.2.0
chardet==3.0.4
click==7.1.2
click-didyoumean==0.0.3
click-plugins==1.1.1
click-repl==0.1.6
cloudpickle==1.6.0
colorama==0.4.3
colorcet==2.0.6
colorlover==0.3.0
cryptography==3.4.7
cssselect2==0.4.1
cycler==0.10.0
dash==1.20.0
dash-bootstrap-components==0.12.0
dash-core-components==1.16.0
dash-cytoscape==0.3.0
dash-design-kit==1.6.2
dash-enterprise-auth==0.0.4
dash-html-components==1.1.3
dash-renderer==1.9.1
dash-table==4.11.3
dask==2021.5.0
datashader==0.13.0
datashape==0.5.2
ddtrace==0.48.0
decorator==4.4.2
defusedxml==0.6.0
distlib==0.3.1
distributed==2021.5.0
Django==3.2
dnspython==2.1.0
docutils==0.16
dominate==2.6.0
email-validator==1.1.2
entrypoints==0.3
et-xmlfile==1.0.1
filelock==3.0.12
flake8==3.8.4
Flask==1.1.2
Flask-Assets==2.0
Flask-Caching==1.10.1
Flask-Compress==1.9.0
Flask-Login==0.5.0
Flask-Migrate==2.7.0
Flask-SQLAlchemy==2.5.1
Flask-WTF==0.14.3
frozendict==2.0.2
fsspec==2021.6.0
future==0.16.0
gitdb==4.0.5
GitPython==3.1.13
gql==2.0.0
graphql-core==2.3.2
greenlet==1.0.0
gunicorn==20.0.4
h5py==3.2.1
HeapDict==1.0.1
holoviews==1.14.4
html5lib==1.1
identify==1.5.14
idna==2.10
image==1.5.33
imagesize==1.2.0
iniconfig==1.1.1
ipydatawidgets==4.2.0
ipykernel==5.5.0
ipyleaflet==0.13.6
ipympl==0.7.0
ipython==7.20.0
ipython-genutils==0.2.0
ipyvolume==0.5.2
ipyvue==1.5.0
ipyvuetify==1.6.2
ipywebrtc==0.6.0
ipywidgets==7.6.3
isodate==0.6.0
itsdangerous==1.1.0
jdcal==1.4.1
jedi==0.18.0
Jinja2==2.11.3
jmespath==0.10.0
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.11
jupyter-console==6.2.0
jupyter-core==4.7.1
jupyter-dash==0.4.0
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
kiwisolver==1.3.1
kombu==5.0.2
llvmlite==0.36.0
locket==0.2.1
Mako==1.1.4
Markdown==3.3.4
MarkupSafe==1.1.1
matplotlib==3.4.1
mccabe==0.6.1
mdf-iter==0.0.4
mistune==0.8.4
msal==1.11.0
msal-extensions==0.3.0
msgpack==1.0.2
msrest==0.6.21
multidict==5.1.0
multipledispatch==0.6.0
mypy-extensions==0.4.3
mysql==0.0.2
mysql-connector-python==8.0.24
mysqlclient==2.0.3
nbclient==0.5.2
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.5.1
nodeenv==1.5.0
notebook==6.4.0
numba==0.53.1
numpy==1.20.3
oauthlib==3.1.1
openpyxl==3.0.6
packaging==20.9
pandas==1.1.5
pandas-market-calendars==1.6.1
pandocfilters==1.4.3
panel==0.11.3
param==1.10.1
parso==0.8.1
partd==1.2.0
pathspec==0.5.9
patsy==0.5.1
pbr==5.5.1
pickleshare==0.7.5
Pillow==8.2.0
plotly==4.14.3
pluggy==0.13.1
portalocker==1.7.1
pre-commit==2.10.1
progressbar2==3.53.1
prometheus-client==0.9.0
promise==2.3
prompt-toolkit==3.0.16
protobuf==3.15.8
psutil==5.8.0
py==1.10.0
pyarrow==4.0.0
pyasn1==0.4.8
pycodestyle==2.6.0
pycparser==2.20
pyct==0.4.8
pyerfa==2.0.0
pyflakes==2.2.0
Pygments==2.8.0
PyJWT==2.0.1
pyOpenSSL==20.0.1
pyparsing==2.4.7
PyPDF2==1.26.0
Pyphen==0.10.0
pypiwin32==223
pyrsistent==0.17.3
pytest==6.2.2
python-bidi==0.4.2
python-dateutil==2.8.1
python-dotenv==0.17.0
python-editor==1.0.4
python-utils==2.5.6
pythreejs==2.3.0
pytz==2021.1
pyviz-comms==2.0.2
pywin32==300
pywinpty==0.5.7
PyYAML==5.3.1
pyzmq==22.0.3
qtconsole==5.0.2
QtPy==1.9.0
redis==3.5.3
regex==2020.11.13
reorder-python-imports==2.4.0
reportlab==3.5.67
requests==2.24.0
requests-oauthlib==1.3.0
retrying==1.3.3
rsa==4.7.2
Rx==1.6.1
s3fs==2021.6.0
s3transfer==0.4.2
scipy==1.6.3
semantic-version==2.5.0
Send2Trash==1.5.0
Shapely==1.7.1
six==1.14.0
smart-open==5.1.0
smmap==3.0.5
snowballstemmer==2.1.0
sortedcontainers==2.4.0
Sphinx==3.4.3
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
SQLAlchemy==1.4.9
sqlparse==0.4.1
statsmodels==0.12.2
stevedore==3.3.0
tabulate==0.8.9
tblib==1.7.0
tenacity==7.0.0
termcolor==1.1.0
terminado==0.9.2
testpath==0.4.4
tinycss2==1.1.0
toml==0.10.2
toolz==0.11.1
tornado==6.1
tqdm==4.61.1
trading-calendars==2.1.1
traitlets==5.0.5
traittypes==0.2.1
typed-ast==1.4.2
typing-extensions==3.7.4.3
urllib3==1.25.11
vaex==4.2.0
vaex-astro==0.8.1
vaex-core==4.2.0
vaex-hdf5==0.7.0
vaex-jupyter==0.6.0
vaex-ml==0.12.0
vaex-server==0.4.1
vaex-viz==0.5.0
vine==5.0.0
virtualenv==20.4.2
visitor==0.1.3
wcwidth==0.1.9
WeasyPrint==52.4
webassets==2.0
webencodings==0.5.1
Werkzeug==1.0.1
widgetsnbextension==3.5.1
wrapt==1.12.1
WTForms==2.3.3
xarray==0.18.2
xhtml2pdf==0.2.5
xlrd==2.0.1
yarl==1.6.3
zict==2.0.0