Open korosig opened 2 years ago
Hi @korosig, thank you for the feedback! Checking the videos, it's clear that there is some delay when executing inside Spyder (and nice ending comic, by the way). However, I'm not totally sure what could be happening :/
Is this happening for you after a certain amount of time after launching Spyder? Do any of your variables have a considerable size?
Also, is there any sample script you can share with us to reproduce this problem on our side?
Any new info in order to reproduce this is greatly appreciated, let us know!
Hi, here is a little script that shows the error. The parquet could be any kind of parquet data. (I have created a brand-new conda environment with Python 3.8.2 and IPython 7.29.0.)
```python
import pandas as pd
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
import numpy as np
import dask.dataframe as dd
import dask
from datetime import timedelta
import json

timedelta(hours=200)
df = dd.read_parquet('dataset/flat_table1116.pq/')
timedelta(hours=200)
```
Imported packages: pandas 1.3.4, tensorflow 2.3.0, numpy 1.20.3, dask 2021.10.0, json 2.0.9
The videos:
- With Spyder: https://youtu.be/XLkxrfZeQwc
- With Jupyter: https://youtu.be/9VVMNxpo2rI
- With vanilla Python: https://youtu.be/XdQlICPXv-g
I have downgraded dask:
- dask==2021.01.1: Spyder broke down
- dask==2021.02.0: Spyder broke down
- dask==2021.03.0: Spyder slowed down
- ......
- dask==2021.10.0: Spyder slowed down
Hey @korosig, you said:

> The parquet could be any kind of parquet data.

I think the problem is precisely related to the size of the dataframe associated with that parquet file. Could you provide us with a parquet file of roughly the size of the one you're using so we can run tests with it? Its contents are not important, just its size.
~30 GB

> I think the problem is precisely related to the size of the dataframe ....

If this is true, do you have an idea why this did not cause a problem in the other IDEs?
> ~30 GB

Oh wow! Then that's almost surely the problem. As I said, if you can provide us with such a file, we'll try to fix the problem.

> if this is true, do you have an idea why this did not cause a problem in the other IDEs

I think that has to do with the Variable Explorer. Each time code is evaluated in the console, we need to generate the representations of the variables on it to display them in the Variable Explorer, and that can take a lot of time for such a big dataframe.
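The per-evaluation cost described above comes from lazy evaluation: asking a Dask dataframe for something as basic as its row count executes the whole task graph. A minimal stdlib-only sketch of the effect, with a toy `LazyFrame` standing in for a real Dask dataframe (all names here are illustrative, not Spyder's or Dask's internals):

```python
import time

class LazyFrame:
    """Toy stand-in for a lazy dataframe: nothing is computed until asked."""
    def __init__(self, graph_cost_seconds):
        self.graph_cost_seconds = graph_cost_seconds

    def __len__(self):
        # Simulates executing the whole task graph just to count rows.
        time.sleep(self.graph_cost_seconds)
        return 1_000_000

lf = LazyFrame(0.2)

# A Variable-Explorer-style summary needs len() -> it pays the full cost
# on every console evaluation, even though the user never asked for it:
start = time.perf_counter()
n = len(lf)
elapsed = time.perf_counter() - start
print(n, elapsed)
```

With a ~30 GB parquet behind the dataframe, that per-evaluation cost is what turns a simple `+` or `-` in the console into a multi-second wait.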
> I think that has to do with the Variable Explorer. Each time code is evaluated in the console, we need to generate the representations of the variables on it to display them in the Variable Explorer, and that can take a lot of time for such a big dataframe.

This means I could use Dask to handle big dataframes, but Spyder doesn't support this (yet)?
> Oh wow! Then that's almost surely the problem. As I said, if you can provide us with such a file, we'll try to fix the problem.

I have tried with a smaller parquet and it works without any delay.

Does it mean Dask + a big parquet is not compatible with Spyder?
> I have tried with a smaller parquet and it works without any delay.

Ok, thanks for the confirmation.
> Does it mean Dask + a big parquet is not compatible with Spyder?

I'd say it is compatible, it's just annoying to wait for the console prompt to come back after each evaluation.

But seriously, I think to fix this we should give up computing the representations I mentioned after a timeout, and only show those results we could compute before it. We'll try to do that in the coming months.
Note: here is code to generate a large dataframe with Dask: https://coiled.io/blog/introducing-the-dask-active-memory-manager/
Hi there, I have installed a new Anaconda 2021.11 with a new Spyder 5.1.5 on my Windows 10 machine, because the latest Spyder 4.2.5 broke down.
When I open the new Spyder 5.1.5 it works well at first, but then it usually slows down, doesn't refresh the variable sizes in the Variable Explorer, and a simple +, -, / takes 10-30 seconds..... Do you have an idea how to solve this problem?
I have uninstalled the previous Anaconda, deleted all Python-related folders, etc. My machine has 64 GB RAM, a Core i9, a 1 TB SSD, and 2x Nvidia 3080 11 GB.
Spyder 5.1.5 | Python 3.9.7 64-bit | Qt 5.9.7 | PyQt5 5.9.2 | Windows 10
Here is the video about the issue: https://youtu.be/mLzyZIW19GQ, and the same code in Jupyter: https://youtu.be/UBSXuL4VihM

Dependencies
```
Mandatory:
atomicwrites >=1.2.0          : 1.4.0 (OK)
chardet >=2.0.0               : 4.0.0 (OK)
cloudpickle >=0.5.0           : 2.0.0 (OK)
cookiecutter >=1.6.0          : 1.7.2 (OK)
diff_match_patch >=20181111   : 20200713 (OK)
intervaltree >=3.0.2          : 3.1.0 (OK)
IPython >=7.6.0               : 7.29.0 (OK)
jedi >=0.17.2;<0.19.0         : 0.18.0 (OK)
jsonschema >=3.2.0            : 3.2.0 (OK)
keyring >=17.0.0              : 23.1.0 (OK)
nbconvert >=4.0               : 6.1.0 (OK)
numpydoc >=0.6.0              : 1.1.0 (OK)
paramiko >=2.4.0              : 2.7.2 (OK)
parso >=0.7.0;<0.9.0          : 0.8.2 (OK)
pexpect >=4.4.0               : 4.8.0 (OK)
pickleshare >=0.4             : 0.7.5 (OK)
psutil >=5.3                  : 5.8.0 (OK)
pygments >=2.0                : 2.10.0 (OK)
pylint >=2.5.0;<2.10.0        : 2.9.6 (OK)
pyls_spyder >=0.4.0           : 0.4.0 (OK)
pylsp >=1.2.2;<1.3.0          : 1.2.4 (OK)
pylsp_black >=1.0.0           : None (OK)
qdarkstyle =3.0.2             : 3.0.2 (OK)
qstylizer >=0.1.10            : 0.1.10 (OK)
qtawesome >=1.0.2             : 1.0.2 (OK)
qtconsole >=5.1.0             : 5.1.1 (OK)
qtpy >=1.5.0                  : 1.10.0 (OK)
rtree >=0.9.7                 : 0.9.7 (OK)
setuptools >=49.6.0           : 58.0.4 (OK)
sphinx >=0.6.6                : 4.2.0 (OK)
spyder_kernels >=2.1.1;<2.2.0 : 2.1.3 (OK)
textdistance >=4.2.0          : 4.2.1 (OK)
three_merge >=0.1.1           : 0.1.1 (OK)
watchdog >=0.10.3             : 2.1.3 (OK)
zmq >=17                      : 22.2.1 (OK)

Optional:
cython >=0.21                 : 0.29.24 (OK)
matplotlib >=2.0.0            : 3.4.3 (OK)
numpy >=1.7                   : 1.20.3 (OK)
pandas >=1.1.1                : 1.3.4 (OK)
scipy >=0.17.0                : 1.7.1 (OK)
sympy >=0.7.3                 : 1.9 (OK)
```