spyder-ide / spyder

Official repository for Spyder - The Scientific Python Development Environment
https://www.spyder-ide.org
MIT License

Already executed Dask tasks get re-executed in Spyder #18434

Closed. bsesar closed this issue 2 years ago.

bsesar commented 2 years ago


Problem Description

For some reason, Spyder triggers re-execution of Dask tasks that have already finished. This does not happen when the same code is run in Python or IPython. Restarting the Dask cluster does not remove these tasks from memory, and they keep re-executing; the only way to get rid of them is to restart Spyder.

What steps reproduce the problem?

```python
import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client
import numpy as np

# start a local Dask cluster
client = Client()

# execute these blocks associated with df and df2 Dask DataFrames
df = dd.from_pandas(pd.DataFrame({'a':np.arange(10000000), 'b':np.arange(10000000)}), npartitions=100)
df = df.set_index('a')
df.to_parquet('test')

df2 = dd.from_pandas(pd.DataFrame({'a':np.arange(100000000), 'b':np.arange(100000000)}), npartitions=200)
df2 = df2.set_index('a')
df2.to_parquet('test2')

# open the Dask dashboard by using this URL in a browser: http://localhost:8787/status
# observe the dashboard as you execute the line below
# (you may need to have the browser and Spyder side by side to see the tasks appear in the dashboard)
df2.head()

# the above command should not trigger execution of tasks related to the df Dask DataFrame, but it does,
# as evident by the appearance (i.e., execution) of 100 from_pandas and len_chunk tasks associated with the df Dask DataFrame
```

What is the expected output? What do you see instead?

I expect only the Dask tasks needed for the executed line (here, df2.head()) to run. When the same code is executed in Python or IPython, tasks associated with the df Dask DataFrame are not executed.
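
To make the expected behavior concrete, here is a toy model of lazy evaluation in plain Python. `LazyFrame` and its methods are hypothetical stand-ins, not Dask's API, and the task names only mimic the ones seen in the dashboard. The model assumes, as with Dask, that `head()` needs only the first partition while `len()` has to touch every partition. Under that assumption, computing `df2.head()` runs no `df` tasks; only something that inspects `df` itself (such as `len(df)`) would.

```python
# Toy model of lazy evaluation. LazyFrame is hypothetical, not Dask's API.

class LazyFrame:
    """A hypothetical lazy table: no work happens until a result is requested."""

    log = []  # shared record of every "task" that has executed

    def __init__(self, name, nrows, npartitions):
        self.name = name
        self.nrows = nrows
        self.npartitions = npartitions

    def _run_partition(self, i):
        # Stand-in for a from_pandas task that materializes partition i
        LazyFrame.log.append(f"{self.name}:from_pandas-{i}")
        size = self.nrows // self.npartitions
        return list(range(i * size, (i + 1) * size))

    def head(self, n=5):
        # head() only needs the first partition
        return self._run_partition(0)[:n]

    def __len__(self):
        # Counting rows touches every partition (mimicking len_chunk tasks)
        total = 0
        for i in range(self.npartitions):
            part = self._run_partition(i)
            LazyFrame.log.append(f"{self.name}:len_chunk-{i}")
            total += len(part)
        return total


df = LazyFrame("df", nrows=1000, npartitions=10)
df2 = LazyFrame("df2", nrows=2000, npartitions=20)

df2.head()
print([t for t in LazyFrame.log if t.startswith("df:")])  # -> []

len(df)
print(sum(t.startswith("df:") for t in LazyFrame.log))  # -> 20
```

In real Dask, calling `len()` on a DataFrame does trigger a computation over all partitions, so the `from_pandas` and `len_chunk` tasks reported above are plausibly the result of something calling `len(df)` behind the scenes.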

Versions

Dependencies

dask=2022.6.0

dalthviz commented 2 years ago

Hi @bsesar, thank you for the feedback! I'm not totally sure, but this could be caused by the retrieval of the kernel namespace after each execution, which is done to show the variables in the Variable Explorer. Pinging @ccordoba12 and @impact27 (maybe they have some ideas about what could be happening here).
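
A minimal sketch of the mechanism suspected above, assuming a console that refreshes a variable view after every execution by measuring each user variable. All names here (`LazyThing`, `describe_variables`, `execute`) are hypothetical and are not Spyder's code. Because the refresh inspects every variable, a lazy object gets evaluated again on each run, even when the executed code never references it.

```python
# Hypothetical sketch of a variable view refreshed after each execution.
# None of these names come from Spyder.

class LazyThing:
    """Stand-in for a lazy collection whose __len__ is expensive."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __len__(self):
        self.log.append(f"computed {self.name}")  # the expensive evaluation
        return 0


def describe_variables(namespace):
    """Collect display properties for every user variable (hypothetical)."""
    props = {}
    for name, value in namespace.items():
        if name.startswith("_"):
            continue  # skip private names such as __builtins__
        try:
            size = len(value)  # measuring size forces evaluation of lazy objects
        except TypeError:
            size = None  # objects without a length, e.g. ints
        props[name] = {"type": type(value).__name__, "size": size}
    return props


def execute(code, namespace):
    """Run user code, then refresh the variable view, as a console might."""
    exec(code, namespace)
    return describe_variables(namespace)


log = []
ns = {"df": LazyThing("df", log), "df2": LazyThing("df2", log)}
execute("x = 1", ns)  # code that never touches df or df2
execute("y = 2", ns)
print(log)  # -> ['computed df', 'computed df2', 'computed df', 'computed df2']
```

Under this assumption, restarting the Dask cluster would not help, because `df` is still in the namespace and is re-inspected on the next refresh; that would be consistent with the behavior described in the report.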

bsesar commented 2 years ago

Hi @dalthviz. Is it possible to somehow turn off Variable Explorer? I find the above behavior quite annoying.

Could the above issue be related to the fix for #16844? Back in 2020 I reported an issue (#14265) related to Dask and the Variable Explorer, and that issue was fixed in 5.3.0.

dalthviz commented 2 years ago

Thanks for the info and references @bsesar ! Those were the issues I had in mind when thinking about possible causes for this.

To turn off the Variable Explorer, go to Preferences > Plugins and uncheck Variable explorer.

After applying the settings, you will need to restart Spyder.

Let us know if that helps!

bsesar commented 2 years ago

Hi @dalthviz. After I turned off Variable Explorer, the unwanted triggering of Dask tasks stopped. Thanks! :-)

dalthviz commented 2 years ago

Glad the workaround worked for you @bsesar !

dalthviz commented 2 years ago

Note: The kernel call that triggers the Dask tasks is get_var_properties, which is invoked by refresh_namespacebrowser; refresh_namespacebrowser in turn runs after every console execution.
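
One way a property getter like get_var_properties could avoid this is to detect lazy collections and skip any size computation for them. The sketch below is hypothetical and not taken from Spyder. It duck-types on `__dask_graph__`, a method of the Dask collection protocol, so it runs without Dask installed; with Dask available, `dask.is_dask_collection(obj)` is the library's own check.

```python
# Hypothetical sketch: report sizes without forcing evaluation of lazy objects.

def is_lazy_collection(obj):
    """Return True if obj looks like a Dask collection (duck-typed check)."""
    return callable(getattr(obj, "__dask_graph__", None))


def safe_size(obj):
    """Size for display, or None if unknown or if measuring it would compute."""
    if is_lazy_collection(obj):
        return None  # report "unknown" instead of triggering a computation
    try:
        return len(obj)
    except TypeError:
        return None  # objects without a length


class FakeDaskFrame:
    """Minimal stand-in implementing part of the Dask collection protocol."""

    def __init__(self):
        self.computed = False

    def __dask_graph__(self):
        return {}

    def __len__(self):
        self.computed = True  # would be the expensive computation
        return 10


frame = FakeDaskFrame()
print(safe_size(frame), frame.computed)  # -> None False
print(safe_size([1, 2, 3]))  # -> 3
```

The trade-off is that such variables would show no size in the Variable Explorer, in exchange for never triggering a computation on refresh.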