TypeError: cannot pickle '_thread.RLock' object #25

Open jdwillard19 opened 2 years ago

jdwillard19 commented 2 years ago

The following code does not run for me and produces the stack trace shown after it. It appears I cannot even create a Dask storage object on a LocalCUDACluster.

from dask.distributed import Client
import dask.distributed
import dask_optuna
import joblib
import optuna
from dask_cuda import LocalCUDACluster

def lstm_HPO_dask(timeout=100, hpo_type=None, n_jobs=5):
    # lstm_cv_obj is the Optuna objective function; it is not shown here.
    if __name__ == '__main__':
        cluster = LocalCUDACluster()
        with Client(cluster) as client:

            # The TypeError is raised on this line.
            storage = dask_optuna.DaskStorage(cluster)

            study = optuna.create_study(storage=storage, direction='minimize')

            # Optimize in parallel on your Dask cluster
            with joblib.parallel_backend("dask"):
                study.optimize(lstm_cv_obj, n_trials=100, n_jobs=n_jobs)

            print(f"best_params = {study.best_params}")
    return None



Stack trace below

-> storage = dask_optuna.DaskStorage(cluster)
(Pdb) c
Traceback (most recent call last):
  File "/global/homes/j/jwillard/miniconda3/envs/stml/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 49, in dumps
    result = pickle.dumps(x, **dump_kwargs)
TypeError: cannot pickle '_thread.RLock' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/global/u2/j/jwillard/stream-temperature-ml/workflows/hpc/to_submit/lstm_hpo_regional_dask.py", line 162, in <module>
    lstm_HPO_dask(timeout=N_SECONDS,n_jobs=n_jobs)
  File "/global/u2/j/jwillard/stream-temperature-ml/workflows/hpc/to_submit/lstm_hpo_regional_dask.py", line 147, in lstm_HPO_dask
    storage = dask_optuna.DaskStorage(cluster)
  File "/global/homes/j/jwillard/miniconda3/envs/stml/lib/python3.9/site-packages/dask_optuna/storage.py", line 325, in __init__
    self.client.run_on_scheduler(
  File "/global/homes/j/jwillard/miniconda3/envs/stml/lib/python3.9/site-packages/distributed/client.py", line 2406, in run_on_scheduler
    return self.sync(self._run_on_scheduler, function, *args, **kwargs)
  File "/global/homes/j/jwillard/miniconda3/envs/stml/lib/python3.9/site-packages/distributed/client.py", line 860, in sync
    return sync(
  File "/global/homes/j/jwillard/miniconda3/envs/stml/lib/python3.9/site-packages/distributed/utils.py", line 326, in sync
    raise exc.with_traceback(tb)
  File "/global/homes/j/jwillard/miniconda3/envs/stml/lib/python3.9/site-packages/distributed/utils.py", line 309, in f
    result[0] = yield future
  File "/global/homes/j/jwillard/miniconda3/envs/stml/lib/python3.9/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/global/homes/j/jwillard/miniconda3/envs/stml/lib/python3.9/site-packages/distributed/client.py", line 2368, in _run_on_scheduler
    kwargs=dumps(kwargs, protocol=4),
  File "/global/homes/j/jwillard/miniconda3/envs/stml/lib/python3.9/site-packages/distributed/protocol/pickle.py", line 60, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/global/homes/j/jwillard/miniconda3/envs/stml/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps

jakirkham commented 2 years ago

Does the same thing happen when using LocalCluster from distributed?

jdwillard19 commented 2 years ago

Yes, I get the exact same error using dask.distributed.LocalCluster().
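
For reference, the error message itself can be reproduced without Dask or Optuna. The traceback shows the failure happening while the keyword arguments of run_on_scheduler are being pickled, so any argument that holds a thread lock would trigger it. The sketch below is only an illustration of that mechanism: HoldsLock and try_pickle are hypothetical names, and HoldsLock merely stands in for an object that owns a lock, which is an assumption about what is being pickled in the report above.

```python
import pickle
import threading


class HoldsLock:
    """Hypothetical stand-in for any object that owns a thread lock."""

    def __init__(self):
        self._lock = threading.RLock()


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the TypeError message."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # A plain dictionary pickles without trouble.
    print(try_pickle({"direction": "minimize"}))
    # An object holding an RLock cannot be pickled.
    print(try_pickle(HoldsLock()))
```

If the first positional parameter of dask_optuna.DaskStorage is the underlying storage rather than a cluster (an assumption worth checking against the installed version), then passing the cluster there would send it through this pickling step. In that case, constructing the storage with no arguments inside the Client context may be worth trying.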