Open raybellwaves opened 6 years ago
@TomAugspurger, does this mean anything to you, perhaps a joblib/sklearn release schedule thing?
Did you import dask_ml.joblib, or import distributed.joblib, first?
The imports (in order) throughout the notebook are:
from dask_kubernetes import KubeCluster
from dask.distributed import Client, progress
import dask_ml.joblib  # register the distributed backend
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
import pandas as pd
from sklearn.externals import joblib
Thanks, that would have raised a different error anyway.
Will take a look later.
Seeing same issue in example notebooks
Hopefully fixed by https://github.com/pangeo-data/helm-chart/pull/51
You could maybe work around it by adding dask-ml to the worker-template.yaml, something like
env:
  - name: EXTRA_CONDA_PACKAGES
    value: dask-ml
for now, but that isn't a long-term solution.
This machine learning notebook is working fine on our http://pangeo.esipfed.org instance using this Dockerfile based solely on conda-forge.
Strange, as the worker dockerfile doesn't include dask-ml: https://github.com/rsignell-usgs/helm-chart/blob/94ca64191b9e4ab12ba455852c2ed85a915cd51b/docker-images/worker/Dockerfile
My diagnosis may be incorrect then.
On Mon, Jul 23, 2018 at 4:05 PM Rich Signell notifications@github.com wrote:
This machine learning notebook is working fine on our http://pangeo.esipfed.org instance using this Dockerfile based solely on conda-forge https://github.com/rsignell-usgs/helm-chart/blob/conda-forge/docker-images/notebook/Dockerfile .
Ah, of course my diagnosis is incorrect, since the example doesn't actually require dask-ml, just scikit-learn and distributed.
I'll do some further debugging...
@TomAugspurger, we actually are using the notebook image for the workers too, so that old worker Dockerfile is misleading. The notebook environment contains dask-ml, which is required by the example notebook.
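To confirm that the workers really do see the same packages as the notebook, one option is to collect a package-to-version mapping from each process and diff them. A minimal sketch of the comparison step only; `find_mismatches` is a hypothetical helper, and the host names and version numbers in the example call are made up for illustration:

```python
def find_mismatches(versions_by_host):
    """Return {package: {host: version}} for packages whose versions differ.

    versions_by_host maps a host label to a {package: version} dict.
    A package missing on a host is reported with version None.
    """
    # Collect every package name seen on any host.
    packages = set()
    for versions in versions_by_host.values():
        packages.update(versions)

    mismatches = {}
    for package in sorted(packages):
        seen = {
            host: versions.get(package)
            for host, versions in versions_by_host.items()
        }
        # More than one distinct value means the hosts disagree.
        if len(set(seen.values())) > 1:
            mismatches[package] = seen
    return mismatches


# Illustrative input only: these are not versions taken from the cluster.
example = {
    "notebook": {"joblib": "0.12.0", "dask-ml": "0.6.0"},
    "worker-1": {"joblib": "0.12.0"},
}
print(find_mismatches(example))
```

The per-host dictionaries could be gathered with `Client.run`, which executes a function on every worker and returns the results keyed by worker address; an empty result from `find_mismatches` would suggest the environments match for the packages checked.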
Tried to run the cell and got the output (it's long...). Possibly the
RuntimeError: Joblib backend requires either joblib >= '0.10.2' or sklearn > '0.17.1'. Please install or upgrade
is the main issue?
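One way to narrow this down is to check which joblib and scikit-learn versions are actually installed in the environment that raises the error. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper names `version_tuple`, `installed_version`, and `backend_requirement_met` are hypothetical, and the thresholds are copied from the error message above rather than from the library source:

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '0.19.1' or '0.12.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def backend_requirement_met(joblib_version, sklearn_version):
    """Apply the rule quoted in the error: joblib >= 0.10.2 or sklearn > 0.17.1."""
    joblib_ok = (
        joblib_version is not None
        and version_tuple(joblib_version) >= (0, 10, 2)
    )
    sklearn_ok = (
        sklearn_version is not None
        and version_tuple(sklearn_version) > (0, 17, 1)
    )
    return joblib_ok or sklearn_ok


if __name__ == "__main__":
    joblib_v = installed_version("joblib")
    sklearn_v = installed_version("scikit-learn")
    print("joblib:", joblib_v, "scikit-learn:", sklearn_v)
    print("requirement met:", backend_requirement_met(joblib_v, sklearn_v))
```

If this reports that the requirement is met in the notebook, the next thing to check would be whether the same holds on the workers, since the error may come from a different environment than the one the notebook kernel runs in.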