pangeo-data / pangeo-example-notebooks

Pangeo Example Notebooks

machine-learning.ipynb on http://pangeo.pydata.org RuntimeError #1

Open raybellwaves opened 6 years ago

raybellwaves commented 6 years ago

Tried to run the cell

from sklearn.externals import joblib

with joblib.parallel_backend('dask', scatter=[X, y]):
    grid_search.fit(X, y)

and got the output below (it's long...). Possibly the RuntimeError: Joblib backend requires either `joblib` >= '0.10.2' or `sklearn` > '0.17.1'. Please install or upgrade is the main issue?

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-085d2322fa37> in <module>()
      2 
      3 with joblib.parallel_backend('dask', scatter=[X, y]):
----> 4     grid_search.fit(X, y)

/opt/conda/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    637                                   error_score=self.error_score)
    638           for parameters, (train, test) in product(candidate_params,
--> 639                                                    cv.split(X, y, groups)))
    640 
    641         # if one choose to see train score, "out" will contain train score info

/opt/conda/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    787                 # consumption.
    788                 self._iterating = False
--> 789             self.retrieve()
    790             # Make sure that we get a last message telling us we are done
    791             elapsed_time = time.time() - self._start_time

/opt/conda/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in retrieve(self)
    699                     self._output.extend(job.get(timeout=self.timeout))
    700                 else:
--> 701                     self._output.extend(job.get())
    702 
    703             except BaseException as exception:

/opt/conda/lib/python3.6/site-packages/distributed/joblib.py in get()
    249 
    250         def get():
--> 251             return ref().result()
    252 
    253         future.get = get # monkey patch to achieve AsyncResult API

/opt/conda/lib/python3.6/site-packages/distributed/client.py in result(self, timeout)
    190                                   raiseit=False)
    191         if self.status == 'error':
--> 192             six.reraise(*result)
    193         elif self.status == 'cancelled':
    194             raise result

/opt/conda/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    690                 value = tp()
    691             if value.__traceback__ is not tb:
--> 692                 raise value.with_traceback(tb)
    693             raise value
    694         finally:

/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py in loads()
     57 def loads(x):
     58     try:
---> 59         return pickle.loads(x)
     60     except Exception:
     61         logger.info("Failed to deserialize %s", x[:10000], exc_info=True)

/opt/conda/lib/python3.6/site-packages/distributed/joblib.py in <module>()
     38     _bases.append(ParallelBackendBase)
     39 if not _bases:
---> 40     raise RuntimeError("Joblib backend requires either `joblib` >= '0.10.2' "
     41                        " or `sklearn` > '0.17.1'. Please install or upgrade")
     42 

RuntimeError: Joblib backend requires either `joblib` >= '0.10.2'  or `sklearn` > '0.17.1'. Please install or upgrade
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6d45c7b8>, <Future finished exception=CancelledError(['_fit_and_score-batch-c8bc3da59762435bb023dded3c77fb1c'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-c8bc3da59762435bb023dded3c77fb1c']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6d459f28>, <Future finished exception=CancelledError(['_fit_and_score-batch-c4ce3d7618034bec8f259a15b9b99b3f'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-c4ce3d7618034bec8f259a15b9b99b3f']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6e527620>, <Future finished exception=CancelledError(['_fit_and_score-batch-4ca1e7b762c44a0d930e15f6c6a981f9'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-4ca1e7b762c44a0d930e15f6c6a981f9']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6edb52f0>, <Future finished exception=CancelledError(['_fit_and_score-batch-29b5dd78588d448a8eb6e33d0d7400ca'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-29b5dd78588d448a8eb6e33d0d7400ca']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6fddf950>, <Future finished exception=CancelledError(['_fit_and_score-batch-c0c51b4512904a449c9cd169b95b749e'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-c0c51b4512904a449c9cd169b95b749e']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6edb11e0>, <Future finished exception=CancelledError(['_fit_and_score-batch-50ac41eee8364dcbb7b42e46ef9b0912'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-50ac41eee8364dcbb7b42e46ef9b0912']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6ed93378>, <Future finished exception=CancelledError(['_fit_and_score-batch-c20e4a9fc8654ae290286dbe6fab8c14'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-c20e4a9fc8654ae290286dbe6fab8c14']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6d45e048>, <Future finished exception=CancelledError(['_fit_and_score-batch-eea80eb9ac67456abbc3f6ab66742105'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-eea80eb9ac67456abbc3f6ab66742105']
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f4f6e527e18>, <Future finished exception=CancelledError(['_fit_and_score-batch-f9de1c20b4034245968ae293f0296956'],)>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tornado/ioloop.py", line 780, in _discard_future_result
    future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1107, in run
    yielded = self.gen.throw(*exc_info)
  File "/opt/conda/lib/python3.6/site-packages/distributed/joblib.py", line 241, in callback_wrapper
    result = yield _wait([future])
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1099, in run
    value = future.result()
  File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 1113, in run
    yielded = self.gen.send(value)
  File "/opt/conda/lib/python3.6/site-packages/distributed/client.py", line 3346, in _wait
    raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['_fit_and_score-batch-f9de1c20b4034245968ae293f0296956']
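
In the traceback above, the RuntimeError is re-raised from a future's result and originates inside `pickle.loads` while `distributed/joblib.py` is being imported, which suggests (but does not prove) that the failing import happened in a remote process rather than in the notebook. A minimal sketch for checking what each process can actually import is below. The helper name `package_versions` is made up for illustration, and the commented-out lines assume an already connected `dask.distributed.Client` called `client`.

```python
import importlib


def package_versions(names):
    """Return {name: version string, or None if importing the module fails}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Covers ImportError as well as errors raised at import time,
            # such as the RuntimeError shown in the traceback.
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


# What the local (notebook) process sees.
print(package_versions(["sklearn", "joblib", "distributed"]))

# What each worker sees; `client` is assumed to exist and is not created here.
# Client.run calls the function on every worker and returns {worker_address: result}.
# remote = client.run(package_versions, ["sklearn", "joblib", "distributed"])
# for worker, found in remote.items():
#     print(worker, found)
```

A `None` entry on a worker but not in the notebook would point at a difference between the two environments.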

martindurant commented 6 years ago

@TomAugspurger, does this mean anything to you, perhaps a joblib/sklearn release schedule thing?

TomAugspurger commented 6 years ago

Did you import dask_ml.joblib, or import distributed.joblib first?

raybellwaves commented 6 years ago

The imports (in order) throughout the notebook are:

from dask_kubernetes import KubeCluster
from dask.distributed import Client, progress
import dask_ml.joblib  # register the distributed backend
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
import pandas as pd
from sklearn.externals import joblib

TomAugspurger commented 6 years ago

Thanks, that would have raised a different error anyway.

Will take a look later.

dazzag24 commented 6 years ago

Seeing same issue in example notebooks

TomAugspurger commented 6 years ago

Hopefully fixed by https://github.com/pangeo-data/helm-chart/pull/51

You could maybe work around it by adding dask-ml to the worker-template.yaml, something like

    env:
      - name: EXTRA_CONDA_PACKAGES
        value: dask-ml

for now, but that isn't a long-term solution.

rsignell-usgs commented 6 years ago

This machine learning notebook is working fine on our http://pangeo.esipfed.org instance using this Dockerfile based solely on conda-forge.

TomAugspurger commented 6 years ago

Strange, as the worker dockerfile doesn't include dask-ml: https://github.com/rsignell-usgs/helm-chart/blob/94ca64191b9e4ab12ba455852c2ed85a915cd51b/docker-images/worker/Dockerfile

My diagnosis may be incorrect then.

On Mon, Jul 23, 2018 at 4:05 PM Rich Signell wrote:

This machine learning notebook is working fine on our http://pangeo.esipfed.org instance using this Dockerfile based solely on conda-forge https://github.com/rsignell-usgs/helm-chart/blob/conda-forge/docker-images/notebook/Dockerfile .


TomAugspurger commented 6 years ago

Ah, of course my diagnosis is incorrect, since the example doesn't actually require dask-ml, just scikit-learn and distributed.

I'll do some further debugging...

rsignell-usgs commented 6 years ago

@TomAugspurger, we actually are using the notebook image for the workers too, so that old worker Dockerfile is misleading. The notebook environment contains dask-ml, which is required by the example notebook.
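
Since the discussion turned on which image the workers actually run, a small comparison helper can make such differences visible. This is only a sketch: `find_mismatches` is a hypothetical name, and the package versions and worker addresses below are made-up sample data, not values taken from either deployment. In practice the inputs would come from the notebook and the workers themselves, and `Client.get_versions(check=True)` in `distributed` performs a comparable built-in check.

```python
def find_mismatches(reference, workers):
    """Return {worker: {package: (reference_version, worker_version)}} for differing packages."""
    mismatches = {}
    for worker, found in workers.items():
        diff = {
            pkg: (expected, found.get(pkg))
            for pkg, expected in reference.items()
            if found.get(pkg) != expected
        }
        if diff:
            mismatches[worker] = diff
    return mismatches


# Made-up sample data: the second worker lacks dask-ml entirely.
notebook = {"dask-ml": "0.6.0", "scikit-learn": "0.19.1"}
workers = {
    "tcp://10.0.0.1:4000": {"dask-ml": "0.6.0", "scikit-learn": "0.19.1"},
    "tcp://10.0.0.2:4000": {"scikit-learn": "0.19.1"},
}

print(find_mismatches(notebook, workers))
# {'tcp://10.0.0.2:4000': {'dask-ml': ('0.6.0', None)}}
```

An empty result means every worker matches the notebook for the listed packages.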