stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

Error Pickling #152

Closed by ryan-wolbeck 4 years ago

ryan-wolbeck commented 4 years ago

Hey @alejandroschuler, I'm getting a weird issue when trying to pickle within a docker container.

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/src/app/ngboost_tuner/__main__.py", line 16, in <module>
    main()
  File "/usr/src/app/ngboost_tuner/__main__.py", line 10, in main
    args.func(args)
  File "/usr/src/app/ngboost_tuner/tune.py", line 133, in run
    pickle.dump(ngb, open(f"{path}ngbtest.p", "wb"))
AttributeError: Can't pickle local object 'manifold.<locals>.Manifold'

My code to get this is:

    ngb = NGBRegressor(**best_params).fit(
        x.values,
        y.values,
        X_val=x_valid.values,
        Y_val=y_valid.values,
        early_stopping_rounds=2,
    )
    log.info("Finished training the final model, running diagnostics")

    Y_pred = ngb.predict(x_test)
    Mae = median_absolute_error(y_test, Y_pred)
    log.info(f"Median Absolute Error = {Mae}")

    mea = mean_absolute_error(y_test, Y_pred)
    log.info(f"Mean Absolute Error = {mea}")

    log.info("Saving the model file")

    path = os.path.expanduser("/usr/src/app/models/")

    if not os.path.exists(path):
        os.mkdir(path)

    pickle.dump(ngb, open(f"{path}ngbtest.p", "wb"))

    log.info(f"Model saved to: {path}ngbtest.p")

Could very well be something I'm doing but I'm a bit lost since it looks like the pattern in the jupyter docs.

Thanks, Ryan
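
Separately from the pickling error itself, the save step in the snippet above can be made a little more robust. The following is a minimal sketch, not code from the thread: `save_model` is a hypothetical helper name, and the default filename simply mirrors the one used above. It creates the target directory in one call and closes the file handle deterministically.

```python
import os
import pickle


def save_model(model, directory, filename="ngbtest.p"):
    """Pickle `model` into `directory`, creating the directory if needed."""
    # makedirs also creates missing parent directories (os.mkdir does not),
    # and exist_ok=True makes the call a no-op if the directory exists.
    os.makedirs(directory, exist_ok=True)
    file_path = os.path.join(directory, filename)
    # The context manager closes the file even if pickle.dump raises.
    with open(file_path, "wb") as f:
        pickle.dump(model, f)
    return file_path
```

Calling `save_model(ngb, "/usr/src/app/models/")` would return the full path, which can then be logged. This does not change whether the object is picklable; it only tidies the file handling.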

ryan-wolbeck commented 4 years ago

Also made an attempt at using joblib instead.

from joblib import dump
dump(ngb, f"{path}ngbtest.p")

Result

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/src/app/ngboost_tuner/__main__.py", line 16, in <module>
    main()
  File "/usr/src/app/ngboost_tuner/__main__.py", line 10, in main
    args.func(args)
  File "/usr/src/app/ngboost_tuner/tune.py", line 143, in run
    dump(ngb, f"{path}ngbtest.p")
  File "/usr/local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 480, in dump
    NumpyPickler(f, protocol=protocol).dump(value)
  File "/usr/local/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/usr/local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 282, in save
    return Pickler.save(self, obj)
  File "/usr/local/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/local/lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "/usr/local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 282, in save
    return Pickler.save(self, obj)
  File "/usr/local/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/local/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/usr/local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 282, in save
    return Pickler.save(self, obj)
  File "/usr/local/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.7/pickle.py", line 1016, in save_type
    return self.save_global(obj)
  File "/usr/local/lib/python3.7/pickle.py", line 960, in save_global
    (obj, module_name, name)) from None
_pickle.PicklingError: Can't pickle <class 'ngboost.manifold.manifold.<locals>.Manifold'>: it's not found as ngboost.manifold.manifold.<locals>.Manifold

alejandroschuler commented 4 years ago

Interesting... are you using the most recent development version from github?

ryan-wolbeck commented 4 years ago

@alejandroschuler I'm pip installing ngboost==0.2.0

ryan-wolbeck commented 4 years ago

@alejandroschuler changed my requirements.txt to

pandas==1.0.3
git+https://github.com/stanfordmlgroup/ngboost.git
hyperopt==0.2.4

and it seems to work, but not pinning to a specific version makes me a little nervous.

alejandroschuler commented 4 years ago

Yeah, the pickling issue was a bug that was only fixed in the most recent development version, which hasn't been published to PyPI yet.
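
For readers wondering why the error message mentions `<locals>`: pickle stores classes by reference (module plus qualified name), so a class defined inside a function body cannot be looked up again at load time. The sketch below reproduces that general behavior with made-up names (`make_local_class`, `TopLevel`, `can_pickle`); it is an illustration of the mechanism, not ngboost's actual code or the actual fix.

```python
import pickle


def make_local_class():
    """Hypothetical factory that defines a class inside a function body."""

    class Local:
        pass

    return Local


class TopLevel:
    """Module-level class: pickle can locate it by its qualified name."""

    def __init__(self, value):
        self.value = value


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if it raises."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


local_instance = make_local_class()()
top_level_instance = TopLevel(3)

print(can_pickle(local_instance))      # instance of a function-local class
print(can_pickle(top_level_instance))  # instance of a module-level class
```

The exception type for the local class differs across Python versions and pickler implementations (the two tracebacks above show both `AttributeError` and `PicklingError`), which is why the helper catches several.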

tbh I'm a statistician and not a software engineer so a lot of things about the ngboost codebase and development pipeline could use some help. Is there not a way to install a specific version from github? Or would we need to add version tags or something in git? If you're interested and knowledgeable and willing to work on this kind of stuff I'd be happy to add you to the core dev team.
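
On the question of installing a specific version from GitHub: pip's VCS support accepts a ref after an `@`, and that ref can be a commit hash, a branch, or a tag, so tags are convenient but not strictly required. A sketch of the requirements file with a pinned ref follows; `<commit-sha-or-tag>` is a placeholder to be filled in, not a real ref from the repository.

```text
pandas==1.0.3
git+https://github.com/stanfordmlgroup/ngboost.git@<commit-sha-or-tag>
hyperopt==0.2.4
```

Pinning to a full commit hash gives a reproducible install even when no release tag exists.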

ryan-wolbeck commented 4 years ago

Yeah, I think pushing a new version out is probably appropriate. The last release was 0.2.0 on Feb 9th, and I think most people wouldn't install a package like this by cloning from git. I'd be happy to help dig in on this further. I'm not a SE either (Data Scientist); however, I do about 90% engineering in my professional roles, so I think I could bring some value to the team.

tonyduan commented 4 years ago

Thanks for pointing this out. I've just pushed v0.2.1 to PyPI.

I agree that the current workflow isn't ideal (I need to manually bump the version number and upload to PyPI at the moment). Happy to take any suggestions etc.

I'd also appreciate it if you could double-check the pickling behavior against the latest version on PyPI.
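
One way to run such a check is a simple in-memory round trip. This is a minimal sketch with a hypothetical helper name (`pickle_roundtrip`) and a plain dict standing in for the fitted model, since the real `ngb` object depends on data not shown here.

```python
import io
import pickle


def pickle_roundtrip(obj):
    """Serialize obj to an in-memory buffer and load it back."""
    buffer = io.BytesIO()
    pickle.dump(obj, buffer)
    buffer.seek(0)
    return pickle.load(buffer)


# Stand-in object with made-up values; in the thread's setting this
# would be the fitted `ngb` model instead.
stand_in = {"n_estimators": 500, "learning_rate": 0.01}
restored = pickle_roundtrip(stand_in)
```

With a real model, a reasonable follow-up would be to compare `restored.predict(x_test)` against `ngb.predict(x_test)` to confirm the restored object behaves the same.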

ryan-wolbeck commented 4 years ago

@tonyduan I ran with 0.2.1 with no issues, thanks for the update!