h2oai / h2o4gpu

H2Oai GPU Edition
Apache License 2.0
456 stars 96 forks source link

Update xgboost (had to fixup conflicts) and lightgbm (no conflicts) #821

Closed pseudotensor closed 4 years ago

pseudotensor commented 4 years ago

No issues for lgbm, but xgboost has these files different between upstream/master and the h2oai branch:

git diff --name-only upstream/master

... (skip new Docker/Makefiles)

include/xgboost/c_api.h
include/xgboost/gbm.h
include/xgboost/learner.h
include/xgboost/model_visitor.h
python-package/setup.py
python-package/xgboost/callback.py
python-package/xgboost/core.py
python-package/xgboost/sklearn.py
python-package/xgboost/training.py
requirements_build.txt
requirements_runtime.txt
src/CMakeLists.txt
src/analysis/xgbfi.cc
src/analysis/xgbfi.h
src/c_api/c_api.cc
src/common/common.cu
src/common/common.h
src/gbm/gblinear.cc
src/gbm/gblinear_model.h
src/gbm/gbtree.h
src/gbm/gbtree_model.h
src/learner.cc
src/objective/regression_obj.cu
src/tree/tree_updater.cc
src/tree/updater_gpu_common.cuh
src/tree/updater_gpu_hist.cu
tests/ci_build/tidy.py
tests/cpp/objective/test_regression_obj.cc
tests/cpp/tree/test_gpu_hist.cu
tests/python-gpu/test_gpu_updaters1gpu.py
tests/python/test_early_stopping.py
tests/python/test_plotting.py
tests/python/test_with_sklearn.py
tests/python/test_xgbfi.py

Need to check if differences all make sense.

pseudotensor commented 4 years ago

@sh1ng please check out:

git clone https://github.com/h2oai/xgboost
cd xgboost
git checkout h2oai
git diff upstream/master                                                                                                                                                  

Maybe we can avoid differences in common.cu, common.h, gblinear.cc, updater_gpu_common*, etc.

Unsure about:

image

@nkalonia1 This required still?

I left merge conflict unresolved as committed since I don't know what to do yet.

sh1ng commented 4 years ago

I'm out of context for https://github.com/h2oai/xgboost/commit/07744d1ddd3a2cdca89252e7a8743a4aa6fb9008

but using xgboost's version reasonable for files

xgboost/src/common/common.cu xgboost/src/common/common.h xgboost/src/tree/updater_gpu_common.cuh seems like the changes were not merged
https://github.com/dmlc/xgboost/commit/310fe60b35a400164a0817442aac64506d83ee6c#diff-a6c65348790350dbbbc7882c0fc0fff8

+++ b/tests/ci_build/tidy.py
@@ -173,7 +173,7 @@ class ClangTidy(object):
             self.compile_commands = json.load(fd)
         tidy_file = os.path.join(self.root_path, '.clang-tidy')
         with open(tidy_file) as fd:
-            self.clang_tidy = yaml.safe_load(fd)
+            self.clang_tidy = yaml.load(fd)
             self.clang_tidy = str(self.clang_tidy)
         all_files = []
         for entry in self.compile_commands:

accept it also

sh1ng commented 4 years ago

Failing tests are related to changes in pickle format(in xgboost 1.0). We need to create and save model_saved-1.0.pkl

pseudotensor commented 4 years ago

I had already adjusted what to keep in common, but it may be ok to remove those lines. They were required in order to load an old pickle that had been saved on GPU but loaded on CPU-only system. Pickle load would fail early, and I don't know that they ever fixed it. Too risky to remove IMO. Will recover if manual testing shows issues.

pseudotensor commented 4 years ago

Need to add migration safety to xgboost, since they changed the object saved and check a header. Old pickles aren't the same as new, but may be possible to still load old ones if knew what the header should be. Also possible things are totally different, and cannot load.

thirdwing commented 4 years ago

Please make sure we upgrade to xgboost 1.0.0.

We need the json support.

sh1ng commented 4 years ago

I've created PoC of h2o4gpu with 2 versions of xgboost(to load pickle into an old one save_model and load_model in a new one).

2 - xgboost 1.0 but #1 - xgboost 0.9. and it's inside dmlc-code.

The failing peace of code https://github.com/dmlc/dmlc-core/blob/552f7de748fbff34f2708b03f930a47ded45d78e/src/io.cc#L137 uses singleton. Downgrading dmlc version(to have them match) fixes seg faults.

I can also ask possible solutions in the community, but before doing that I just want to know your opinions. Or should we go with first option?

@pseudotensor @thirdwing

thirdwing commented 4 years ago

I will suggest using 1.0.0, which has the official json support.

sh1ng commented 4 years ago

@thirdwing we will use 1.0 there's no doubts. I'm trying to tackle migration issue. We can't load pkled models from version 0.9.

sh1ng commented 4 years ago

superseded by https://github.com/h2oai/h2o4gpu/pull/822