pseudotensor commented 4 years ago

No issues for lgbm, but xgboost has these files different between upstream/master and the h2oai branch:

git diff --name-only upstream/master

... (skip new Docker/Makefiles)

include/xgboost/c_api.h
include/xgboost/gbm.h
include/xgboost/learner.h
include/xgboost/model_visitor.h
python-package/setup.py
python-package/xgboost/callback.py
python-package/xgboost/core.py
python-package/xgboost/sklearn.py
python-package/xgboost/training.py
requirements_build.txt
requirements_runtime.txt
src/CMakeLists.txt
src/analysis/xgbfi.cc
src/analysis/xgbfi.h
src/c_api/c_api.cc
src/common/common.cu
src/common/common.h
src/gbm/gblinear.cc
src/gbm/gblinear_model.h
src/gbm/gbtree.h
src/gbm/gbtree_model.h
src/learner.cc
src/objective/regression_obj.cu
src/tree/tree_updater.cc
src/tree/updater_gpu_common.cuh
src/tree/updater_gpu_hist.cu
tests/ci_build/tidy.py
tests/cpp/objective/test_regression_obj.cc
tests/cpp/tree/test_gpu_hist.cu
tests/python-gpu/test_gpu_updaters1gpu.py
tests/python/test_early_stopping.py
tests/python/test_plotting.py
tests/python/test_with_sklearn.py
tests/python/test_xgbfi.py

Need to check if differences all make sense.

pseudotensor commented 4 years ago

@sh1ng please check out:

git clone https://github.com/h2oai/xgboost
cd xgboost
git checkout h2oai
git diff upstream/master

Maybe we can avoid differences in common.cu, common.h, gblinear.cc, updater_gpu_common*, etc.

Unsure about:

@nkalonia1 This required still?

I left merge conflict unresolved as committed since I don't know what to do yet.

sh1ng commented 4 years ago

I'm out of context for https://github.com/h2oai/xgboost/commit/07744d1ddd3a2cdca89252e7a8743a4aa6fb9008

but using xgboost's version reasonable for files

xgboost/src/common/common.cu xgboost/src/common/common.h xgboost/src/tree/updater_gpu_common.cuh seems like the changes were not merged
https://github.com/dmlc/xgboost/commit/310fe60b35a400164a0817442aac64506d83ee6c#diff-a6c65348790350dbbbc7882c0fc0fff8

+++ b/tests/ci_build/tidy.py
@@ -173,7 +173,7 @@ class ClangTidy(object):
             self.compile_commands = json.load(fd)
         tidy_file = os.path.join(self.root_path, '.clang-tidy')
         with open(tidy_file) as fd:
-            self.clang_tidy = yaml.safe_load(fd)
+            self.clang_tidy = yaml.load(fd)
             self.clang_tidy = str(self.clang_tidy)
         all_files = []
         for entry in self.compile_commands:

accept it also

sh1ng commented 4 years ago

Failing tests are related to changes in pickle format(in xgboost 1.0). We need to create and save model_saved-1.0.pkl

pseudotensor commented 4 years ago

I had already adjusted what to keep in common, but it may be ok to remove those lines. They were required in order to load an old pickle that had been saved on GPU but loaded on CPU-only system. Pickle load would fail early, and I don't know that they ever fixed it. Too risky to remove IMO. Will recover if manual testing shows issues.

pseudotensor commented 4 years ago

Need to add migration safety to xgboost, since they changed the object saved and check a header. Old pickles aren't the same as new, but may be possible to still load old ones if knew what the header should be. Also possible things are totally different, and cannot load.

thirdwing commented 4 years ago

Please make sure we upgrade to xgboost 1.0.0.

We need the json support.

sh1ng commented 4 years ago

I've created PoC of h2o4gpu with 2 versions of xgboost(to load pickle into an old one save_model and load_model in a new one).

If two versions stays in separated processes it works(besides validation issue). Usage scenario can be: master process sees pickle from 0.9. Spawn a process with xgboost 0.9 that loads it and save the model. The parent process picks-up and loads created file.

If I load two versions of xgboost withing the same process it Segmentation faults on second call(load_model). With bt

#0  0x00007fffee7b1c40 in std::string::append(std::string const&) ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007fffb58dccba in dmlc::io::LocalFileSystem::Open(dmlc::io::URI const&, char const*, bool) ()
from /home/sh1ng/dev/.venv/lib/python3.6/site-packages/xgboost_prev/./lib/libxgboost.so
#2  0x00007fff724ac6e1 in dmlc::Stream::Create (uri=<optimized out>, 
flag=0x7fff72592c0a "r", try_create=<optimized out>)
at /home/sh1ng/dev/h2o4gpu/xgboost/dmlc-core/src/io.cc:137
#3  0x00007fff72181559 in XGBoosterLoadModel (handle=0x201cb60, 
fname=0x7fffac7847e8 "/home/sh1ng/dev/h2o4gpu/temp.model")
at /home/sh1ng/dev/h2o4gpu/xgboost/src/c_api/c_api.cc:589
#4  0x00007fffe1d76dae in ffi_call_unix64 ()
from /usr/lib/x86_64-linux-gnu/libffi.so.6
#5  0x00007fffe1d7671f in ffi_call ()
from /usr/lib/x86_64-linux-gnu/libffi.so.6
#6  0x00007fffe1f8a5c4 in _call_function_pointer (argcount=2, 
resmem=0x7fffffffc8e0, restype=<optimized out>, atypes=0x7fffffffc8a0, 
avalues=0x7fffffffc8c0, 
pProc=0x7fff72180920 <XGBoosterLoadModel(BoosterHandle, char const*)>, 
flags=4353) at ./Modules/_ctypes/callproc.c:831

2 - xgboost 1.0 but #1 - xgboost 0.9. and it's inside dmlc-code.

The failing peace of code https://github.com/dmlc/dmlc-core/blob/552f7de748fbff34f2708b03f930a47ded45d78e/src/io.cc#L137 uses singleton. Downgrading dmlc version(to have them match) fixes seg faults.

I can also ask possible solutions in the community, but before doing that I just want to know your opinions. Or should we go with first option?

@pseudotensor @thirdwing

thirdwing commented 4 years ago

I will suggest using 1.0.0, which has the official json support.

sh1ng commented 4 years ago

@thirdwing we will use 1.0 there's no doubts. I'm trying to tackle migration issue. We can't load pkled models from version 0.9.

sh1ng commented 4 years ago

superseded by https://github.com/h2oai/h2o4gpu/pull/822

h2oai / h2o4gpu

Update xgboost (had to fixup conflicts) and lightgbm (no conflicts) #821

2 - xgboost 1.0 but #1 - xgboost 0.9. and it's inside dmlc-code.