pseudotensor closed 4 years ago
@sh1ng please check out:
git clone https://github.com/h2oai/xgboost
cd xgboost
git checkout h2oai
git diff upstream/master
Maybe we can avoid differences in common.cu, common.h, gblinear.cc, updater_gpu_common*, etc.
Unsure about:
@nkalonia1 Is this still required?
I committed the merge conflict unresolved, since I don't know what to do with it yet.
I lack context for https://github.com/h2oai/xgboost/commit/07744d1ddd3a2cdca89252e7a8743a4aa6fb9008
but using xgboost's version seems reasonable for these files:
xgboost/src/common/common.cu
xgboost/src/common/common.h
xgboost/src/tree/updater_gpu_common.cuh (it seems the changes were not merged)
https://github.com/dmlc/xgboost/commit/310fe60b35a400164a0817442aac64506d83ee6c#diff-a6c65348790350dbbbc7882c0fc0fff8
+++ b/tests/ci_build/tidy.py
@@ -173,7 +173,7 @@ class ClangTidy(object):
self.compile_commands = json.load(fd)
tidy_file = os.path.join(self.root_path, '.clang-tidy')
with open(tidy_file) as fd:
- self.clang_tidy = yaml.safe_load(fd)
+ self.clang_tidy = yaml.load(fd)
self.clang_tidy = str(self.clang_tidy)
all_files = []
for entry in self.compile_commands:
Accept this one as well.
The failing tests are related to changes in the pickle format (in xgboost 1.0). We need to create and save model_saved-1.0.pkl.
I had already adjusted what to keep in common, but it may be OK to remove those lines. They were required in order to load an old pickle that had been saved on a GPU system but loaded on a CPU-only system. The pickle load would fail early, and I don't know whether they ever fixed it. Too risky to remove IMO. Will recover if manual testing shows issues.
We need to add migration safety to xgboost, since they changed the saved object and now check a header. Old pickles aren't the same as new ones, but it may still be possible to load old ones if we knew what the header should be. It's also possible that things are totally different and old pickles cannot be loaded at all.
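A minimal sketch of what such a safety net could look like, assuming the new and old loading paths are available as plain Python callables. The names `load_with_fallback` and `ModelLoadError` are hypothetical and not part of xgboost or h2o4gpu. It only helps when a loader fails with a Python exception; a crash inside the native library cannot be caught this way.

```python
import pickle


class ModelLoadError(Exception):
    """Raised when none of the supplied loaders can read the saved model."""


def load_with_fallback(data, loaders):
    """Try each (name, loader) pair in order and return (name, model).

    `loaders` is a list such as [("new", new_loader), ("old", old_loader)],
    where each loader takes the raw bytes and returns a model object.
    """
    errors = []
    for name, loader in loaders:
        try:
            return name, loader(data)
        except Exception as exc:  # loaders can fail in many ways on foreign data
            errors.append(f"{name}: {exc!r}")
    # Report every failure so the caller can see why each format was rejected.
    raise ModelLoadError("; ".join(errors))
```

The returned name tells the caller which format matched, so a model read through the legacy path can be re-saved in the new format right away.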
Please make sure we upgrade to xgboost 1.0.0.
We need the json support.
I've created a PoC of h2o4gpu with two versions of xgboost (load the pickle in the old one and call save_model, then call load_model in the new one).
#0 0x00007fffee7b1c40 in std::string::append(std::string const&) ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007fffb58dccba in dmlc::io::LocalFileSystem::Open(dmlc::io::URI const&, char const*, bool) ()
from /home/sh1ng/dev/.venv/lib/python3.6/site-packages/xgboost_prev/./lib/libxgboost.so
#2 0x00007fff724ac6e1 in dmlc::Stream::Create (uri=<optimized out>,
flag=0x7fff72592c0a "r", try_create=<optimized out>)
at /home/sh1ng/dev/h2o4gpu/xgboost/dmlc-core/src/io.cc:137
#3 0x00007fff72181559 in XGBoosterLoadModel (handle=0x201cb60,
fname=0x7fffac7847e8 "/home/sh1ng/dev/h2o4gpu/temp.model")
at /home/sh1ng/dev/h2o4gpu/xgboost/src/c_api/c_api.cc:589
#4 0x00007fffe1d76dae in ffi_call_unix64 ()
from /usr/lib/x86_64-linux-gnu/libffi.so.6
#5 0x00007fffe1d7671f in ffi_call ()
from /usr/lib/x86_64-linux-gnu/libffi.so.6
#6 0x00007fffe1f8a5c4 in _call_function_pointer (argcount=2,
resmem=0x7fffffffc8e0, restype=<optimized out>, atypes=0x7fffffffc8a0,
avalues=0x7fffffffc8c0,
pProc=0x7fff72180920 <XGBoosterLoadModel(BoosterHandle, char const*)>,
flags=4353) at ./Modules/_ctypes/callproc.c:831
The failing piece of code, https://github.com/dmlc/dmlc-core/blob/552f7de748fbff34f2708b03f930a47ded45d78e/src/io.cc#L137, uses a singleton. Downgrading the dmlc version (so that the two match) fixes the segfaults.
I can also ask the community for possible solutions, but before doing that I'd like to know your opinions. Or should we go with the first option?
@pseudotensor @thirdwing
I would suggest using 1.0.0, which has official JSON support.
@thirdwing we will use 1.0, there's no doubt about that. I'm trying to tackle the migration issue: we can't load pickled models from version 0.9.
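The two-version idea from the PoC relies on old pickles being resolved against the old package rather than the new one. A minimal sketch of that step, assuming the old release is importable as `xgboost_prev` (the package name visible in the backtrace path). `RedirectingUnpickler` and `loads_redirected` are hypothetical helpers, not part of xgboost or h2o4gpu.

```python
import io
import pickle


class RedirectingUnpickler(pickle.Unpickler):
    """Unpickler that resolves classes from a renamed package."""

    def __init__(self, file, old_pkg, new_pkg):
        super().__init__(file)
        self.old_pkg = old_pkg
        self.new_pkg = new_pkg

    def find_class(self, module, name):
        # Rewrite "old_pkg" and "old_pkg.sub" to "new_pkg" and "new_pkg.sub";
        # every other module is looked up unchanged.
        if module == self.old_pkg or module.startswith(self.old_pkg + "."):
            module = self.new_pkg + module[len(self.old_pkg):]
        return super().find_class(module, name)


def loads_redirected(data, old_pkg="xgboost", new_pkg="xgboost_prev"):
    """Unpickle `data`, mapping references to old_pkg onto new_pkg."""
    return RedirectingUnpickler(io.BytesIO(data), old_pkg, new_pkg).load()
```

An object recovered this way would be an instance of the old package's class, so it could then be written out with save_model and read back with load_model in the new version, as described for the PoC. That round trip is not exercised here, and it still depends on both native libraries coexisting in one process, which is where the segfault above came from.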
superseded by https://github.com/h2oai/h2o4gpu/pull/822
No issues for lgbm, but for xgboost these files differ between upstream/master and the h2oai branch:
Need to check whether all the differences make sense.