shogun-toolbox / shogun

Shōgun
http://shogun-toolbox.org
BSD 3-Clause "New" or "Revised" License

RandomForest segfaults once deserialized. #5060

Open geektoni opened 4 years ago

geektoni commented 4 years ago

When using the Python interface, serializing and then deserializing a RandomForest object leads to a segfault as soon as apply_regression is called on the deserialized object. See the code below for an example.

#!/usr/bin/env python
# coding: utf-8

import shogun as sg
import numpy as np

# Create random features
X_train = np.random.normal(0, 1, (100, 5))
betas = np.random.normal(0, 1, 5)
y_train = np.dot(X_train, betas)

X_test = np.random.normal(0, 1, (10, 5))
y_test = np.dot(X_test, betas)

features_train = sg.create_features(X_train.T)
features_test = sg.create_features(X_test.T)
labels_train = sg.create_labels(y_train)
labels_test = sg.create_labels(y_test)

# Create the random forest object
mean_rule = sg.create_combination_rule("MeanRule")
rand_forest = sg.create_machine("RandomForest", labels=labels_train, num_bags=5,
                                seed=1, combination_rule=mean_rule)

rand_forest.train(features_train)
labels_predict = rand_forest.apply_regression(features_test)

# Serialize the model
model_file_path = './sample_model.json'
sg.serialize(model_file_path, rand_forest, sg.JsonSerializer())

# Deserialize the model and predict again; the apply_regression call below segfaults
deserialized_rand_forest = sg.as_machine(sg.deserialize(model_file_path, sg.JsonDeserializer()))
labels_train_predict = deserialized_rand_forest.apply_regression(features_test)

gf712 commented 4 years ago

Hmmm, I guess this is why the random forest meta example sometimes fails locally... Have you tried to run it in gdb yet?

geektoni commented 4 years ago

> Hmmm, I guess this is why the random forest meta example sometimes fails locally... Have you tried to run it in gdb yet?

Nope, I still need to check properly.
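
Before reaching for gdb, one low-effort check from the Python side is to run the reproduction in a child interpreter with the standard-library faulthandler enabled, so that a crash shows up as a signal instead of taking down the calling session. The sketch below is only an illustration: `run_isolated` and `describe` are hypothetical helpers, not part of Shogun, and faulthandler reports the Python-level traceback, not the C++ frames that gdb would show.

```python
import signal
import subprocess
import sys


def run_isolated(script_source, timeout=120):
    """Run Python source in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative return code means the
    child was killed by the signal with that number.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", script_source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a child return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode
```

Passing the source of the script above to `run_isolated` and printing `describe(returncode)` should report a SIGSEGV if the crash reproduces, with the faulthandler output available in the returned stderr.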

geektoni commented 4 years ago

The stacktrace is the following:

#0  0x00007ffff58156b4 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count (this=0x7fffce914c38, __r=...)
    at /usr/include/c++/7/bits/shared_ptr_base.h:849
#1  0x00007ffff57cb20d in std::__shared_ptr<shogun::SGObject, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<shogun::SGObject, void> (
    this=0x7fffce914c30, __r=...) at /usr/include/c++/7/bits/shared_ptr_base.h:1147
#2  0x00007ffff576818b in std::shared_ptr<shogun::SGObject>::shared_ptr<shogun::SGObject, void> (this=0x7fffce914c30, __r=...)
    at /usr/include/c++/7/bits/shared_ptr.h:266
#3  0x00007ffff570f315 in std::enable_shared_from_this<shogun::SGObject>::shared_from_this (this=0x8)
    at /usr/include/c++/7/bits/shared_ptr.h:640
#4  0x00007ffff579fe15 in shogun::SGObject::as<shogun::BinaryTreeMachineNode<shogun::CARTreeNodeData> > (this=0x0)
    at /home/gdetoni/Github/shogun/src/shogun/base/SGObject.h:641
#5  0x00007ffff1f2fc00 in shogun::CARTree::apply_regression (this=0x555556162e90, data=...)
    at /home/gdetoni/Github/shogun/src/shogun/multiclass/tree/CARTree.cpp:142
#6  0x00007ffff12bd75f in shogun::Machine::apply (this=0x555556162e90, data=...)
    at /home/gdetoni/Github/shogun/src/shogun/machine/Machine.cpp:128
#7  0x00007ffff1285fdf in shogun::BaggingMachine::apply_outputs_without_combination(std::shared_ptr<shogun::Features>) [clone ._omp_fn.0]
    () at /home/gdetoni/Github/shogun/src/shogun/machine/BaggingMachine.cpp:114
#8  0x00007fffeac166d5 in gomp_thread_start (xdata=<optimized out>)
    at /home/nwani/m3/conda-bld/compilers_linux-64_1560109574129/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libgomp/team.c:123
#9  0x00007ffff7bbd6db in start_thread (arg=0x7fffce915700) at pthread_create.c:463
#10 0x00007ffff78e688f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

It seems like we have a null pointer here: frame #4 calls as<BinaryTreeMachineNode<CARTreeNodeData>>() with this=0x0.

gf712 commented 4 years ago

From what I can tell, m_root contains a nullptr. That seems to be the default value set in the default constructor. You can also see that the source says m_root has not been added to the parameter framework, so it is not being serialised. I am not sure why it is not being registered as an SGObject though. That should work fine.
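
The effect of a member that is missing from the parameter framework can be illustrated with a toy model. The sketch below is not Shogun code: `ToyObject`, `ToyTree`, `register`, `to_dict` and `from_dict` are hypothetical names, and the only assumption is that serialization walks the registered parameters and nothing else. Under that assumption, a field that is never registered silently comes back as its constructor default.

```python
class ToyObject:
    """Toy stand-in for an object whose serializer only sees registered fields."""

    def __init__(self):
        self._registered = []

    def register(self, name):
        # Loosely analogous to adding a member to a parameter framework.
        self._registered.append(name)

    def to_dict(self):
        # Only registered fields make it into the serialized state.
        return {name: getattr(self, name) for name in self._registered}

    @classmethod
    def from_dict(cls, state):
        obj = cls()  # default construction: every field starts at its default
        for name, value in state.items():
            setattr(obj, name, value)
        return obj


class ToyTree(ToyObject):
    def __init__(self):
        super().__init__()
        self.max_depth = 0
        self.root = None  # default value, playing the role of a null root
        self.register("max_depth")
        # "root" is deliberately NOT registered.

    def train(self):
        self.root = {"split": 0.5}

    def apply(self):
        if self.root is None:
            raise RuntimeError("tree has no root; it was not trained or not fully restored")
        return self.root["split"]
```

After `restored = ToyTree.from_dict(trained_tree.to_dict())`, `restored.max_depth` survives the round trip but `restored.root` is `None`. The toy raises an exception at that point; in the C++ code there appears to be no such guard, which would be consistent with the crash in frame #4 of the stack trace.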

geektoni commented 4 years ago

Thank you @gf712 for the insight!

geektoni commented 4 years ago

If we add m_root as a parameter we will get the following messages once we try to deserialize:

[06/09/20 10:45:49 error] Could not create 'BinaryTreeMachineNode' class
[06/09/20 10:45:49 warning] Error while deserializeing RandomCARTree: ShogunException: Could not create 'BinaryTreeMachineNode' class

The issue appears to be caused by the problem described in the following comment, taken from the source code:

// the problem is "CARTree"/"RandomCARTree" can't be cloned because
// they inherit from TreeMachine which is templated and can't be
// created in class_list
// SG_ADD((std::shared_ptr<SGObject>*)&m_root,"m_root", "tree structure");

I am not super familiar with the serialization framework, so are there any possible ways to make it work again?

gf712 commented 4 years ago

Hmm, I see, it needs the template parameter. @vigsterkr I guess we need to replace TreeMachineNode with a non-templated version?
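
To make the constraint concrete, here is a toy model of a name-based class registry. It is a sketch, not Shogun's class_list: `CLASS_LIST`, `register`, `create`, `make_node_class`, `TemplatedNode` and `PlainNode` are all hypothetical. Because Python has no templates, the C++ class template is imitated by a factory that only yields a usable class once it is given a data type. Under that assumption, a lookup by the bare name fails, while a node that stores its data as an ordinary member can be created by name.

```python
CLASS_LIST = {}


def register(cls):
    """Make a class creatable from its name via a zero-argument constructor."""
    CLASS_LIST[cls.__name__] = cls
    return cls


def create(name):
    """Create an instance from a class name, as a deserializer would need to."""
    if name not in CLASS_LIST:
        raise LookupError("Could not create '%s' class" % name)
    return CLASS_LIST[name]()


def make_node_class(data_type):
    """Imitates a class template: no concrete class exists until data_type is fixed."""

    class TemplatedNode:
        def __init__(self):
            self.data = data_type()
            self.left = None
            self.right = None

    # The registry never sees this class, because the bare name alone
    # does not say which data_type to use.
    return TemplatedNode


@register
class PlainNode:
    """Non-templated alternative: the payload is a regular member."""

    def __init__(self):
        self.data = None
        self.left = None
        self.right = None
```

In this toy, `create("PlainNode")` succeeds, whereas `create("TemplatedNode")` raises a `LookupError` with a message similar to the one in the log above. Whether a non-templated node is the right design for Shogun is exactly the open question in this thread.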

hasini93 commented 3 years ago

I got a segfault as well when deserializing a random forest using C++ (develop version and release 6.1.4). The same issue was reported in previous issues, and the latter discusses a workaround:

https://github.com/shogun-toolbox/shogun/issues/3481
https://github.com/shogun-toolbox/shogun/issues/4242

Is there any update to this issue?