Closed (tfwittwer closed this issue 1 year ago)
@tfwittwer can you provide a minimum reproducible example (MRE) that demonstrates the problem?
It seems to be data dependent, as some datasets work and others don't. I've shelved this issue for now by sticking with double data.
If you have time and are able to find a dataset that reproducibly segfaults, I can try to work with that and see if I can uncover the issue. I'm only able to use Linux, but we can see if the issue manifests there too.
Issue description
I'm training a Random Forest classifier. When using a double matrix as input data, training works fine. When using a float matrix (fmat), training crashes with a std::out_of_range exception:
Exception thrown at 0x00007FFF28C4051C in HaiClass_train.exe: Microsoft C++ exception: std::out_of_range at memory location 0x000000CD4CB78AF0.
Unhandled exception at 0x00007FFF28C4051C in HaiClass_train.exe: Microsoft C++ exception: std::out_of_range at memory location 0x000000CD4CB78AF0.
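The debugger output above gives only the exception type and an address. One way to get more detail for an MRE is to catch the exception and print its message. The following is a minimal sketch using only the standard library: `RunAndReport` and `OutOfBoundsStandIn` are hypothetical names introduced here, and the stand-in merely imitates an out-of-bounds access on a 126-element row. It is not the library's training code.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: runs `step` and returns a short report string
// instead of letting the exception terminate the program.
std::string RunAndReport(const std::function<void()>& step) {
    try {
        step();
        return "ok";
    } catch (const std::out_of_range& e) {
        // e.what() usually names the container operation that failed.
        return std::string("std::out_of_range: ") + e.what();
    } catch (const std::exception& e) {
        return std::string("std::exception: ") + e.what();
    }
}

// Stand-in for the failing call (an assumption for illustration only):
// reads one element past the end of a 126-element row.
void OutOfBoundsStandIn() {
    std::vector<float> row(126);
    (void) row.at(126);  // valid indices are 0..125, so at() throws
}
```

In the real program, the lambda passed to `RunAndReport` would wrap the training call, and the returned string could be logged and attached to the issue.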
Your environment
Steps to reproduce
Train a Random Forest classifier with fmat input data of sufficient size. The sample program appears to work fine with fmat, but it uses a tiny data set. (Cross validation does not appear to be implemented for fmat; I get a compilation error when I try it.) My dataset has 126 features and 5 million data points, but I've experienced the same issue when using only 100,000 points.
Expected behavior
Normal operation.
Actual behavior
Program crashes with out_of_range error.
My guess is that double is hardcoded as the data type somewhere, causing the program to step out of bounds of the input data. If Random Forest classification is supposed to work only with double data, then this should be documented, and passing any other element type should fail at compile time.
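As an illustration of the compile-time error suggested above, here is a minimal sketch of how a templated training entry point could reject unsupported element types with `static_assert`. Everything in it is an assumption made for illustration: `ToyMat`, `SupportsElemType`, and `TrainGuarded` are hypothetical names, and `ToyMat` only mimics a matrix type that exposes an `elem_type` member. It does not reflect how the library is actually written.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a matrix class that exposes its element type.
template <typename T>
struct ToyMat {
    using elem_type = T;
    std::size_t n_rows = 0;
    std::size_t n_cols = 0;
    std::vector<T> mem;
};

// True only for matrix types whose elements are double
// (the assumed "supported" type in this sketch).
template <typename MatType>
constexpr bool SupportsElemType() {
    return std::is_same<typename MatType::elem_type, double>::value;
}

// Hypothetical training entry point: refuses to compile for other
// element types instead of failing at run time.
template <typename MatType>
std::size_t TrainGuarded(const MatType& data) {
    static_assert(SupportsElemType<MatType>(),
                  "training currently requires double element type");
    return data.n_cols;  // placeholder result: number of data points seen
}
```

With this guard, `TrainGuarded(ToyMat<double>{})` compiles, while `TrainGuarded(ToyMat<float>{})` is rejected by the compiler with the message above rather than crashing during training.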