mlpack / mlpack

mlpack: a fast, header-only C++ machine learning library
https://www.mlpack.org/

Random Forest training crashing with fmat data #3410

Closed: tfwittwer closed this issue 1 year ago

tfwittwer commented 1 year ago

Issue description

I'm training a Random Forest classifier. When using a double matrix as input data, training works fine. When using a float matrix (fmat), training crashes with a std::out_of_range exception:

Exception thrown at 0x00007FFF28C4051C in HaiClass_train.exe: Microsoft C++ exception: std::out_of_range at memory location 0x000000CD4CB78AF0.
Unhandled exception at 0x00007FFF28C4051C in HaiClass_train.exe: Microsoft C++ exception: std::out_of_range at memory location 0x000000CD4CB78AF0.

Your environment

Steps to reproduce

Train a Random Forest classifier with fmat input data of sufficient size. The sample program appears to work fine with fmat, but it uses a tiny dataset. (Cross-validation does not appear to be implemented for fmat; I get a compilation error when I try it.) My dataset has 126 features and 5 million data points, but I've experienced the same issue when using only 100,000 points.

Expected behavior

Normal operation.

Actual behavior

Program crashes with out_of_range error.

My guess is that the use of double as the data type is hardcoded somewhere, causing the program to step out of bounds of the input data. If Random Forest classification is supposed to work only with double data, then this should be documented and enforced with a compile-time error.
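The debugger output above reports only the exception type, not its message. As a rough sketch of how the message could be captured, the failing call could be wrapped in a try/catch for std::out_of_range and the result of what() printed. The helper names `describeOutOfRange` and `demoFailure` are hypothetical, and the out-of-bounds vector access is only a stand-in for the real training call; nothing here is taken from mlpack itself.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: runs `step` and returns the exception message if it
// throws std::out_of_range, or an empty string if it completes normally.
std::string describeOutOfRange(const std::function<void()>& step)
{
  try
  {
    step();
  }
  catch (const std::out_of_range& e)
  {
    return e.what();
  }
  return "";
}

// Stand-in for the failing call. In the real program, the lambda body would
// be the Random Forest training call on the fmat data.
std::string demoFailure()
{
  return describeOutOfRange([]
  {
    std::vector<float> values(3, 0.0f);
    values.at(10) = 1.0f;  // Deliberately out of bounds; at() throws.
  });
}
```

Printing the returned string (for example to std::cerr) before exiting may narrow down which access went out of range. The exact wording of the message depends on the library that throws it.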

conradsnicta commented 1 year ago

@tfwittwer can you provide a minimum reproducible example (MRE) that demonstrates the problem?

https://en.wikipedia.org/wiki/Minimal_reproducible_example

tfwittwer commented 1 year ago

It seems to be data dependent, as some datasets work and others don't. I've shelved this issue for now by sticking with double data.

rcurtin commented 1 year ago

If you have time and are able to find a dataset that reproducibly segfaults, I can try to work with that and see if I can uncover the issue. I'm only able to use Linux, but we can see if the issue manifests there too.