Open UnixJunkie opened 6 years ago
Currently, I am interested in classification and regression, not survival.
Related to https://github.com/imbs-hl/ranger/issues/305 (a simple usage example with a matching example input file).
If you plan to support a sparse file format, I recommend the CSR file format. For example: https://github.com/UnixJunkie/orrandomForest/blob/master/data/Boston_test_features.csr. Each entry in a line is a column index, then ':', then the value of that feature for the current line. All other entries are assumed to be 0.
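To make the line format described above concrete, here is a minimal C++ sketch of a reader for one such line. It is not part of ranger: `SparseRow`, `parse_sparse_line` and `to_dense` are hypothetical names chosen for illustration. It assumes entries are separated by whitespace and takes column indices exactly as written, since the description above does not say whether they start at 0 or 1.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One sparse row: (column index, value) pairs. Columns not listed are 0.
using SparseRow = std::vector<std::pair<std::size_t, double>>;

// Hypothetical helper (not ranger API): parse one line of "index:value"
// entries. Assumption: entries are whitespace-separated.
SparseRow parse_sparse_line(const std::string& line) {
  SparseRow row;
  std::istringstream tokens(line);
  std::string token;
  while (tokens >> token) {
    const std::size_t colon = token.find(':');
    // Reject entries with no ':' or with an empty index or value part.
    if (colon == std::string::npos || colon == 0 ||
        colon + 1 == token.size()) {
      throw std::invalid_argument("malformed entry: " + token);
    }
    const std::size_t index = std::stoul(token.substr(0, colon));
    const double value = std::stod(token.substr(colon + 1));
    row.emplace_back(index, value);
  }
  return row;
}

// Hypothetical helper: expand a sparse row into a dense vector of
// num_cols values, filling every unlisted column with 0.
std::vector<double> to_dense(const SparseRow& row, std::size_t num_cols) {
  std::vector<double> dense(num_cols, 0.0);
  for (const auto& entry : row) {
    if (entry.first >= num_cols) {
      throw std::out_of_range("column index out of range");
    }
    dense[entry.first] = entry.second;
  }
  return dense;
}
```

For instance, under these assumptions `parse_sparse_line("0:1.5 3:2 7:0.25")` yields three entries, and `to_dense` with 8 columns fills the remaining five positions with 0.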
We have support for sparse data - but only in the R version. It's very easy to use, see https://github.com/imbs-hl/ranger/issues/135#issuecomment-293284786 for an example.
It's probably not that hard to include sparse data in the pure C++ version. We already have a DataSparse class using Eigen, see https://github.com/imbs-hl/ranger/blob/master/src/DataSparse.h and https://github.com/imbs-hl/ranger/blob/master/src/DataSparse.cpp. We just have to fill that with some data. Unfortunately I don't have the time for this at the moment. Feel free to create a pull request. ;)
Btw., these two files are under the GPL license because they are currently used only in the R version. If required I can change them to MIT, I don't see any GPL dependencies there.
Regarding the example file, you already found #305. I have renamed that issue.
Can you point me to the code that does the data file reading for the C++ version? I guess that's where I should make changes to support a new format. I might have a look at it, but I doubt I can contribute such a big feature. My C++ is also quite rusty.
Hello,
Very nice software. I will give it a try and may write a thin OCaml wrapper if it works well (I will cite it also).
Hence, I have several questions:
So, this is more a request for some more documentation than a real issue/bug report. Hope you don't mind.
Best regards, Francois.