suiji / Arborist

Scalable decision tree training and inference.
Other
82 stars 14 forks source link

please make `callback.h` and `callback.cpp` independent #9

Closed fyears closed 8 years ago

fyears commented 8 years ago

Now I'm doing #6 (python bridge), but surprisingly find out that callback.h and callback.cpp in Core doesn't exist and rely on the implementation in R.

I really don't think it's a good idea. I thought that the Core is self-contained and only relies on standard libraries in cpp. Please remove the dependence if possible... :-( (Or, how to workaround it in Python Bridge?)

(Maybe <algorithm> is useful here?)

suiji commented 8 years ago

The Arborist relies on several statistical techniques that are typically better supported by their respective front ends. This is the origin of the call-back stubs you have encountered. While the project goals are on-record (see the R/Finance and PyData talks, for example) as favoring Core implementations over call-backs, it is sometimes necessary to resort to the latter, particularly in the case of sampling.

There are good reasons why a practitioner may want to train with a particular random-variate generator or sampling scheme, and exposing the mapping via call-backs is a practical way to ensure that it is possible to do so. Weighted sampling, in particular, can be tricky to get right and we would much rather give the user the ability to benefit from implementations perfected by experts. This is one reason people use Numpy, R, etc. - i.e., the ability to leverage the numerical power they offer.

I am inclined to agree with you in the case of sorting, for example,
and may ultimately recast the sorting utilities within the Core. In the case of sampling and random-variate generation, however, the mere fact that 'stdlib' offers an implementation is not necessarily a reason to use it.

On 05/07/2016 02:58 AM, fyears wrote:

Now I'm doing #6 https://github.com/suiji/Arborist/issues/6 (python bridge), but surprisingly find out that |callback.h| and |callback.cpp| in |Core| doesn't exist and rely on the implementation in |R|.

I really don't think it's a good idea. I thought that the |Core| is self-contained and only relies on standard libraries in cpp. Please remove the dependence if possible... :-( (Or, how to workaround it in Python Bridge?)

(Maybe || is useful here?)

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/suiji/Arborist/issues/9