smartcorelib / smartcore

A comprehensive library for machine learning and numerical computing. The library provides a set of tools for linear algebra, numerical computing, optimization, and enables a generic, powerful yet still efficient approach to machine learning.
https://smartcorelib.org/
Apache License 2.0
698 stars 75 forks source link

refactor: do not require an empty struct in cross_validate to define … #231

Open morenol opened 1 year ago

morenol commented 1 year ago

…which is the estimator

Fixes https://github.com/smartcorelib/smartcore/issues/230

Checklist

Current behaviour

New expected behaviour

Change logs

morenol commented 1 year ago

Even though, it looks like more easy to use this passing the parameter, it looks weird to me to pass an unitialized empty struct, instead we can take advantage of that we can explicitly define which is the Estimator that we want to use using turbofish syntax

Mec-iS commented 1 year ago

@morenol from a programming perspective I agree, but writing ::<_, _, _, _, _, LogisticRegression<_, _, _, _>, _, _> is really not viable for an interface that has to face an end-user. Think if you had to write that to issue a request using Python requests: requests.request::<_,_,_,_, POST,_,_,_>(...) ?!!? would you ever use something like that?

If you want to remove the new there are these options imo:

I used new because there are some other constraints in place and the alternatives are not straightforward. As we have as a main point to keep an easy "pythonic" API we cannot allow Rust's types to reach the end-user's level.

PS: I don't like how cross_validate and cross_validate_predict are implemented, so maybe we should rewrite them in a better way. This could also lead to a better definition of the SupervisedEstimator and Predictor traits. For an example see fit and predict.

codecov-commenter commented 1 year ago

Codecov Report

Merging #231 (e95798d) into development (aab3817) will increase coverage by 0.11%. The diff coverage is n/a.

@@               Coverage Diff               @@
##           development     #231      +/-   ##
===============================================
+ Coverage        43.93%   44.05%   +0.11%     
===============================================
  Files               85       85              
  Lines             7290     7268      -22     
===============================================
- Hits              3203     3202       -1     
+ Misses            4087     4066      -21     
Impacted Files Coverage Δ
src/ensemble/random_forest_classifier.rs 53.54% <ø> (+0.34%) :arrow_up:
src/ensemble/random_forest_regressor.rs 57.37% <ø> (+0.46%) :arrow_up:
src/linear/elastic_net.rs 50.00% <ø> (+0.39%) :arrow_up:
src/linear/lasso.rs 50.00% <ø> (+0.45%) :arrow_up:
src/linear/linear_regression.rs 35.41% <ø> (+1.41%) :arrow_up:
src/linear/logistic_regression.rs 38.14% <ø> (+0.19%) :arrow_up:
src/linear/ridge_regression.rs 40.17% <ø> (+0.34%) :arrow_up:
src/model_selection/mod.rs 72.22% <ø> (ø)
src/naive_bayes/bernoulli.rs 41.56% <ø> (+0.84%) :arrow_up:
src/naive_bayes/categorical.rs 34.50% <ø> (-1.16%) :arrow_down:
... and 14 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

morenol commented 1 year ago

@morenol from a programming perspective I agree, but writing ::<_, _, _, _, _, LogisticRegression<_, _, _, _>, _, _> is really not viable for an interface that has to face an end-user. Think if you had to write that to issue a request using Python requests: requests.request::<_,_,_,_, POST,_,_,_>(...) ?!!? would you ever use something like that?

If you want to remove the new there are these options imo:

  • passing the single fit function using Fn or FnOnce but it is hard as every estimator has different signatures
  • passing the string name "LogisticRegression" and instantiate an object inside fit using an enumerator (as proposed for Kernel in Deserialization for Kernel #221) or an enumerator key like EstimatorEnum::LogisticRegression
  • refactoring cross_validate in a way that the user has to write only cross_validate::<LogisticRegression>(...)
  • anything else I don't know about

I used new because there are some other constraints in place and the alternatives are not straightforward. As we have as a main point to keep an easy "pythonic" API we cannot allow Rust's types to reach the end-user's level.

PS: I don't like how cross_validate and cross_validate_predict are implemented, so maybe we should rewrite them in a better way. This could also lead to a better definition of the SupervisedEstimator and Predictor traits. For an example see fit and predict.

I agree, I am not fan of having an API were something like this is required ::<_, _, _, _, _, LogisticRegression<_, _, _, _>, _, _>. I will keep this open until I find a better way

Sammyreal1 commented 1 year ago

Even though, it looks like more easy to use this passing the parameter, it looks weird to me to pass an unitialized empty struct, instead we can take advantage of that we can explicitly define which is the Estimator that we want to use using turbofish syntax

Mec-iS commented 1 year ago

Even though, it looks like more easy to use this passing the parameter, it looks weird to me to pass an unitialized empty struct, instead we can take advantage of that we can explicitly define which is the Estimator that we want to use using turbofish syntax

see discussion at https://github.com/smartcorelib/smartcore/issues/230