shogun-toolbox / shogun

Shōgun
http://shogun-toolbox.org
BSD 3-Clause "New" or "Revised" License
3.03k stars 1.04k forks source link

replace awkward state changes with free functions #4999

Open gf712 opened 4 years ago

gf712 commented 4 years ago

As we find more and more edge cases of the parameter framework the API has become increasingly awkward. For example, DeepAutoencoder::convert_to_neural_network(std::shared_ptr<NeuralLayer> output_layer, float64_t sigma) can be replaced with a class function without parameters by adding the output_layer and sigma to the state, but then the user has to alter the state of the object for all kind of specific functions, and it becomes hard to keep track of. Another case is to calculate the out of bag error:

rf.put("oob_evaluation_metric", sg.evaluation("MulticlassAccuracy"))
oob = rf.get("oob_error")   

Why should the machine have to keep the evaluation metric it in its state?

With that in mind why don't we replace some of these class functions with free functions, that try to down_cast the base class and dispatch the call?

std::shared<NeuralNetwork> convert_to_neural_network(std::shared_ptr<Machine> net, std::shared_ptr<NeuralLayer> output_layer, float64_t sigma)
{
  if (auto ae = net->as<DeepAutoencoder>())
    return ae->convert_to_neural_network(output_layer, sigma);
  error("Expected net to be a DeepAutoencoder");
}

or for multiple types (from most specialised, could cause bugs if not careful):

std::shared_ptr<DenseFeatures<float64_t>> reconstruct(const std::shared_ptr<Machine>& net, 
        const std::shared_ptr<DenseFeatures<float64_t>>& data)
{
    if (auto ae = net->as<DeepAutoencoder>())
        return ae->reconstruct(data);
    if (auto ae = net->as<Autoencoder>())
        return ae->reconstruct(data);
    error("Expected net to be a DeepAutoencoder or Autoencoder.");
}

These things will undo some of the work towards reducing the SWIG wrapper LoC but at least the API becomes cleaner (imo), and the boiler plate code they add is minimal compared to a class.

karlnapf commented 4 years ago

I agree it is one way to solve the problem without having to twist the put/get/run api too much. For me this would be a temporary solution, where the proper solution would be to refactor/redesign the classes such that these stunts are not necessary. On the other hand, I am pretty sure that there will always be corner cases. So +1 overall from my side.

eyeflap commented 4 years ago

Hi! I'd like to help with this but i'm not sure how to get started as this would be my first contribution on a git open source project.

gf712 commented 4 years ago

Hi @eyeflap I would recommend starting with another issue labelled as "good first issue", this is more on a case by case basis and would require a bit of familiarity with the framework. A really good starting issue is converting old python examples to meta examples #3555