sgoldenlab / simba

SimBA (Simple Behavioral Analysis), a pipeline and GUI for developing supervised behavioral classifiers
https://simba-uw-tf-dev.readthedocs.io/
GNU General Public License v3.0
289 stars 141 forks

XGBoost available as a model but not fully implemented #211

Open suntzuisafterU opened 2 years ago

suntzuisafterU commented 2 years ago

Hi,

I noticed that XGBoost is available as a model but has not been fully implemented. Has any development progress been made that we could sync with? I'm willing to implement this feature to finish our project if necessary. In that case, a bit of assistance with testing and verifying that all code paths have been covered would be appreciated.

Alternatively, if we load a pickled model that adheres to the sklearn interface, will that work?

PS: I tried to log in to Gitter to ask questions there but was not able to. It says: "We're very sorry, but we're unable to log you in right now. (Forbidden)"

sronilsson commented 2 years ago

Hi @suntzuisafterU! You’re right, we did originally plan for this, but there has not been any work on it for quite some time. The code for creating single classifiers is here; there is only an if self.algo == "RF" branch at the moment. For grid searching models it’s this method, which implicitly assumes that the input is RF. Both would have to be updated to accept other algorithms. We’d also have to make the GUI user menus in SimBA dynamic, so that the hyperparameter entry boxes change depending on bagging vs. boosting etc., and update the structures holding the different hyperparameters when users train multiple models in a grid search. That held me back from writing the code, plus I don’t expect to see any drastic model performance improvement in our use cases when shifting from RF to XGBoost.
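As a rough sketch of what "accepting other algorithms" could look like, the single RF branch might be replaced by a lookup keyed on the algorithm name. This is not SimBA's actual code: `CLASSIFIER_FACTORIES`, `build_classifier`, and the `"XGBoost"` key are hypothetical names chosen for illustration, and the hyperparameters are simply passed through to whichever library is selected.

```python
from typing import Any, Callable, Dict


def _make_random_forest(**hyperparameters: Any):
    # Imported lazily so the library is only required when this algorithm is selected.
    from sklearn.ensemble import RandomForestClassifier
    return RandomForestClassifier(**hyperparameters)


def _make_xgboost(**hyperparameters: Any):
    # Imported lazily for the same reason; xgboost is an optional dependency here.
    from xgboost import XGBClassifier
    return XGBClassifier(**hyperparameters)


# Hypothetical registry: maps the algorithm string to a factory function.
CLASSIFIER_FACTORIES: Dict[str, Callable[..., Any]] = {
    "RF": _make_random_forest,
    "XGBoost": _make_xgboost,
}


def build_classifier(algo: str, **hyperparameters: Any):
    """Return an unfitted classifier for `algo`, or raise if it is unsupported."""
    try:
        factory = CLASSIFIER_FACTORIES[algo]
    except KeyError:
        supported = ", ".join(sorted(CLASSIFIER_FACTORIES))
        raise ValueError(
            f"Unsupported algorithm {algo!r}; expected one of: {supported}"
        ) from None
    return factory(**hyperparameters)
```

With something like this, `build_classifier("RF", n_estimators=200)` and `build_classifier("XGBoost", n_estimators=200, max_depth=6)` would go through the same code path, and the GUI would only need to know which hyperparameters to collect for each key.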

If you have a pickle and want to run it through SimBA, that might work as long as there is a SimBA project_config.ini that holds the path to the classifier, as well as some parameters (threshold, minimum bout length). This is the class SimBA uses for inference. As you can see, SimBA just picks up the paths that are defined in the project_config.ini. I believe predict_proba is also the name of the method in xgboost, so it might work out of the box, but I haven’t tried it.
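To make the two requirements above concrete, here is a minimal sketch, under stated assumptions, of (a) checking that a loaded model exposes `predict_proba` and (b) turning per-frame probabilities into labels using a threshold and a minimum bout length. The function names are hypothetical, the minimum bout length is expressed in frames, and the `>=` comparison is an assumption; SimBA's own inference class may differ on all of these.

```python
from typing import Any, List, Sequence


def has_sklearn_inference_interface(model: Any) -> bool:
    """Duck-type check: does the unpickled object offer a callable predict_proba?"""
    return callable(getattr(model, "predict_proba", None))


def probabilities_to_bouts(
    probabilities: Sequence[float],
    threshold: float,
    min_bout_frames: int,
) -> List[int]:
    """Threshold per-frame probabilities, then drop bouts shorter than min_bout_frames."""
    # Assumption: a frame counts as "behavior present" when p >= threshold.
    labels = [1 if p >= threshold else 0 for p in probabilities]

    start = None  # index where the current run of 1s began, if any
    # A trailing 0 sentinel closes a bout that runs to the final frame.
    for i, label in enumerate(labels + [0]):
        if label == 1 and start is None:
            start = i
        elif label == 0 and start is not None:
            if i - start < min_bout_frames:
                # Bout is too short: relabel its frames as "behavior absent".
                for j in range(start, i):
                    labels[j] = 0
            start = None
    return labels
```

For a binary classifier, the probabilities passed in would typically be the positive-class column, e.g. `model.predict_proba(features)[:, 1]`, which is available on both scikit-learn classifiers and `xgboost.XGBClassifier`.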

suntzuisafterU commented 2 years ago

Thanks for the info! I'll update when we have finished the analysis, and open a PR if any useful work is done.