stoltzmaniac / Basic-ML-OOP

Inheritance is a pain #5

Open stoltzmaniac opened 4 years ago

stoltzmaniac commented 4 years ago

https://github.com/stoltzmaniac/Basic-ML-OOP/tree/master/01-Regression/Linear-Regression/Part-02

I have the models folder written up: data_handlers.py handles the different inputs, and everything is broken into classes. There's a separate dimensionizers.py that does PCA, and I also created regression.py to handle regression.

What I'm having trouble with is inheritance (OOP is not my strongest suit). How can I improve this and use super() more effectively? Currently I cannot assign default values: I want the user to be able to override any parameter at any point, but have defaults in place. I would also like to avoid forcing every single parameter to be passed in the same order. I know this could be accomplished with *args and/or **kwargs, but I could really use a hand.

imbilltucker commented 4 years ago

Have you tried using default arguments in __init__? That seems the easiest.
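A minimal sketch of that approach (the class and parameter names here are just illustrative):

class Splitter:
    def __init__(self, train_split=0.7, seed=555):
        self.train_split = train_split
        self.seed = seed

Splitter()          # both defaults apply
Splitter(seed=123)  # override one, keep the other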

One option is to make the defaults class variables and then overwrite them in __init__, or in some other method.
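A quick sketch of that pattern (names illustrative; assigning in __init__ creates an instance attribute that shadows the class-level default):

class Splitter:
    seed = 555  # class-level default

    def __init__(self, seed=None):
        if seed is not None:
            self.seed = seed  # instance attribute shadows the class default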

Another option is to make a config class and pass it in to configure the object. That gives the defaults a structure to live in.
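Something like this rough sketch (PreProcessConfig is a hypothetical name, and the defaults are placeholders):

from dataclasses import dataclass

@dataclass
class PreProcessConfig:
    scale_type: str = "normalize"
    train_split: float = 0.7
    seed: int = 555

class PreProcessData:
    def __init__(self, predictor_vars, response_var, config=None):
        self.config = config or PreProcessConfig()  # fall back to all defaults
        self.predictor_vars = predictor_vars
        self.response_var = response_var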

There's also multiple inheritance... (Bwahahahaha!) This can get confusing because few people use it much, but it can be an interesting option, especially if you want to consume several different sets of defaults, as in the sketch below.
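A minimal sketch of that idea (the mixin names are hypothetical; each mixin contributes its own defaults and forwards everything else via cooperative super()):

class ScalingDefaults:
    def __init__(self, scale_type="normalize", **kwargs):
        self.scale_type = scale_type
        super().__init__(**kwargs)

class SplitDefaults:
    def __init__(self, train_split=0.7, seed=555, **kwargs):
        self.train_split = train_split
        self.seed = seed
        super().__init__(**kwargs)

class PreProcessData(ScalingDefaults, SplitDefaults):
    def __init__(self, predictor_vars, response_var, **kwargs):
        self.predictor_vars = predictor_vars
        self.response_var = response_var
        super().__init__(**kwargs)  # walks the MRO, picking up each set of defaults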

The dictionary update method can also be useful: set the defaults in a dictionary and let the user override them with another dictionary passed in, via params.update(defaults); params.update(passed_in_dict).
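A sketch of that flow (DEFAULTS and resolve_params are illustrative names):

DEFAULTS = {"scale_type": "normalize", "train_split": 0.7, "seed": 555}

def resolve_params(passed_in_dict=None):
    params = {}
    params.update(DEFAULTS)              # start from the defaults
    params.update(passed_in_dict or {})  # user-supplied values win
    return params

resolve_params({"seed": 123})  # seed overridden, everything else stays default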

Can you write down examples of three different config cases you imagine for a particular function? There are lots of ways to skin the cat here.

stoltzmaniac commented 4 years ago

That makes a lot of sense. I think I'll avoid multiple inheritance for a few more seasons :)

Greg just sent over a PR to help me out with some of these issues. I really like the idea of using a dictionary, though; it seems to make a lot of sense. I'll create the examples soon.

stoltzmaniac commented 4 years ago

It would be awesome to be able to write something like the following without specifying a seed, and have it default to, let's say, 555.

Notice that seed=123 is missing below, so it currently throws an error:

from models.data_handlers import PreProcessData  # assuming it lives in data_handlers.py
import numpy as np

pre_process_data = PreProcessData(
    predictor_vars=np.array([[0., 1., 2.], [4., 5., 6.]]),
    response_var=np.array([1., 2.]),
    scale_type="normalize",
    train_split=0.8
)
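A keyword default in PreProcessData's __init__ would make that call legal; a minimal sketch (the other defaults shown are just placeholders):

class PreProcessData:
    def __init__(self, predictor_vars, response_var,
                 scale_type="normalize", train_split=0.7, seed=555):
        self.seed = seed  # 555 unless the caller overrides it
        ...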
stoltzmaniac commented 4 years ago

Another case would be PCA. Since you almost always normalize variables there, it would be great to have normalization be the default in that case only...

Notice that scale_type='normalize' is missing, so it currently throws an error:

from models.dimensionizers import PrincipalComponentAnalysis
from sklearn.datasets import load_iris
iris = load_iris()

data_x = iris['data']
data_y = iris['target']

pca = PrincipalComponentAnalysis(
    predictor_vars=data_x,
    response_var=data_y,
    train_split=0.7,
    seed=123,
    variance_explained_cutoff=0.95
)
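One way to get that behavior, assuming PrincipalComponentAnalysis inherits from PreProcessData and the parent accepts keyword arguments (a sketch, not the repo's actual code):

class PrincipalComponentAnalysis(PreProcessData):
    def __init__(self, predictor_vars, response_var,
                 scale_type="normalize",            # PCA-specific default
                 variance_explained_cutoff=0.95, **kwargs):
        super().__init__(
            predictor_vars=predictor_vars,
            response_var=response_var,
            scale_type=scale_type,
            **kwargs,                               # e.g. train_split, seed
        )
        self.variance_explained_cutoff = variance_explained_cutoff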
stoltzmaniac commented 4 years ago

The correct results for those examples are shown in the README.

stoltzmaniac commented 4 years ago

This keeps going on and on: learning_rate should have a default within LinearRegression, and so forth. One thing I just discussed with Greg is the issue with the super().__init__() calls: the arguments are positional only, not keyword arguments, so I have to keep a consistent order every single time I inherit in the next class. That isn't a great experience, especially when debugging.
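A sketch of how keyword-based super().__init__() calls could remove that ordering constraint (the inheritance chain here is illustrative, not necessarily the repo's actual hierarchy):

class PreProcessData:
    def __init__(self, predictor_vars=None, response_var=None,
                 scale_type="normalize", train_split=0.7, seed=555):
        self.predictor_vars = predictor_vars
        self.response_var = response_var
        self.scale_type = scale_type
        self.train_split = train_split
        self.seed = seed

class PrincipalComponentAnalysis(PreProcessData):
    def __init__(self, variance_explained_cutoff=0.95, **kwargs):
        super().__init__(**kwargs)  # keywords only, so order never matters
        self.variance_explained_cutoff = variance_explained_cutoff

class LinearRegression(PrincipalComponentAnalysis):
    def __init__(self, learning_rate=0.01, **kwargs):
        super().__init__(**kwargs)
        self.learning_rate = learning_rate

LinearRegression(seed=123, learning_rate=0.05)  # any subset of params, any order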