quantling / pyndl

pyndl implements Naive Discriminative Learning (NDL), a learning and classification model based on the Rescorla-Wagner equations, in Python 3.
https://pyndl.readthedocs.io
MIT License

sklearn-like API #130

Open dekuenstle opened 6 years ago

dekuenstle commented 6 years ago

The API provided by all machine learning models in the scikit-learn library has become the de facto industry standard for machine learning in Python. We should provide a similar API for NDL as well, or even make it the standard.

Pseudocode

from pyndl import Ndl, Vectorizer  # proposed classes, not part of pyndl yet
# ...
# encode cue and outcome strings as binary vectors
cue_vec, outcome_vec = Vectorizer(), Vectorizer()
X = cue_vec.fit_transform(cues)
y = outcome_vec.fit_transform(outcomes)

# learn the weight matrix from the vectorized events
model = Ndl(alpha=0.01)
model.fit(X, y)
# ...
# vectorize unseen cues with the already-fitted vectorizer, predict,
# and map the predicted vectors back to outcome strings
X_test = cue_vec.transform(["#_this_is_a_test_#"])
y_pred = model.predict(X_test)
outcome_pred = outcome_vec.inverse_transform(y_pred)
# ...

This API cleanup would definitely be some work, but I think it is worth it. Most of the new classes would just wrap around the existing functions.
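To make the "just wrap the existing functions" idea concrete, here is a minimal sketch of what such an sklearn-style estimator could look like. Everything below is an assumption, not pyndl code: the class name Ndl is the one proposed above, the learning rule is a toy Rescorla-Wagner update over binary cue/outcome vectors, and the conventions (hyperparameters in __init__, learned state in attributes ending in an underscore, fit returning self) are sklearn's.

```python
class Ndl:
    """Toy Rescorla-Wagner learner with an sklearn-like interface (sketch)."""

    def __init__(self, alpha=0.01, epochs=10):
        self.alpha = alpha    # learning rate (alpha and beta collapsed into one)
        self.epochs = epochs  # passes over the event list

    def fit(self, X, y):
        """X, y: lists of binary cue / outcome vectors, one event per row."""
        n_cues, n_outcomes = len(X[0]), len(y[0])
        # weights_ follows the sklearn convention for learned attributes
        self.weights_ = [[0.0] * n_outcomes for _ in range(n_cues)]
        for _ in range(self.epochs):
            for cues, outcomes in zip(X, y):
                # activation of each outcome given the cues present in this event
                act = [sum(row[j] for row, c in zip(self.weights_, cues) if c)
                       for j in range(n_outcomes)]
                # Rescorla-Wagner update: weights of present cues move
                # towards the prediction error (outcome - activation)
                for i, c in enumerate(cues):
                    if c:
                        for j in range(n_outcomes):
                            self.weights_[i][j] += self.alpha * (outcomes[j] - act[j])
        return self  # sklearn convention: fit returns the estimator

    def predict(self, X):
        """Return the index of the most activated outcome for each event."""
        preds = []
        for cues in X:
            act = [sum(row[j] for row, c in zip(self.weights_, cues) if c)
                   for j in range(len(self.weights_[0]))]
            preds.append(max(range(len(act)), key=act.__getitem__))
        return preds
```

A real wrapper would of course delegate to pyndl's existing learning routines instead of re-implementing the update; the point is only the shape of the interface.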

What do you think about this?

Trybnetic commented 6 years ago

I think it is a good idea!

But would you suggest writing only a wrapper or to restructure the whole project?

I would suggest adding an api.py module in which we define the classes and functions you mentioned, and importing them into pyndl/__init__.py so they are callable as you described.
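As a layout fragment (module and class names are the ones proposed above; the file contents are assumptions, not existing code), that would look roughly like:

```python
# pyndl/api.py -- hypothetical new module holding the sklearn-style classes,
# each wrapping the existing pyndl functions
class Vectorizer:
    ...

class Ndl:
    ...

# pyndl/__init__.py -- re-export, so users can write `from pyndl import Ndl`
# from pyndl.api import Ndl, Vectorizer
```

This keeps the existing modules untouched and makes the new API purely additive.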

What do you think?

derNarr commented 6 years ago

In a way, the sklearn API never felt very pythonic to me. Creating objects and then calling methods that manipulate them in place feels more like Ruby. Still, it is the de facto standard in the Python machine learning community, and we should fit in.

I would like to work a little more with sklearn before deciding whether we want a wrapper for the sklearn API while maintaining a separate API of our own, or whether we want to support only the sklearn API and move to it completely.

@Trybnetic we should have an offline discussion about that.

The biggest conceptual change seems to be to no longer store all the information in the weights matrix and its attributes, but to have a model object that stores this information. For me it would be crucial how to serialize this model object and how to load it into R.
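One option for the serialization question (a sketch of a possible approach, not a decision, and the NdlModel name and attribute layout are assumptions): write the weight matrix and the run metadata to plain formats that R reads natively, e.g. CSV for the matrix (loadable in R via read.csv) and JSON for the metadata.

```python
import csv
import json
import os
import tempfile


class NdlModel:
    """Hypothetical model object holding weights plus run metadata."""

    def __init__(self, cues, outcomes, weights, meta=None):
        self.cues = cues          # row labels
        self.outcomes = outcomes  # column labels
        self.weights = weights    # nested list, one row per cue
        self.meta = meta or {}    # e.g. {"date": ..., "hostname": ...}

    def save(self, csv_path, meta_path):
        # CSV with a header row; R can load it with read.csv(..., row.names = 1)
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["cue"] + self.outcomes)
            for cue, row in zip(self.cues, self.weights):
                writer.writerow([cue] + row)
        with open(meta_path, "w") as f:
            json.dump(self.meta, f)

    @classmethod
    def load(cls, csv_path, meta_path):
        with open(csv_path, newline="") as f:
            rows = list(csv.reader(f))
        outcomes = rows[0][1:]
        cues = [r[0] for r in rows[1:]]
        weights = [[float(x) for x in r[1:]] for r in rows[1:]]
        with open(meta_path) as f:
            meta = json.load(f)
        return cls(cues, outcomes, weights, meta)


# round trip through temporary files
tmp = tempfile.mkdtemp()
model = NdlModel(["#ba", "an#"], ["banana"], [[0.4], [0.2]],
                 {"hostname": "example"})
model.save(os.path.join(tmp, "weights.csv"), os.path.join(tmp, "meta.json"))
restored = NdlModel.load(os.path.join(tmp, "weights.csv"),
                         os.path.join(tmp, "meta.json"))
```

The same two files would serve ndl and a hypothetical ndl_plus alike, since only the labels and the metadata dict differ.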

At the same time, a Python object as the model gives us more freedom to have something like ndl_plus modeled and serialized the same way as ndl, which would be an advantage.

Questions that need to be answered:

How does sklearn serialize learned models?

How is metadata like CPU usage, host name, and computing date stored in sklearn?

dekuenstle commented 6 years ago

Let me answer the questions:

How does sklearn serialize learned models?
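sklearn has no serialization format of its own: its model persistence documentation recommends Python's pickle, or joblib for estimators carrying large numpy arrays. A minimal round trip with a stand-in class (the class below is a placeholder illustrating the mechanism, not sklearn code):

```python
import pickle


class StandInEstimator:
    """Placeholder for a fitted sklearn-style estimator."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha

    def fit(self, X, y):
        # pretend these were learned from X, y; trailing underscore
        # marks fitted state, following the sklearn convention
        self.coef_ = [1.0, 2.0]
        return self


model = StandInEstimator(alpha=0.5).fit([[0, 0]], [0])
blob = pickle.dumps(model)      # hyperparameters and fitted state together
restored = pickle.loads(blob)
```

The downside for our R question: pickles are Python-only, which is an argument for also offering a plain-text export of the weights.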

How is metadata like CPU usage, host name, and computing date stored in sklearn?
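As far as I know, sklearn estimators store no such run metadata at all; only hyperparameters and fitted attributes live on the object. If pyndl wants to keep that information on a model object, one possibility (a sketch, not existing API) is a plain dict collected at fit time, which also serializes trivially:

```python
import datetime
import platform


def collect_meta():
    """Gather run metadata of the kind pyndl records for a learning run."""
    return {
        "date": datetime.datetime.now().isoformat(),
        "hostname": platform.node(),
        "python": platform.python_version(),
    }


meta = collect_meta()
```

Such a dict could be stored as an attribute of the model object and dumped alongside the weights.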

keras (a high-level API for the tensorflow / theano tensor-processing frameworks) has a more interesting approach: