scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License
60.13k stars 25.41k forks source link

HMM objects do not follow the scikit-learn estimator API #832

Closed GaelVaroquaux closed 12 years ago

GaelVaroquaux commented 12 years ago

Amongst other things, they have logic in their init.

@jaquesgrobler : do you think that you can put this on your todo list?

jaquesgrobler commented 12 years ago

Hey Gael,

I've already tweaked them a bit on my laptop - haven't PR'd it though..

I'm busy doing the same with it as I did with the GMM's API... You want this done by tomorrow I assume? Just want to get an idea of the time constraints because I'm currently nursing a bit of a migrane..

Shouldn't be a problem though and I'm keen to fix it.. Let me know :)

GaelVaroquaux commented 12 years ago

On Sun, May 06, 2012 at 12:06:02PM -0700, Jaques Grobler wrote:

I'm busy doing the same with it as I did with the GMM's API... You want this done by tomorrow I assume?

No! I don't like to ask someone to do something with a delay of one day. If you feel that you can improve the situation with a small pull request soon, than go for it, but don't kill yourself on it.

G

jaquesgrobler commented 12 years ago

Haha, okay .. I'm gonna have a look now what needs to be done there.. Is there a specific time tomorrow that the release happens? J

jaquesgrobler commented 12 years ago

Hey @GaelVaroquaux I'm busy with HMM's.. You want all the decision logic gone from the init? basically just initializations?

GaelVaroquaux commented 12 years ago

Hey @GaelVaroquaux I'm busy with HMM's.. You want all the decision logic gone from the init? basically just initializations?

Yes, all that logic should be in fit.

jaquesgrobler commented 12 years ago

Hey @GaelVaroquaux, there's a bit of problem - If I take the initializations out of __init__ and make them take place in fit(..), the tests start crying whentest_eval() or test_do_forward_pass (or any similiar test function) get's called, due to the default values of a few of these parameters being None. These functions want to, for example, use things like len(..) on them, which returns an error for NoneType. It makes sense since fit(..), which now contains the old decision logic from __init__ hasn't been called yet.

Current state of __init__ in hmm.py has the following logic:

        if startprob is None:
            startprob = np.tile(1.0 / n_components, n_components)
        self.startprob_ = startprob

        if startprob_prior is None:
            startprob_prior = 1.0
        self.startprob_prior = startprob_prior

        if transmat is None:
            transmat = np.tile(1.0 / n_components,
                    (n_components, n_components))
        self.transmat_ = transmat

        if transmat_prior is None:
            transmat_prior = 1.0
        self.transmat_prior = transmat_prior

        if algorithm in decoder_algorithms:
            self._algorithm = algorithm
        else:
            self._algorithm = "viterbi"
        self.random_state = random_state

So these chaps are all non-zero's by the time those test functions are called.. By removing this logic, the above shown logic only happens once fit(..) is called.

The signature looks like this (with my attempted changes), as well as the few first inits:

def __init__(self, n_components=1, startprob=None, transmat=None,
            startprob_prior=None, transmat_prior=None,
            algorithm="viterbi", random_state=None, n_iter=10,
            thresh=1e-2, params=string.ascii_letters,
            init_params=string.ascii_letters):
        self.n_components = n_components
        self.startprob_ = startprob
        self.startprob_prior = startprob_prior
        self.transmat_ = transmat
        ..
        ..

def fit(self, obs, **kwargs):
        ..
        ..

Here's one of the failed tests outputs (They all look like this <48 of 'em>):

======================================================================
ERROR: test_init (sklearn.tests.test_hmm.TestBaseHMM)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jaques/scikit-learn/sklearn/tests/test_hmm.py", line 49, in test_init
    h, framelogprob = self.setup_example_hmm()
  File "/home/jaques/scikit-learn/sklearn/tests/test_hmm.py", line 36, in setup_example_hmm
    h = self.StubHMM(2)
  File "/home/jaques/scikit-learn/sklearn/hmm.py", line 137, in __init__
    self.startprob_ = startprob
  File "/home/jaques/scikit-learn/sklearn/hmm.py", line 480, in _set_startprob
    if len(startprob) != self.n_components:
TypeError: object of type 'NoneType' has no len()

I could still do the API changes and leave the initializations as is, meaning the above shown decision logic stays within __init__.. Else I could change their default values to whatever the initializations change them to anyway.. which would look messy for one.

Any suggestions here? Thanks

jaquesgrobler commented 12 years ago

Okay if I close this guy?

amueller commented 12 years ago

ok.