scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License
60.02k stars 25.39k forks source link

Remove mutable default arguments #4373

Closed amueller closed 9 years ago

amueller commented 9 years ago

We should remove mutable default arguments as much as sensibly possible. You can find some using the landscape.io bot: https://landscape.io/github/scikit-learn/scikit-learn/51/messages/error?page=3#errors Or even better with a test now: in #4379

The default value should be replace by None and then replaced by the actual value in fit. This is a good fit for GSoC / first time contributors.

DonBeo commented 9 years ago

I think that in some cases it may be better to replace the mutable element with an immutable structure. For example in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py line 653 . I would replace alphas=[0.1, 1.0, 10.0] with alphas =np.array([0.1, 1.0, 10.0]) . If this is fine I can push this patch. It passes all the tests

amueller commented 9 years ago

wait, the np.array is also mutable, right?

DonBeo commented 9 years ago

o yes you are right my mistake.

At this point I am quiet sure that there are various function that have a numpy array as default element. Are we going to change those as well?

Maybe we can replace the lists and array with a tuple instead that with None. What do you think?

amueller commented 9 years ago

What is your concern about None? I don't think there are many places where we have numpy array default arugments. I can't really think of any now.

amueller commented 9 years ago

I can't see anything wrong with tuples, but None is a pretty standard idiom.

DonBeo commented 9 years ago

ok I would prefer tuple because to me they make easier to understand what are the default values. On the other side None seems more similar to the sklearn syntax.

amueller commented 9 years ago

the default values should be documented in the docstring anyhow. But I somewhat agree that seeing the default value in the signature is also nice. So +1 on using tuples where possible. Btw, I just pushed a thing to better configure landscape.io, so the report should now mostly list things that are useful to fix.

bryandeng commented 9 years ago

Hi amueller:

As a student interested in contributing to scikit-learn during this year's GSoC, can I take up this issue?

amueller commented 9 years ago

Check out what has been done in #4376 and maybe check with @DonBeo to see what he might be working on. Otherwise go ahead.

DonBeo commented 9 years ago

The mutable arguments in the ridge.py file have been successfully replaced. We need only to fix some conflicts in the test_ridge.py file and then we should be able to do the merge.

There are some other mutable default arguments in sklearn/externals/joblib/format_stack.py but my understanding is that we are not going to change them right now. There may be others mutable default arguments that I did not spot so far.

amueller commented 9 years ago

the landscape.io bot should list them: https://landscape.io/github/scikit-learn/scikit-learn/76/messages/error

bryandeng commented 9 years ago

OK. I'll fix them in the next few days.

bryandeng commented 9 years ago

Of the total 6 errors related to mutable default arguments,

3 in examples/applications/plot_species_distribution_modeling.py sklearn/manifold/t_sne.py sklearn/metrics/pairwise.py

have been fixed by me in #4409.

and 1 in sklearn/linear_model/ridge.py has been fixed by @DonBeo in #4376. 2 in sklearn/feature_selection/rfe.py have been fixed by @xuewei4d in #4368.

I shall be happy to add some tests to validate them, and also update documentation.

mblondel commented 9 years ago

Superseded by #4713.