Closed erg closed 11 years ago
This is due to my PR. In PR #1165, there is not test for discretization
function and I add one when finishing spectral embedding. Maybe @briancheung could have a look at the test code? I just generate an indicator matrix and add some gaussian noise then use discretization
to get the true label.
The test code looks reasonable, I'm not sure if it exactly simulates how the embedding space would look but I think discretization is just a specific case of the Procrustes problem so it should work.I just ran this code in the test manually and can't reproduce the failed assertion.
The answer is only off by .01, so it's probably numerical issues with a certain seed? I'm trying it in a loop, but I'm not sure if my machine will reproduce it or even if I have the right code since it was Travis-CI testing things.
for i in {1..1000}; do nosetests sklearn/cluster/tests/test_spectral.py:test_discretize; done
I just saw the error on my box, too. Seems random and pretty rare.
We could change the threshold to .85 or fix the seed or whatever.
@briancheung any opinion?
@erg I'm just merging #1436. Can you try again afterwards?
I just did a PR #1436 to make this reprodicible with test test_discretize(8).
I think lowering the threshold is one option, lowering the amount of noise added to y_true_noisy is another. Keep in mind, although this discretize is more stable than k-means it still can have the occasional bad initialization.
Well, it still doesn't fail for me with test_discretize(8)
. I never reproduced this, just saw it on Travis-CI. I'm on a Mac, so maybe it's more stable there and only fails on Linux.
If you get it to work on your box then push something. I'm out of this one :)
@amueller @erg I wanna make sure I understand the nosetest thing right.
for i in {1..1000}; do nosetests sklearn/cluster/tests/test_spectral.py:test_discretize; done
Will this use the same seed for all the 1000 tests? If not, how can I use different seeds with different round of tests or is there some options to explicitly control seed nosetests use?
Usually we try to fix all the seeds. When this isn't the case, the test above uses different seeds. Now the seed is fixed. Try the seed 8 maybe. I don't know how to set the seed from the command line.
On my Linux box I found a new error:
======================================================================
ERROR: sklearn.cluster.tests.test_spectral.test_discretize
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/nose-1.2.1-py2.7.egg/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/erg/python/scikit-learn/sklearn/cluster/tests/test_spectral.py", line 188, in test_discretize
+ 0.1 * random_state.randn(n_sample, n_class + 1))
ValueError: operands could not be broadcast together with shapes (50,9) (50,10)
Try seeds 33232 and 18069 for the shape bug, and 37287 for the precision one.
@erg Thank you so much for the efforts to detect those errors. The shape error is from that there maybe missing classes and thus the LabelBinarizer
will have less than 10 columns. I have fixed it in PR #1441
Fixed in #1441. Thanks @kuantkid !
Travis-CI found this unrelated issue when testing my pull request.
https://travis-ci.org/scikit-learn/scikit-learn/builds/3454819