As suggested by @bellet in #339, it's better to cut the SCML warnings at the root by specifying the n_basis value whenever possible.
As all tests run SCML on the iris dataset through build_triplets in test_utils.py, we just need to specify n_basis for SCML and SCML_Supervised in the lists.
Going through the code of SCML, if n_basis is None, a default is assigned (lines 579-583 for SCML_Supervised, line 234 for SCML).
The iris dataset has 150 samples, 3 classes and 4 features, so the default value is 80 for SCML_Supervised and 320 for SCML.
I set these values explicitly, but then the following tests failed.
In test_components_is_2D in test_mahalanobis_mixin.py there is an error caused by a border case. I got this warning from SCML:
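To make the arithmetic explicit, here is a minimal sketch of how the two values come out for iris. The two formulas are an assumption about the form of the defaults (they are not copied from the SCML source), chosen because they reproduce the 80 and 320 quoted above; the function names are hypothetical.

```python
def default_n_basis_supervised(n_samples, n_features, n_classes):
    # Assumed form of the SCML_Supervised default (not copied from the source).
    return min(20 * n_features,
               n_samples * 2 * min(n_classes - 1, n_features) - 1)


def default_n_basis_triplets(n_features):
    # Assumed form of the SCML default (not copied from the source).
    return 80 * n_features


# Iris: 150 samples, 4 features, 3 classes.
IRIS = dict(n_samples=150, n_features=4, n_classes=3)

n_basis_supervised = default_n_basis_supervised(**IRIS)
n_basis_scml = default_n_basis_triplets(IRIS["n_features"])

print(n_basis_supervised, n_basis_scml)  # 80 320
```

Passing these values explicitly as n_basis in the test lists is what keeps the "no n_basis given" warning from being raised.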
The number of bases with nonzero weight is less than the number of features of the input, in consequence the learned transformation reduces the dimension to 0
And this error from the test:
> assert model.components_.shape == (1, 1) # the components must be 2D
E assert (0, 1) == (1, 1)
E At index 0 diff: 0 != 1
E Use -v to get the full diff
So the test wants to check fit with just one feature. Right now the feature selected from iris is the first one, but if we use the second one instead, we avoid this border case and the test passes. The expected behavior is the same.
# trunc_data = input_data[..., :1] # Old
trunc_data = input_data[..., 1:2] # New
There is also an error in test_sklearn_compat.py (TestSklearnCompat class, test_scml, check_estimator function) if n_basis is arbitrary, because under the hood it calls fit(), which calls _generate_bases_LDA, and if n_basis is wrong, it may fail. Thus, we need to let SCML compute n_basis for us, as we don't know which random X, y dataset is being used, and use pytest.warns to catch the resulting warning.
Not an error, but an observation about test_triplet_diffs and test_lda: these tests need an explicit n_basis=None to check that it is set correctly internally, so in this case the warning needs to be caught as well.
Besides these observations, all warnings from SCML were removed. Only 300-ish warnings are shown now across all tests.
Also, this PR removes the isinstance() overkill made previously in test_triplets_classifiers.py.
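The idea of catching the warning where n_basis=None is intentional can be sketched as follows. This is only an illustration: resolve_n_basis is a hypothetical stand-in for SCML's internal default handling, the default formula and warning text are assumptions, and the standard library's warnings module is used instead of pytest so the snippet runs on its own. In the test suite, a `with pytest.warns(UserWarning):` block around the call plays the same role.

```python
import warnings


def resolve_n_basis(n_basis, n_features):
    """Hypothetical stand-in for SCML's handling of n_basis=None."""
    if n_basis is None:
        n_basis = 80 * n_features  # assumed default, for illustration only
        warnings.warn("n_basis was not set; using n_basis=%d" % n_basis,
                      UserWarning)
    return n_basis


def call_and_collect_warnings(func, *args):
    # Record every warning raised during the call instead of printing it.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func(*args)
    return result, caught


# n_basis=None on purpose: the default is computed and one warning is raised.
value, caught = call_and_collect_warnings(resolve_n_basis, None, 4)

# Explicit n_basis: no warning at all.
explicit_value, explicit_caught = call_and_collect_warnings(
    resolve_n_basis, 320, 4)
```

Catching the warning only in the tests that deliberately pass n_basis=None keeps the rest of the test output clean while still checking the default path.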
Best! 🎃