Closed rosecers closed 1 year ago
@agoscinski I am honestly perplexed by some of the testing errors coming up right now (matmul failing for correctly-shaped matrices). Would appreciate thoughts -- @ceriottm @PicoCentauri @Luthaf and others also welcome to weigh in
Running tests/test_metrics.py
, I get things like:
ERROR: test_local_reconstruction_error_train_idx (__main__.ReconstructionMeasuresTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/rca/source_installs/scikit-matter/tests/test_metrics.py", line 141, in test_local_reconstruction_error_train_idx
lfre_val = pointwise_local_reconstruction_error(
File "/Users/rca/miniconda3/lib/python3.10/site-packages/skmatter/metrics/_reconstruction_measures.py", line 456, in pointwise_local_reconstruction_error
- 2 * X_test @ X_train.T
File "/Users/rca/miniconda3/lib/python3.10/site-packages/numpy/ma/core.py", line 3077, in __array_wrap__
m = reduce(mask_or, [getmaskarray(arg) for arg in input_args])
File "/Users/rca/miniconda3/lib/python3.10/site-packages/numpy/ma/core.py", line 1757, in mask_or
return make_mask(umath.logical_or(m1, m2), copy=copy, shrink=shrink)
ValueError: operands could not be broadcast together with shapes (15,4) (4,5)
I've checked, it's within the @
operator, not the *
. Also checked that both matrices are the same type.
For the record, this only happens for tests/test_metrics.py
and tests/test_sparse_kernel_centerer.py
on my end.
In the StandartFlexibleScaler np.ma.*
is used which transforms the mean to a <class 'numpy.ma.core.MaskedArray'>
and thus also the transformed array.c
https://github.com/lab-cosmo/scikit-matter/blob/766e3dabc42f26727b484b37fcd696ad64a9c222/src/skmatter/preprocessing/_data.py#L152
Then in the metrics code the @ operation between masks is executed as an elementwise multiplication which results in the error because of not broadcastability of the shapes
Just replace np.ma.*
with regular np.*
Unfortunately, np.average
flags an error with scikit learn (scikit learn
does not like the name average
, even if we rename upon import). Unless
there's another function we can use, we might need to make a spoof
On Thu, May 18, 2023, 07:09 Alexander Goscinski @.***> wrote:
In the StandartFlexibleScaler np.ma.* is used which transforms the mean to a <class 'numpy.ma.core.MaskedArray'> and thus also the transformed array.c
https://github.com/lab-cosmo/scikit-matter/blob/766e3dabc42f26727b484b37fcd696ad64a9c222/src/skmatter/preprocessing/_data.py#L152 Then in the metrics code the @ operation between masks is executed as an elementwise multiplication which results in the error because of not broadcastability of the shapes
Just replace np.ma. with regular np.
— Reply to this email directly, view it on GitHub https://github.com/lab-cosmo/scikit-matter/pull/196#issuecomment-1552961173, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALKVP3TLPJJBZPYP2QLHIZDXGYGPBANCNFSM6AAAAAAYFTRLLE . You are receiving this because you authored the thread.Message ID: @.***>
I dont fully understand the problem with np.average, could you elaborate. Where does scikit-learn return an error?
Everything passes! Ready for a proper review with one or two outstanding q's:
BaseEstimator
, but do not follow the estimator behavior to a tee (their transformed objects do not have the same number of rows as their inputs). In order to delay the selector-refactor conversation, I suggest we leave them out of test_check_estimators
I should note what many of the necessary fixes were:
variable_
type attributes outside of the initializerReshape your data
)n_features_in_
and n_samples_in
during fit functionsI have a question here, shouldn't we explicitly add tests with @parametrize_with_checks
in our test sets? Or the requirement that it should just be possible to run such tests?
technically, the sample selectors inherit from BaseEstimator, but do not follow the estimator behavior to a tee (their transformed objects do not have the same number of rows as their inputs). In order to delay the selector-refactor conversation, I suggest we leave them out of test_check_estimators
If they are okay with it, sure. We can write in the doc also that it is not compatible with Pipeline yet, if this helps convincing them. In the end we need to mark this somehow programmatically. I am not sure how to do this, since scikit-learn Pipeline never checks for the type of the transformers, so changing the base class does not help here. Maybe the contributors have some useful suggestions.
Should orthogonalregression inherit from base estimator? @agoscinski
Yes it is an estimator so it should inherit from it, what is the problem with it? That it does not always agrees with the shape of the input, right? We can make mark it as private class (_OrthogonalRegression), because it is only use for reconstruction measures, then I think we can ommit it for the estimator check. But it basically has the same problems as the sample selection classes, so I would do the apply the same solution.
I have a question here, shouldn't we explicitly add tests with
@parametrize_with_checks
in our test sets? Or the requirement that it should just be possible to run such tests?
Either way is fine, but I think having everything in a dedicated script makes the paper trail easier to follow
@agoscinski and @hurricane642 can you do another formal review?
scikit-learn-contrib (our PR is https://github.com/scikit-learn-contrib/scikit-learn-contrib/issues/62) requires that all estimators pass a check_estimators test ala https://scikit-learn.org/stable/modules/generated/sklearn.utils.estimator_checks.parametrize_with_checks.html#sklearn.utils.estimator_checks.parametrize_with_checks, which ours were not currently doing. I've been going through everything that inherits from the BaseEstimator class (or should) and making the required changes. @agoscinski would appreciate your input on the
linear_models
section, as I'm not sure that OrthogonalRegression can pass the estimators, as to my knowledge the predicted values may have a different shape that the fitted y, no? Please correct me if I'm wrong.:books: Documentation preview :books:: https://scikit-matter--196.org.readthedocs.build/en/196/