lemma-osu / sknnr

scikit-learn compatible estimators for various kNN imputation methods
https://sknnr.readthedocs.io
0 stars 1 forks source link

Test accuracy of `predict` methods #26

Closed aazuspan closed 1 year ago

aazuspan commented 1 year ago

Currently, we test the accuracy of fit and kneighbors against yaImpute in test_port.py, but we don't yet test the accuracy of predict, which makes it easy for small mistakes that affect results but not functionality (like #23) to go unnoticed.

As discussed in that issue, we should add some tests that compare prediction results of our estimators to those generated with yaImpute.

grovduck commented 1 year ago

@aazuspan, a quick note. I figured out my issue with the predictions and it was how I was calling yaImpute. I've got some tests for both equal-weights and distance-weighted predictions and they are all passing. Woot! test_port.py and conftest.py are a complete shambles right now (with lots of duplication), so I'm putting them into a gist just so you can see the work. It will probably inform how we tackle #2 and #3.

Note that when you call kneighbors without an argument, you get the k closest neighbors not including itself, but predict has to have an X argument which therefore uses itself as the first neighbor and factors into the calculation. yaImpute doesn't use itself as a neighbor and returns a prediction based on the k independent neighbors. So therefore, I'm not testing clf.predict(X_train) at present.

aazuspan commented 1 year ago

That's great news @grovduck!

Agreed that test_port.py could definitely use some clean up... If we could use parametrize like we do in test_estimators.py that would be great, but I'm not sure how to account for the fact that GNN and MSN are fit with the spp arg while the rest aren't. I suppose we may need two sets of parametrized tests based on whether or not they use ancillary fitting data. But also, given the connection you mention with other issues, it might not be worth spending too much effort cleaning these up yet.

Thanks for the code reference and the explanation of the issue!

grovduck commented 1 year ago

@aazuspan - is this worth pushing this more or less as is, knowing that we will be cleaning up the testing data and functions as part of #3 or should we focus on getting #3 in there first?

aazuspan commented 1 year ago

I'd say go ahead as is!

grovduck commented 1 year ago

Resolved by #32