lemma-osu / sknnr

scikit-learn compatible estimators for various kNN imputation methods
https://sknnr.readthedocs.io

Add R6 southwest Oregon (SWO) Ecology dataset #46

Closed: grovduck closed 1 year ago

grovduck commented 1 year ago

This PR would add the R6 southwest Oregon (SWO) Ecology plot dataset to the sample data available in sknnr. The dataset consists of a species matrix (percent cover by tree species) and an environmental matrix (climate, topography, and spectral variables) for 3,005 plots in SWO measured in 2000. The R6 Ecology Program installed these plots, and this PR is contingent upon their permission to use these data.

This PR adds a new function (load_swo_ecoplot) which makes these data available to all sknnr (and scikit-learn) estimators.
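A rough sketch of how the new loader might be used with one of the sknnr estimators. It assumes that load_swo_ecoplot lives in sknnr.datasets and follows the scikit-learn loader convention of accepting return_X_y and as_frame; the helper name score_gnn_on_swo and the choices of n_neighbors and random_state are illustrative only and not part of this PR.

```python
def score_gnn_on_swo(n_neighbors=5, random_state=42):
    """Fit a GNN estimator on the SWO ecoplot data and return its test score.

    Hypothetical helper for illustration; not part of sknnr.
    """
    # Imports are kept inside the function so the sketch can be read and
    # defined without sknnr or scikit-learn installed.
    from sklearn.model_selection import train_test_split
    from sknnr import GNNRegressor
    from sknnr.datasets import load_swo_ecoplot

    # Assumption: the loader accepts return_X_y / as_frame like the
    # scikit-learn dataset loaders. X is the environmental matrix and
    # y is the species matrix (percent cover by tree species).
    X, y = load_swo_ecoplot(return_X_y=True, as_frame=True)

    # Hold out a portion of the plots to evaluate the fitted estimator.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state
    )

    est = GNNRegressor(n_neighbors=n_neighbors).fit(X_train, y_train)

    # score() is inherited from scikit-learn's regressor API (R^2).
    return float(est.score(X_test, y_test))
```

Calling score_gnn_on_swo() with sknnr installed should return a single score for the held-out plots. Because the loader returns plain X and y, the same data should also work with ordinary scikit-learn estimators.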

Pending approval, the data in swo_ecoplot_env.csv and swo_ecoplot_spp.csv are all dummy values, but do currently pass our test suite. EDIT: Actual data is now committed after approval from R6 Ecology Group and data citation is added.

grovduck commented 1 year ago

Just FYI, I ran these data through yaImpute and created a bunch of new train/test files. Then I clumsily copied test_port.py into a swo_ecoplot-specific version and all tests passed. I didn't think it was worth adding all of these files just to verify the porting logic, but we may want to keep the yaImpute output around if we end up getting back to #42?

aazuspan commented 1 year ago

Definitely reassuring to hear that everything passes with a second dataset, but I agree that it's not needed in the test suite. For the connection with #42, are you thinking that we would verify accuracy against yaImpute one last time before we switch to regression testing and ditch the port tests? That sounds reasonable to me!

grovduck commented 1 year ago

> For the connection with #42, are you thinking that we would verify accuracy against yaImpute one last time before we switch to regression testing and ditch the port tests? That sounds reasonable to me!

I'm not sure what I was thinking, but I did just spend a bit of time playing around with syrupy and I think I understand the main concepts there. I'll put some follow-up thoughts in #42.

grovduck commented 1 year ago

@aazuspan, I think this is ready to go, but thought I'd see if you had any issues with the docstring for load_swo_ecoplot.

aazuspan commented 1 year ago

LGTM!