holoviz-topics / EarthML

Tools for working with machine learning in earth science
https://earthml.holoviz.org
BSD 3-Clause "New" or "Revised" License

ENH: add prediction framework #13

Closed stsievert closed 5 years ago

stsievert commented 6 years ago

This PR aims to add a reasonable first pass at predicting one fluxnet station from every other fluxnet station.

If linear models work well at one station, they should work well globally if the right points are fed in. My aim for this notebook is to provide a reasonable approximation of that.

My metric for judging models will be the correlation coefficient between the predicted variable and the observed variable (which I'm using because someone from NASA mentioned it on the last phone call). This PR will not optimize for this metric, but it will play some role.
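For reference, the metric can be sketched in plain Python as below. This assumes the Pearson correlation coefficient is what is meant; the function name `correlation_coefficient` is illustrative and not taken from the notebook.

```python
import math


def correlation_coefficient(predicted, observed):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two points")

    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n

    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)

    if var_p == 0 or var_o == 0:
        return float("nan")  # undefined when either series is constant
    return cov / math.sqrt(var_p * var_o)
```

A value near 1 means the predictions track the observations closely up to a linear rescaling, so a high score does not rule out a constant bias or a wrong scale in the predictions.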

TODO:

We also need to decide whether we need an embedding framework and which variables to feed in. I think that'll be the work of a separate PR.

stsievert commented 6 years ago

I think this PR is ready for merge. I have added

  1. time partitioning
    • This tries to predict each time_partition of the test dataset independently; time_partition can be months, seasons, or years. The notebook will use every other site to predict the unseen station's carbon flux for (say) April.
  2. selecting which points to train on for each time partition
    • This is a boolean flag passed to fit_and_predict. Currently, if true, the notebook finds the training points closest to the points we're trying to predict. This is done in the raw feature space, not an embedding space.
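The two steps above can be sketched roughly as follows. This is not the notebook's code: `partition_by`, `closest_points`, `predict_by_partition`, `fit_mean`, the record layout (dicts with `month`, `features`, and `target` keys), and the cutoff `k` are all hypothetical names chosen for illustration. The sketch also assumes that, without the flag, every training point is used for every partition.

```python
import math
from collections import defaultdict


def partition_by(records, key):
    """Group records (dicts) by the value stored under `key`, e.g. "month"."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(groups)


def closest_points(train, test, k):
    """Return the k training records nearest to any test record.

    Distance is Euclidean in the raw feature space (no embedding).
    """
    def dist_to_test(rec):
        return min(math.dist(rec["features"], t["features"]) for t in test)

    return sorted(train, key=dist_to_test)[:k]


def predict_by_partition(train, test, fit, key="month",
                         select_closest=False, k=50):
    """Predict each time partition of `test` independently.

    `fit(records)` must return a function mapping features to a prediction.
    Returns {partition: [predictions, in that partition's test order]}.
    """
    predictions = {}
    for part, test_part in partition_by(test, key).items():
        # Assumption: with the flag off, train on all available points.
        if select_closest:
            train_part = closest_points(train, test_part, k)
        else:
            train_part = train
        model = fit(train_part)
        predictions[part] = [model(rec["features"]) for rec in test_part]
    return predictions


def fit_mean(records):
    """Stand-in model (not a linear model): always predicts the mean target."""
    mean = sum(r["target"] for r in records) / len(records)
    return lambda features: mean
```

Here `train` would hold the records from every other station and `test` the records from the held-out station. Any model can be swapped in for `fit_mean` as long as `fit` returns a callable.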

When this notebook is run with its default settings, this is the histogram of correlation coefficients:

[Screenshot: histogram of correlation coefficients]

The red line is at the mean of the correlation coefficients, 0.3927. Future work is to improve this now that the basic framework is in place.