lemma-osu / gee-knn-python


Run `RawKNNClassifier._predict_fc` as parallel to avoid memory issues #9

Open grovduck opened 7 months ago

grovduck commented 7 months ago

Closes #8 by creating a `joblib.Parallel` job that retrieves predictions for a feature collection in batches. Options specify the batch size (`chunk_size`) and the number of threads to use (`num_threads`). Once all neighbors are retrieved, the batch results are stitched back together into an `ee.FeatureCollection`.
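The batching pattern described above can be sketched generically. This is a minimal illustration, not the PR's implementation. It swaps in the standard library's `concurrent.futures.ThreadPoolExecutor` for `joblib.Parallel` so that it runs without extra dependencies. The helpers `chunked`, `predict_in_chunks`, and `predict_chunk` are hypothetical. In the PR, each batch would presumably correspond to a slice of the `ee.FeatureCollection` fetched from Earth Engine.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], chunk_size: int) -> list[Sequence[T]]:
    """Split items into consecutive batches of at most chunk_size (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    return [items[i : i + chunk_size] for i in range(0, len(items), chunk_size)]


def predict_in_chunks(
    items: Sequence[T],
    predict_chunk: Callable[[Sequence[T]], list[R]],
    chunk_size: int = 500,
    num_threads: int = 4,
) -> list[R]:
    """Run a (hypothetical) per-batch prediction in parallel and stitch results in input order."""
    batches = chunked(items, chunk_size)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map yields results in submission order, so the stitched
        # output lines up with the original item order.
        results = pool.map(predict_chunk, batches)
        return list(chain.from_iterable(results))
```

For example, `predict_in_chunks(list(range(10)), lambda b: [x * 2 for x in b], chunk_size=3, num_threads=2)` processes four batches and returns the doubled values in their original order. Threads are a reasonable fit here because each batch spends most of its time waiting on a remote request rather than computing locally.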

Note that this approach keeps the work server-side, which may not be the best workflow for this use case. Typically, the feature collection mode is used either to cross-validate the plots used to fit the model or to predict a new set of targets. We are investigating two alternatives: 1) converting the feature collection client-side; and 2) using sknnr to run the predictions locally.
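The first alternative, converting the feature collection client-side, might start from the GeoJSON-style dictionary that Earth Engine returns from `getInfo()`, which has a `"features"` list whose entries carry `"properties"`. The sketch below is a minimal, hypothetical helper (`fc_dict_to_matrix`) that pulls feature ids and selected numeric properties into plain rows. The property names are assumptions chosen by the caller. The resulting rows could then be handed to a scikit-learn-style `predict` on an sknnr estimator, though that step is not shown here.

```python
from typing import Any, Mapping, Sequence


def fc_dict_to_matrix(
    fc_info: Mapping[str, Any], property_names: Sequence[str]
) -> tuple[list[Any], list[list[float]]]:
    """Hypothetical helper: extract ids and selected numeric properties
    from a GeoJSON-style FeatureCollection dict."""
    ids: list[Any] = []
    rows: list[list[float]] = []
    for feature in fc_info["features"]:
        props = feature.get("properties") or {}
        ids.append(feature.get("id"))
        # A missing property raises KeyError instead of silently producing a bad row.
        rows.append([float(props[name]) for name in property_names])
    return ids, rows
```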

We will keep this PR open as we decide on the best path forward.