Open RichardScottOZ opened 3 years ago
Thanks for the contribution, Richard. A prediction method that can output a distribution will be useful. I'll merge this in the next few days.
Yes, I haven't put any sensible comments or docs on it, as it was literally just a "do this so I can get the output" change.
I saw a prediction module? Yesterday I was working on a hack for using hdbscan... are the functions in raster.py going to migrate there, or be versioned for other uses?
Hi Richard,
I'm just working on a couple of problems relating to the in-memory files feature that I added to Pyspatialml, but I'd like to return to this. NGBoost looks like it uses a predict_dist method. Do you know if this works within scikit-learn's structures, e.g. can it function inside a pipeline?
scikit-learn doesn't appear to support prediction intervals in a uniform or extensive way. GradientBoostingRegressor enables prediction intervals via quantile predictions (with loss='quantile'), but it does this without a new method: you set or modify the 'alpha' parameter of the estimator in-place, and then use the regular predict function for the specified quantile.
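A minimal sketch of the GradientBoostingRegressor approach described above, on synthetic data made up for illustration. It fits one model per quantile rather than changing alpha on a single estimator and refitting, which should be equivalent in effect; the data, the quantile levels and n_estimators are all assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

# With loss="quantile", `alpha` is the quantile the model targets.
quantiles = (0.05, 0.5, 0.95)
models = {
    q: GradientBoostingRegressor(
        loss="quantile", alpha=q, n_estimators=50, random_state=0
    ).fit(X, y)
    for q in quantiles
}

# The regular predict() returns the requested quantile; no extra method is needed.
preds = np.column_stack([models[q].predict(X) for q in quantiles])

# Fraction of training targets that fall inside the 5%-95% band.
coverage = np.mean((y >= preds[:, 0]) & (y <= preds[:, 2]))
```

Because everything goes through predict, each of these models can sit inside a Pipeline unchanged, at the cost of fitting once per quantile.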
My favourite R random forest implementation, ranger, which also has a Python wrapper around the C++ libs, allows quantile prediction too, but in Python it uses a predict_quantile method to perform this. So that's a different approach again, and I don't think quantile predictions can be made easily if the estimator is encapsulated within another structure like a Pipeline.
I haven't tried it, but I would guess probably? Only thing I think I remember seeing is a grid search mentioned there.
I was wondering about that a little when I saw your apply function, e.g. if I needed a StandardScaler-transformed raster stack based on the original for clustering. Does it take a function and an argument dictionary along with the array, or anything else?
Yes, I was wondering the same thing: whether the apply method could be used for applying predictions with arbitrary/non-standard methods. I think it can, but I should work through it with an example, because I'd still like to use NGBoost or skranger for prediction intervals. When I tried with skranger, it wouldn't work if wrapped inside pipelines or other meta-estimators because they don't have a predict_quantiles method to pass through.
Yes, so that case might need some sort of custom pipeline overloading hackery, which isn't ideal.
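One way that kind of "pipeline hackery" could look, as a rough sketch only: a Pipeline subclass that runs the fitted transformers and then calls an arbitrary method on the final estimator. ForwardingPipeline, call_final and ToyQuantileRegressor are hypothetical names invented here; the toy regressor just stands in for an estimator with a non-standard method such as predict_quantiles or predict_dist, and is not skranger or NGBoost.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ToyQuantileRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical stand-in for an estimator with a non-standard method."""

    def fit(self, X, y):
        self.y_ = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        return np.full(len(X), self.y_.mean())

    def predict_quantiles(self, X, quantiles=(0.1, 0.5, 0.9)):
        # Ignores X: every row gets the empirical quantiles of the training target.
        return np.tile(np.quantile(self.y_, quantiles), (len(X), 1))


class ForwardingPipeline(Pipeline):
    """Pipeline that can call any named method on its final estimator."""

    def call_final(self, method, X, **kwargs):
        Xt = X
        # Apply every fitted transformer, skipping None/"passthrough" steps.
        for _, step in self.steps[:-1]:
            if step is None or isinstance(step, str):
                continue
            Xt = step.transform(Xt)
        # Look the method up by name on the last step and call it.
        return getattr(self.steps[-1][1], method)(Xt, **kwargs)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)

pipe = ForwardingPipeline(
    [("scale", StandardScaler()), ("model", ToyQuantileRegressor())]
).fit(X, y)
q = pipe.call_final("predict_quantiles", X[:5], quantiles=(0.25, 0.75))
```

Taking the method name as a string avoids hard-coding any one library's method, but it only helps where the caller knows to use call_final; other meta-estimators would still need their own workaround.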
and hdbscan class label estimation looks like this, basically:
result, result_strengths_t = hdbscan.approximate_predict(estimator, flat_pixels)
(so 2 to do)
and there is:
#result = estimator.predict_proba(flat_pixels)
result = hdbscan.prediction.membership_vector(estimator, flat_pixels)
which gives the probabilities of being in any particular cluster
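To make the flat_pixels idea concrete, here is a small NumPy-only sketch of flattening a raster stack, calling a per-pixel function that returns two outputs (as approximate_predict does), and reshaping each output back into a raster. apply_to_pixels and nearest_centre are hypothetical helpers written for this example, not Pyspatialml or hdbscan functions; the (bands, rows, cols) layout is also an assumption.

```python
import numpy as np


def apply_to_pixels(stack, func):
    """Apply a per-pixel function to a (bands, rows, cols) array.

    `func` takes an (n_pixels, bands) array and returns one array or a tuple
    of arrays, each holding one value per pixel.
    Returns an array of shape (n_outputs, rows, cols).
    """
    bands, rows, cols = stack.shape
    flat_pixels = stack.reshape(bands, -1).T  # one row per pixel
    outputs = func(flat_pixels)
    if not isinstance(outputs, tuple):
        outputs = (outputs,)
    return np.stack([np.asarray(o).reshape(rows, cols) for o in outputs])


def nearest_centre(flat_pixels, centres):
    """Hypothetical stand-in for a predictor returning (labels, strengths)."""
    dist = np.linalg.norm(flat_pixels[:, None, :] - centres[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    strengths = 1.0 / (1.0 + dist.min(axis=1))
    return labels, strengths


# Tiny 2-band, 3x4 raster: left half is all 0, right half is all 5.
stack = np.zeros((2, 3, 4))
stack[:, :, 2:] = 5.0
centres = np.array([[0.0, 0.0], [5.0, 5.0]])

out = apply_to_pixels(stack, lambda px: nearest_centre(px, centres))
labels_raster, strengths_raster = out[0], out[1]
```

With a fitted hdbscan estimator, the lambda would presumably wrap the approximate_predict call quoted above instead. A per-cluster output like membership_vector has more than one value per pixel, so it would need a different reshape than this sketch does.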
Hi Steven,
FYI, did this last year to use your work with NGBoost, finally got around to updating.