stevenpawley / Pyspatialml

Machine learning modelling for spatial data
GNU General Public License v3.0

`predict_proba` called twice in source code... #20

Closed by amine-aboufirass 4 years ago

amine-aboufirass commented 4 years ago

Can I ask what the point of line 836 in `raster.py` is? This line calls sklearn's `predict_proba` method, but the result is used only to read its dtype and shape. It is never written into or included in the returned `Raster` object.

The `predict_proba` function in Pyspatialml's `Raster` then calls sklearn's `predict_proba` a second time, at line 1095 under `_probfun`, and I think that is the result that gets written to the output. I think I understand why it is called this second time. What confuses me is why it is called at line 836.

stevenpawley commented 4 years ago

If I remember correctly, I used the first call to `predict_proba` (on a tiny window of data) to check the output dimensions of the result, i.e. the number of classes that get a probability, without needing the training data. That way, when writing windows of data with rasterio, the raster metadata can be set with the correct number of bands before the file is opened for writing.
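The probing approach described above can be sketched roughly as follows. This is not the code from `raster.py`: the helper `probe_output_meta`, the synthetic array standing in for a raster, and the plain dict standing in for rasterio metadata are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: 60 samples, 4 features (one per band), 3 classes.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 4))
y_train = np.repeat([0, 1, 2], 20)
estimator = LogisticRegression(max_iter=500).fit(X_train, y_train)

# Hypothetical raster held in memory as (bands, rows, cols).
raster = rng.normal(size=(4, 100, 100))


def probe_output_meta(estimator, raster, size=2):
    """Predict on a tiny window to learn the output band count and dtype.

    Hypothetical helper, not part of Pyspatialml.
    """
    window = raster[:, :size, :size]              # (bands, size, size)
    flat = window.reshape(window.shape[0], -1).T  # (pixels, bands)
    probs = estimator.predict_proba(flat)         # (pixels, n_classes)
    return {"count": probs.shape[1], "dtype": probs.dtype.name}


# These values would go into the output metadata before opening the file.
meta = probe_output_meta(estimator, raster)
```

The cost is one extra `predict_proba` call on a few pixels, which is the duplicate call the issue asks about.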

amine-aboufirass commented 4 years ago

Hi @stevenpawley, thanks for your response. Couldn't you simply get that from the estimator that is passed into your `predict_proba` method? Something like `estimator.classes_.shape[0]` should work, I think.
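A minimal sketch of that suggestion, assuming a fitted single-output scikit-learn classifier (for multi-output classifiers `classes_` is a list of arrays, so this shortcut would not apply directly). The data and class labels here are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data: 90 samples, 5 features, 3 class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 5))
y = np.repeat(["forest", "urban", "water"], 30)
estimator = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Number of probability bands, read from the fitted estimator
# without running a prediction first.
n_bands = estimator.classes_.shape[0]

# predict_proba returns one column per entry in classes_,
# so its second dimension matches n_bands.
probs = estimator.predict_proba(X[:4])
```

This gives the band count without a probe prediction, although the output dtype would still have to be chosen some other way.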

stevenpawley commented 4 years ago

Good point, I forgot about that. I'll update the predict methods in the next commit.