haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

Tree Representation for Regression Models in Google Earth Engine #744

Closed thomaslauber closed 1 year ago

thomaslauber commented 1 year ago

I am a researcher who works extensively with Google Earth Engine (GEE), and Smile has made a huge contribution to my work and to the work of many others. Thanks so much for that!

Currently, I am working on a Python package that will download a tree-based model trained in GEE and fully replicate it in sklearn. This will allow researchers to compute SHAP values locally and help them better understand the model's behaviour. I am in contact with Noel Gorelick, the chief software engineer at GEE, and we thought this could be a very valuable contribution.

So far, the package works fine for Random Forests and Decision Trees in Classification mode, thanks to the (I think) compressed order representation. Is there a way to include this representation in the Regression Tree as well? Ideally, the Regression models would behave in exactly the same way as the Classification models. This would need to be implemented in V1 of Smile, which seems to be the version GEE is currently using.

Attached are some screenshots that hopefully explain the problem visually:

Output of a Classification Tree in GEE: image (one can see the tree representation at the bottom)

Output of a Random Forest in Classification mode: image (one can see the tree representation at the bottom for both trees inside the forest)

Output of a Regression Tree in GEE: image (one can see there is no tree representation)

Output of a Random Forest in Regression mode: image (also here, there is no tree representation)
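
To make the request more concrete, here is a minimal sketch of how a text tree representation could be parsed and replayed locally once it is available for regression trees. The line layout used here (`<id>) <split> <n> <loss> <yval> [*]`, with the children of node `k` numbered `2k` and `2k+1`) is an assumption loosely modeled on rpart-style printouts, and the sample string is invented for illustration. The strings that GEE or Smile actually emit may differ, and `parse_tree` and `predict` are hypothetical helper names.

```python
import re

# ASSUMED line layout (not confirmed by this thread):
#   "<id>) <split> <n> <loss> <yval> [(<p1> <p2> ...)] [*]"
# A trailing "*" marks a terminal node.
NODE_RE = re.compile(
    r"^\s*(?P<id>\d+)\)\s+(?P<split>\S+)\s+(?P<n>\d+)\s+(?P<loss>\S+)\s+(?P<yval>\S+)"
    r"(?:\s+\((?P<probs>[^)]*)\))?\s*(?P<leaf>\*)?\s*$"
)
# ASSUMED split syntax: "<feature><=<threshold>" or "<feature>><threshold>".
SPLIT_RE = re.compile(r"^(?P<feature>.+?)(?P<op><=|>)(?P<thr>.+)$")

# Invented sample output for a tiny regression tree with one split.
SAMPLE = """\
n= 10
node), split, n, loss, yval
* denotes terminal node
1) root 10 8.5 3.0
 2) x<=2.5 4 1.0 1.5 *
 3) x>2.5 6 2.0 4.0 *
"""


def parse_tree(text):
    """Parse node lines into a dict keyed by node id; other lines are skipped."""
    nodes = {}
    for line in text.splitlines():
        m = NODE_RE.match(line)
        if m is None:
            continue  # header or blank line
        nodes[int(m.group("id"))] = {
            "split": m.group("split"),
            "n": int(m.group("n")),
            "loss": float(m.group("loss")),
            "yval": float(m.group("yval")),
            "leaf": m.group("leaf") is not None,
        }
    return nodes


def predict(nodes, sample):
    """Walk from the root (id 1) to a leaf and return that leaf's value."""
    node_id = 1
    while not nodes[node_id]["leaf"]:
        left = 2 * node_id  # assumed numbering: children are 2k and 2k+1
        m = SPLIT_RE.match(nodes[left]["split"])
        threshold = float(m.group("thr"))
        value = sample[m.group("feature")]
        goes_left = value <= threshold if m.group("op") == "<=" else value > threshold
        node_id = left if goes_left else left + 1
    return nodes[node_id]["yval"]


if __name__ == "__main__":
    tree = parse_tree(SAMPLE)
    print(predict(tree, {"x": 2.0}), predict(tree, {"x": 3.0}))
```

Keeping the parsed nodes in a plain dictionary keyed by node id makes it straightforward to copy thresholds and leaf values into another library's tree structure afterwards.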

lihaife commented 1 year ago

Thanks. Smile v2+ already supports tree representation in compact text and Graphviz format for both classification and regression trees.