konstantint / SKompiler

A tool for compiling trained SKLearn models into other representations (such as SQL, Sympy or Excel formulas)
MIT License

Only getting first node of decision tree when converting to SQL #4

Closed (matago closed this issue 5 years ago)

matago commented 5 years ago

Model type is sklearn DecisionTreeClassifier

from skompiler import skompile
expr = skompile(m.predict)

The result is as expected...

IfThenElse(test=BinOp(op=LtEq(), left=IndexedIdentifier(id='x', index=11, size=12), right=NumberConstant(value=0.3329164981842041)), iftrue=IfThenElse(test=BinOp(op=LtEq(), left=IndexedIdentifier(id='x', index=11, size=12), right=NumberConstant(value=0.19435399770736694)), iftrue=IfThenElse(test=BinOp(op=LtEq(), left=IndexedIdentifier(id='x', index=11, size=12), right=NumberConstant(value=0.08358149975538254)), iftrue=NumberConstant(value=0), iffalse=NumberConstant(value=0)), iffalse=IfThenElse(test=BinOp(op=LtEq(), left=IndexedIdentifier(id='x', index=11, size=12), right=NumberConstant(value=0.2531224936246872)), iftrue=NumberConstant(value=0), iffalse=NumberConstant(value=0))), iffalse=IfThenElse(test=BinOp(op=LtEq(), left=IndexedIdentifier(id='x', index=11, size=12), right=NumberConstant(value=0.6303065121173859)), iftrue=IfThenElse(test=BinOp(op=LtEq(), left=IndexedIdentifier(id='x', index=10, size=12), right=NumberConstant(value=0.9569794833660126)), iftrue=NumberConstant(value=1), iffalse=NumberConstant(value=1)), iffalse=IfThenElse(test=BinOp(op=LtEq(), left=IndexedIdentifier(id='x', index=10, size=12), right=NumberConstant(value=0.9592015147209167)), iftrue=NumberConstant(value=1), iffalse=NumberConstant(value=1))))

However, when converting to SQL, only the first node is converted. Instead of nesting the subsequent nodes, the output places constant results directly under the first split.

sql = expr.to('sqlalchemy/mssql', multistage=True)

SELECT CASE WHEN (x12 <= 0.3329164981842041) THEN 0 ELSE 1 END AS y \nFROM data

konstantint commented 5 years ago

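SKompiler recursively simplifies expressions like "if (test) then 1 else 1" to just "1". In your tree, every leaf under the left branch of the first split returns 0 and every leaf under the right branch returns 1, so your particular expression rolls up to a single if clause. The generated SQL is equivalent to the full tree.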
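To illustrate the roll-up, here is a minimal sketch of this kind of recursive simplification. It is not SKompiler's actual code: the `Const`, `Threshold`, and `IfThenElse` classes and the `simplify` function are hypothetical stand-ins written for this example, and only the thresholds and leaf values are taken from the expression in the report.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for the AST nodes; NOT SKompiler's real classes.
@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Threshold:
    feature: int      # zero-based feature index, as in IndexedIdentifier
    limit: float      # the test is x[feature] <= limit


@dataclass(frozen=True)
class IfThenElse:
    test: Threshold
    iftrue: "Node"
    iffalse: "Node"


Node = Union[Const, IfThenElse]


def simplify(node: Node) -> Node:
    """Collapse any branch whose two outcomes are identical, bottom-up."""
    if isinstance(node, Const):
        return node
    iftrue = simplify(node.iftrue)
    iffalse = simplify(node.iffalse)
    if iftrue == iffalse:
        # "if (test) then v else v" is just "v"; the test is irrelevant.
        return iftrue
    return IfThenElse(node.test, iftrue, iffalse)


# The tree from the report: all left-side leaves are 0, all right-side leaves are 1.
tree = IfThenElse(
    Threshold(11, 0.3329164981842041),
    IfThenElse(
        Threshold(11, 0.19435399770736694),
        IfThenElse(Threshold(11, 0.08358149975538254), Const(0), Const(0)),
        IfThenElse(Threshold(11, 0.2531224936246872), Const(0), Const(0)),
    ),
    IfThenElse(
        Threshold(11, 0.6303065121173859),
        IfThenElse(Threshold(10, 0.9569794833660126), Const(1), Const(1)),
        IfThenElse(Threshold(10, 0.9592015147209167), Const(1), Const(1)),
    ),
)

simplified = simplify(tree)
print(simplified)
```

Under these assumptions, only the root test survives, with `Const(0)` and `Const(1)` as its two branches, which matches the single CASE WHEN in the generated SQL.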