SynapseML relies on code generation to automate much of the boilerplate around its methods, especially the parameter getter/setter methods. Currently, for many objects in the library, this works by enumerating the parameters and then generating the corresponding methods.
This works fine for the way the library is structured, but it could use some enhancements. In particular, there may be advantages to shifting parts of the code generation towards Scala reflection via ClassTags, as opposed to instance-based reflection. This would help avoid some issues related to type erasure.
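As a concrete illustration of the type-erasure point (a standalone sketch, not SynapseML code): instance-based reflection cannot recover a generic container's element type at runtime, while a ClassTag materialized at the call site can.

```scala
import scala.reflect.ClassTag

object ErasureDemo {
  // Instance-based inspection: the element type of a List is erased at
  // runtime, so both lists below report the same runtime class.
  def runtimeClass(xs: List[_]): String = xs.getClass.getSimpleName

  // ClassTag-based inspection: the compiler materializes the element type
  // at the call site, so it survives erasure.
  def elementClass[T](implicit ct: ClassTag[T]): String =
    ct.runtimeClass.getSimpleName

  def main(args: Array[String]): Unit = {
    println(runtimeClass(List(1)))   // same class name for both lists
    println(runtimeClass(List("a")))
    println(elementClass[Int])       // "int"
    println(elementClass[String])    // "String"
  }
}
```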
Read-only Param instances
Let's imagine someone writes an estimator/transformer pair that adheres to SynapseML idioms; that is, a hyperparameter is accessible on both the estimator and the fitted transformer, but is only settable on the estimator. We see this with, for instance, GBTRegressor: you can call regressor.setStepSize(0.1).fit(df).getStepSize to recover, on the learnt transformer, the parameter that was set on the estimator, even though of course one cannot call setStepSize on the fitted transformer.
If one were to attempt to run this type of class through the code-gen, the generated wrappers would not preserve this distinction: setters would be emitted for the transformer as well as the estimator.
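The pattern being described can be sketched with toy classes (no Spark dependency; ToyEstimator and ToyModel are hypothetical names, not SynapseML API): the setter lives only on the estimator, the getter is shared, and the fitted model receives the value at fit time.

```scala
// A parameter that can be read on both sides of the estimator/model pair.
trait HasStepSize {
  def getStepSize: Double
}

class ToyEstimator extends HasStepSize {
  private var stepSize: Double = 0.1

  def getStepSize: Double = stepSize

  // Setter exists only on the estimator.
  def setStepSize(value: Double): this.type = { stepSize = value; this }

  // The fitted model captures the hyperparameter value at fit time.
  def fit(): ToyModel = new ToyModel(stepSize)
}

class ToyModel(stepSize: Double) extends HasStepSize {
  // No setStepSize here: the value is read-only on the fitted model.
  def getStepSize: Double = stepSize
}
```

Usage mirrors the GBTRegressor example above: `new ToyEstimator().setStepSize(0.5).fit().getStepSize` returns `0.5`, and `setStepSize` does not compile on `ToyModel`. A parameter-enumeration codegen would need to know, per parameter, whether to emit the setter on the model wrapper.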
Non-Parameter Methods
In addition to parameters, the most important use case, it might be nice if the codegen were capable of working with methods that are not associated with a parameter. This does not mean that any method should be automatically generated -- that would be a pretty tall order. But some methods, say, inquiries about the state of a model, are more naturally served by a method unassociated with any single parameter. These methods in particular are often tricky to handle when they are in any way generic.
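One way a codegen could separate the easy cases from the tricky ones (a sketch using scala-reflect; ToyModel and MethodScan are hypothetical names, not SynapseML API): scan a class's declared methods and keep only those with fully concrete signatures, since methods carrying type parameters are exactly the ones that are hard to wrap automatically.

```scala
import scala.reflect.runtime.universe._

// Hypothetical model with one non-parameter inquiry method (easy to wrap)
// and one generic method (hard to wrap automatically).
class ToyModel {
  def numFeatures: Int = 10
  def summary[T](col: T): T = col
}

object MethodScan {
  // List public, non-constructor methods with no type parameters --
  // candidates for automatic wrapper generation.
  def concreteMethods[T: TypeTag]: List[String] =
    typeOf[T].decls.collect {
      case m: MethodSymbol if m.isPublic && !m.isConstructor && m.typeParams.isEmpty =>
        m.name.toString
    }.toList
}
```

Here `MethodScan.concreteMethods[ToyModel]` would include `numFeatures` but exclude `summary`, flagging the generic method for manual handling rather than automatic generation.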
Hey @TomFinley :wave:!
Thank you so much for reporting the issue/feature request :rotating_light:.
Someone from the SynapseML team will be looking to triage this issue soon.
We appreciate your patience.
AB#1932712