Open schlichtanders opened 6 years ago
In R there are two approaches for declaring the label and the features:
- "Matrix interface": model(x = features, y = label)
- "Formula interface": model(label ~ features, data = data)
Arithmetic operations are supported only with the "formula interface", because this way they become part of the model object (e.g. they can be serialized/deserialized in the RDS data format). However, support for the "formula interface" varies considerably between R packages: it is best supported by several built-in packages (e.g. the stats package, which provides the glm() and lm() functions), reasonably supported by several others (e.g. the earth and randomForest packages), and not supported at all by many more.
You need to check the documentation of your target R package/function to see whether it supports the "formula interface".
If possible, can you provide example code?
See the following presentation: https://www.slideshare.net/VilluRuusmann/converting-r-to-pmml-82182483
There are many in-formula feature engineering examples starting from slide 13.
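To make the difference between the two interfaces concrete, here is a minimal sketch that uses only the built-in stats package. The toy data frame and its column names (x1, x2, y) are made up for illustration, and lm.fit() merely stands in for a typical "matrix interface" function.

```r
# Toy data; the column names are illustrative assumptions.
set.seed(42)
df <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
df$y <- 1 + 2 * (df$x1 + df$x2) + rnorm(20, sd = 0.1)

# Formula interface: the arithmetic x1 + x2 is stored inside the model object.
fit_formula <- lm(y ~ I(x1 + x2), data = df)
print(formula(fit_formula))

# Matrix interface: the sum has to be pre-computed by hand, so the fitted
# object keeps no record of how the feature was derived.
x <- cbind(intercept = 1, x1_plus_x2 = df$x1 + df$x2)
fit_matrix <- lm.fit(x = x, y = df$y)

# Both fits estimate the same coefficients.
print(all.equal(unname(coef(fit_formula)), unname(fit_matrix$coefficients)))
```

Only the first fit carries enough information for a converter to reproduce the x1 + x2 step; with the second, that step lives in the surrounding script.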
Thanks a lot for the many explanations, comments and the link. That is great.
Looking over it, you have many examples with "as.formula". These are exactly the things which I would like to have WITHOUT wrapping them into a linear model or anything else. Just these formulas on their own, with no special R package. Is that really not possible?
I hoped for something like a plain "model" function provided by r2pmml, which would be a kind of identity wrapper around the formula.
These are exactly the things which I would like to have WITHOUT wrapping it into a linear model
You mean taking a stats::formula object, and converting it into a PMML fragment?
formula = as.formula(...)
r2pmml(formula, "formula.pmml")
What will happen to those PMML fragments afterwards? Do you want to copy-paste them manually somewhere else?
The PMML thinking is that formula objects cannot exist in isolation. They have to be associated with a model object or, alternatively, be converted to some sort of function definition (typically a DerivedField element).
However, it would be possible to teach the r2pmml package to take notice of stats::formula objects, and emit a partial result in this case (i.e. the result wouldn't be a complete PMML document, but a fragment of it).
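As a rough illustration of what a bare stats::formula object carries without any model around it, the following sketch inspects one with base R only. The variable names are the same illustrative ones used elsewhere in this thread; nothing here is r2pmml functionality.

```r
# A bare formula, not attached to any model.
f <- as.formula(y ~ I(x1 + x2))

# Names of all variables referenced by the formula.
vars <- all.vars(f)

# A two-sided formula is a call: f[[2]] is the left-hand side,
# f[[3]] is the right-hand side.
lhs <- f[[2]]
rhs <- f[[3]]

# Unwrap I(...) to reach the arithmetic expression itself.
expr <- rhs[[2]]
operator <- as.character(expr[[1]])
operands <- vapply(as.list(expr)[-1], deparse, character(1))

print(vars)
print(operator)
print(operands)
```

The formula knows the field names and the operation, but not the data types or operational types of the fields, which is why it cannot stand in for a complete PMML document by itself.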
Thank you very much for the explanations and for paraphrasing my thoughts. Now I really feel understood.
Thanks a lot!
Suppose you create a stats::formula object like this:
#library("r2pmml")
formula = as.formula(y ~ I(x1 + x2))
#r2pmml(formula, "formula.pmml")
A formula object could be translated to a singleton DerivedField element. However, this element cannot exist in isolation; there must be accompanying DataField elements that define its input and output fields (names, data and operational types, etc).
A corresponding PMML fragment might look like this:
<PMML>
  <DataDictionary>
    <DataField name="x1" dataType="double" optype="continuous"/>
    <DataField name="x2" dataType="double" optype="continuous"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="y" dataType="double" optype="continuous">
      <Apply function="+">
        <FieldRef field="x1"/>
        <FieldRef field="x2"/>
      </Apply>
    </DerivedField>
  </TransformationDictionary>
</PMML>
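Until something like this exists in r2pmml, markup of this shape can be produced by hand. The sketch below is only an assumption of how that might look in base R: expr_to_pmml() and formula_to_pmml_fragment() are hypothetical helpers, not r2pmml functions, they handle only variable names, numeric constants and binary +, -, *, /, and they assume every field is a continuous double.

```r
# Hypothetical helper, not part of r2pmml: turns an R expression made of
# variable names, numbers and binary +, -, *, / into nested PMML markup.
expr_to_pmml <- function(expr) {
  if (is.name(expr)) {
    return(sprintf('<FieldRef field="%s"/>', as.character(expr)))
  }
  if (is.numeric(expr)) {
    return(sprintf('<Constant dataType="double">%s</Constant>', format(expr)))
  }
  if (is.call(expr)) {
    op <- as.character(expr[[1]])
    # I(...) and parentheses only wrap another expression.
    if (op %in% c("I", "(")) {
      return(expr_to_pmml(expr[[2]]))
    }
    if (op %in% c("+", "-", "*", "/") && length(expr) == 3) {
      args <- vapply(as.list(expr)[-1], expr_to_pmml, character(1))
      return(sprintf('<Apply function="%s">%s</Apply>',
                     op, paste(args, collapse = "")))
    }
  }
  stop("Unsupported expression: ", deparse(expr))
}

# Hypothetical helper: assumes all fields are continuous doubles, which a
# real converter would have to learn from the data instead.
formula_to_pmml_fragment <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3)
  inputs <- all.vars(f[[3]])
  data_fields <- sprintf(
    '<DataField name="%s" dataType="double" optype="continuous"/>', inputs)
  derived <- sprintf(
    '<DerivedField name="%s" dataType="double" optype="continuous">%s</DerivedField>',
    deparse(f[[2]]), expr_to_pmml(f[[3]]))
  c(data_fields, derived)
}

fragment <- formula_to_pmml_fragment(y ~ I(x1 + x2))
cat(fragment, sep = "\n")
```

The output is a character vector of DataField and DerivedField lines without the enclosing PMML, DataDictionary and TransformationDictionary elements, so it still has to be placed into a proper document by hand.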
This kind of "partial conversion" can be very helpful if you're trying to convert a piece of R (or Python) code into PMML. It will be very easy to copy the above DataField and DerivedField elements and paste them into some other PMML document (that needs to be enhanced with more feature engineering logic).
The package's README (https://github.com/jpmml/r2pmml#model-formulae) says that one can use nice R syntax to define ordinary arithmetic processing of the data when using GLM and similar models.
Are such formulas also supported independently of LM/GLM? I mean, to create simple models that involve only simple arithmetic.
If possible, can you provide example code? If not, can it be supported in general?