qingyuanxingsi closed this issue 4 years ago
Thank you!
You can write custom logic that dynamically creates features based on some configuration or input arguments. One such example already present in the codebase is FeatureBuilder.fromSchema,
which dynamically builds feature extractors for Spark DataFrame types. One can then operate on the features by type and apply further transformations, e.g.:
// Let's assume we materialized these features dynamically, in this example from the struct type of a Spark DataFrame
val df: DataFrame = ???
val (response: FeatureLike[_ <: FeatureType], features: Array[FeatureLike[_ <: FeatureType]]) =
  FeatureBuilder.fromSchema(df.schema, response = "label")

// Apply type-specific transformations for particular feature types (can be conditioned on your config)
val texts: Array[FeatureLike[Text]] =
  features.collect { case f if f.isSubtypeOf[Text] => f.asInstanceOf[FeatureLike[Text]] }
val tokenized: Array[FeatureLike[TextList]] = texts.map(_.tokenize())
val integrals: Array[FeatureLike[Integral]] =
  features.collect { case f if f.isSubtypeOf[Integral] => f.asInstanceOf[FeatureLike[Integral]] }
val abs: Array[FeatureLike[Integral]] = integrals.map(_.abs())

// Vectorize all the desired features
val vectorized: FeatureLike[OPVector] = (tokenized ++ abs).transmogrify(label = Some(response))
@tovbinm I'm aware of your point. What we would like is flexible control over the transformations applied to one or several columns, to support hand-crafted features (different datasets need different treatment). We would like to load a JSON file and construct the corresponding transformers automatically (possibly without explicit mapping). We do not want to modify the code; instead, the JSON file should determine the transformation pipeline. Fixed transformations per type cannot meet all our requirements.
In other words, a common way to dynamically build a transformer/estimator from a JSON file with all params set. No code modification, just configuration.
I see. You would need to develop some custom code that interprets the JSON config file into a sequence of custom transformations in TransmogrifAI.
We did something similar in the past. Perhaps @tillbe would be willing to reveal some ideas on how to implement it?
@tovbinm @tillbe Any updates?
The implementation will depend on your exact needs, but ultimately you will have to write a DSL/Parser for your custom features, e.g. using ANTLR - at least that's how we solved it when we had a similar use case. An alternative is using FastParse if you want to stay in native Scala.
I hope this helps, happy to elaborate further.
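To make the parser route concrete: for very simple feature expressions such as `tokenize(description)`, even a plain regex can stand in for a full ANTLR/FastParse grammar. This is only a hypothetical sketch — `MiniDsl` and `FeatureExpr` are illustrative names, not part of TransmogrifAI.

```scala
// Hypothetical sketch: a tiny "DSL" where each expression is op(column),
// e.g. "tokenize(description)". A real grammar (ANTLR/FastParse) is only
// needed once the syntax grows beyond what a regex can match.
object MiniDsl {
  // One transformation call: an operation name applied to a column name.
  final case class FeatureExpr(op: String, column: String)

  private val Call = """(\w+)\((\w+)\)""".r

  def parse(expr: String): Either[String, FeatureExpr] = expr.trim match {
    case Call(op, col) => Right(FeatureExpr(op, col))
    case other         => Left(s"cannot parse feature expression: $other")
  }
}

object MiniDslDemo extends App {
  println(MiniDsl.parse("tokenize(description)")) // Right(FeatureExpr(tokenize,description))
  println(MiniDsl.parse("not a call"))            // Left(cannot parse feature expression: not a call)
}
```

Each parsed FeatureExpr could then be mapped to the corresponding typed TransmogrifAI transformation.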
@tillbe My previous idea is like this:
{
"className":"com.salesforce.op.stages.impl.feature.AliasTransformer",
"params":{
"outputFeatureName":"test"
}
}
We can define a JSON file like this, load it, and parse it into an AliasTransformer to perform the transformation; then we can create more transformers like AliasTransformer in the same way. I'm wondering why this would not work?
If it won't work, can you give a more detailed example illustrating your idea? I'm new to ANTLR.
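For what it's worth, the className-based idea from the JSON above can be sketched in plain Scala with Class.forName. The trait and transformer below are hypothetical stand-ins, not TransmogrifAI classes:

```scala
// Hypothetical sketch of the "className" config approach: instantiate a
// transformer reflectively and hand it the params map from the JSON file.
trait ConfiguredTransformer {
  def transform(value: String, params: Map[String, String]): String
}

// Stand-in for a real transformer such as AliasTransformer.
class UpperCaseTransformer extends ConfiguredTransformer {
  def transform(value: String, params: Map[String, String]): String = value.toUpperCase
}

object ReflectiveDemo extends App {
  // In practice these two values would come from the parsed JSON config.
  val className = "UpperCaseTransformer"
  val params = Map("outputFeatureName" -> "test")

  val transformer = Class.forName(className)
    .getDeclaredConstructor()
    .newInstance()
    .asInstanceOf[ConfiguredTransformer]

  println(transformer.transform("hello", params)) // prints HELLO
}
```

One caveat: real TransmogrifAI stages take typed constructor arguments and typed input features, so a bare no-arg Class.forName instantiation is usually not enough on its own.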
In general this approach should work. You probably have to parse the JSON into a case class first and then pattern match on it. In this case you won't need ANTLR or FastParse; a JSON decoder like circe is enough.
case class CustomFeature(
  transformer: CustomTransformer, // an enum listing all custom transformers you want to support
  params: Map[String, String]
  // ... anything else you need to store
)

val customFeaturesString: String = ??? // load the json file here

import io.circe.parser.decode // you can use any JSON library here
val decoded = decode[Seq[CustomFeature]](customFeaturesString)

// decode returns an Either, so unwrap (or handle the error) before mapping
val features = decoded.getOrElse(Seq.empty).map { customFeature =>
  customFeature.transformer match {
    case AliasTransformer => // do something
    case OtherTransformer => // do something else
    // ... and so on
  }
}
And the json file can look like this:
[
{
"transformer": "AliasTransformer",
"params": {
"outputFeatureName": "test"
}
},
{
"transformer": "OtherTransformer",
"params": {
"otherParam": "2"
}
}
]
You probably need to add some information to your JSON file about which fields to operate on, and other parameters, but that depends on your use case.
You only need ANTLR/FastParse if you want more complicated syntax in your custom feature schema - JSON will simplify this a lot!
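Putting the pieces above together, the dispatch step after decoding can be sketched stand-alone. The Seq below stands in for the result of decode[Seq[CustomFeature]] on the JSON file, and the String results are placeholders for whatever your pipeline actually builds:

```scala
// Hypothetical sketch of dispatching on a decoded config; the transformer
// names mirror the JSON example above.
sealed trait CustomTransformer
case object AliasTransformer extends CustomTransformer
case object OtherTransformer extends CustomTransformer

final case class CustomFeature(transformer: CustomTransformer, params: Map[String, String])

object Dispatch {
  // Replace the String result with the real transformation you want to apply.
  def describe(feature: CustomFeature): String = feature match {
    case CustomFeature(AliasTransformer, params) => s"alias -> ${params("outputFeatureName")}"
    case CustomFeature(OtherTransformer, params) => s"other -> ${params("otherParam")}"
  }
}

object DispatchDemo extends App {
  // Stand-in for the decoded JSON config from the example above.
  val decoded = Seq(
    CustomFeature(AliasTransformer, Map("outputFeatureName" -> "test")),
    CustomFeature(OtherTransformer, Map("otherParam" -> "2"))
  )
  decoded.map(Dispatch.describe).foreach(println) // prints alias -> test, other -> 2
}
```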
Thanks for your great work! In our testing we found that automatic feature engineering cannot fully address our problems; we would like the ability to customize the transformer pipeline.
For example, we would like to configure the feature transformations using a JSON file, load it, and perform the transformations accordingly. Any guidelines for accomplishing this?
Many thanks!