linkedin / photon-ml

A scalable machine learning library on Apache Spark

featureShardConfigurations when scoring a mixed effects model with random intercept only #478

Open kpalaka opened 3 years ago

kpalaka commented 3 years ago

In my first attempt at using Photon-ML, I trained a mixed effects model that includes an intercept-only random effect. I then tried to use the scoring driver, using the tutorial as a template.

I'm having trouble specifying the right feature shard for the random intercept. During training, I assigned no feature bags to that feature shard (by omitting feature.bags in the CLI command). What is the right thing to do at scoring time?

Thanks in advance!

 val globalFeatureShardId = "sGlobal"
 val globalFeatureShard = Set("Bu", "Bg", "Bc")

 val perUserFeatureShardId = "sUser"
 val perUserFeatureShard = Set("uuid")
 // Only a random intercept is wanted, so this shard has no feature bags. What do I enter here?
 // Should I put the user ID (uuid) in a bag by itself and use it here?

 val fixedShardConfigs = Map(globalFeatureShardId -> FeatureShardConfiguration(globalFeatureShard, true))
 val mixedShardConfigs = Map(perUserFeatureShardId -> FeatureShardConfiguration(perUserFeatureShard, true)) ++ fixedShardConfigs

 GameScoringDriver.set(GameScoringDriver.featureShardConfigurations, mixedShardConfigs)

kpalaka commented 3 years ago

It seems the set of feature bags I choose above doesn't matter. Since I specified a random intercept only at training time, the GameScoringDriver can be "fooled" at scoring time with any set of feature bags for that shard. At least, it works. If somebody could confirm this behavior, I'd appreciate it.
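One plausible explanation, sketched under stated assumptions: if a sub-model's score is a sparse dot product and any feature without a stored coefficient contributes 0, then an intercept-only random effect depends only on the entity ID, and whatever bags feed that shard at scoring time cannot change the result. The Scala below is a hypothetical, simplified model of GAME-style scoring, not the Photon-ML API; `RandomInterceptSketch`, `linearScore`, `gameScore`, and the intercept name `"(INTERCEPT)"` are all invented for illustration.

```scala
// Hypothetical, simplified model of GAME-style scoring; this is NOT the Photon-ML API.
// Assumptions: a sub-model's score is a sparse dot product, features with no stored
// coefficient contribute 0, and each shard gets an intercept feature fixed at 1.0.
object RandomInterceptSketch {
  // Hypothetical name for the intercept feature.
  val Intercept = "(INTERCEPT)"

  // Sum of coefficient * value; features absent from the model get an implicit coefficient of 0.
  def linearScore(coefficients: Map[String, Double], features: Map[String, Double]): Double =
    features.iterator.map { case (name, value) => coefficients.getOrElse(name, 0.0) * value }.sum

  // Fixed-effect score plus the per-user random-effect score (0.0 for users with no model).
  def gameScore(
      fixedEffect: Map[String, Double],
      randomEffects: Map[String, Map[String, Double]],
      userId: String,
      globalShard: Map[String, Double],
      userShard: Map[String, Double]): Double = {
    val fixedPart = linearScore(fixedEffect, globalShard + (Intercept -> 1.0))
    val randomPart = randomEffects
      .get(userId)
      .map(coefs => linearScore(coefs, userShard + (Intercept -> 1.0)))
      .getOrElse(0.0)
    fixedPart + randomPart
  }

  def main(args: Array[String]): Unit = {
    val fixedEffect = Map("Bu" -> 2.0, Intercept -> 0.1)
    // Intercept-only random effect: a single coefficient per user, nothing else.
    val randomEffects = Map("user-1" -> Map(Intercept -> 0.5))
    val global = Map("Bu" -> 1.5)

    val withEmptyShard = gameScore(fixedEffect, randomEffects, "user-1", global, Map.empty)
    // Placeholder bag, like Set("Bu") for sUser: its features have no coefficients, so they add 0.
    val withPlaceholder = gameScore(fixedEffect, randomEffects, "user-1", global, Map("Bu" -> 1.5))

    println(s"empty shard: $withEmptyShard, placeholder shard: $withPlaceholder")
  }
}
```

Under these assumptions both calls return the same score, which would be consistent with the behavior described above; whether Photon-ML's scorer actually ignores unmatched features this way is not confirmed here.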

At training time, I did:

...
--feature-shard-configurations "name=sGlobal, intercept=true, feature.bags=Bu|Bg|Bc "   \
--feature-shard-configurations "name=sUser, intercept=true"  \
...
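To see how the training flags line up with the scoring-side Map of shard configurations, here is a hypothetical parser for the `name=..., intercept=..., feature.bags=a|b|c` strings shown above. It is an illustration only, not Photon-ML's own parser: `ShardConfigSketch`, `parse`, and the `ShardConfig` case class (a stand-in shaped like `FeatureShardConfiguration(featureBags, hasIntercept)` as used in the scoring code) are invented, and the trimming and default rules are assumptions.

```scala
// Hypothetical parser for --feature-shard-configurations strings; NOT Photon-ML's parser.
object ShardConfigSketch {
  // Stand-in shaped like FeatureShardConfiguration(featureBags, hasIntercept).
  final case class ShardConfig(featureBags: Set[String], hasIntercept: Boolean)

  // Returns (shard name, config). Assumptions for this sketch: whitespace around keys and
  // values is trimmed, a missing feature.bags means no bags, a missing intercept means false.
  def parse(spec: String): (String, ShardConfig) = {
    val kv: Map[String, String] = spec
      .split(",")
      .iterator
      .map(_.trim)
      .filter(_.nonEmpty)
      .map { entry =>
        entry.split("=", 2) match {
          case Array(k, v) => k.trim -> v.trim
          case _           => throw new IllegalArgumentException(s"Malformed entry: '$entry'")
        }
      }
      .toMap

    val name = kv.getOrElse("name", throw new IllegalArgumentException(s"Missing name in '$spec'"))
    // Bags are '|'-separated; empty pieces are dropped.
    val bags: Set[String] = kv
      .get("feature.bags")
      .map(_.split("\\|").iterator.map(_.trim).filter(_.nonEmpty).toSet)
      .getOrElse(Set.empty[String])
    val intercept = kv.get("intercept").exists(_.toBoolean)
    name -> ShardConfig(bags, intercept)
  }

  def main(args: Array[String]): Unit = {
    val specs = Seq(
      "name=sGlobal, intercept=true, feature.bags=Bu|Bg|Bc ",
      "name=sUser, intercept=true")
    specs.map(parse).foreach { case (name, cfg) => println(s"$name -> $cfg") }
  }
}
```

Read this way, the training-time sUser shard is an empty bag set with the intercept enabled, so the closest scoring-side mirror would be an empty `Set` for that shard; whether GameScoringDriver accepts an empty bag set is not confirmed in this thread.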

At scoring time, it works if I do:

val globalFeatureShardId = "sGlobal"
val globalFeatureShard = Set("Bu", "Bg", "Bc")

val perUserFeatureShardId = "sUser"
val perUserFeatureShard = Set("Bu") // just to fool the driver