salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License

Model versioning #397

Open tovbinm opened 5 years ago

tovbinm commented 5 years ago

Problem

Exported models carry no version information, and nothing is checked on loading to verify that a particular model can be safely executed with the current code version.

Solution

  1. Include the code version in the serialized model JSON file.
  2. Implement verification logic that would check if a particular model can be safely executed with the current code version. If not, a proper error should be raised.
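A minimal sketch of what the two steps could look like, in plain Scala. Everything here is hypothetical: `CodeVersion`, `ModelVersionCheck`, and the compatibility policy (same major version, model not written by newer code) are assumptions for illustration, not TransmogrifAI APIs or the project's chosen rule.

```scala
// Hypothetical sketch: none of these names come from TransmogrifAI.
final case class CodeVersion(major: Int, minor: Int, patch: Int) {
  override def toString: String = s"$major.$minor.$patch"
}

object CodeVersion {
  private val Pattern = """(\d+)\.(\d+)\.(\d+)""".r

  // Parse "major.minor.patch"; anything else yields None.
  def parse(s: String): Option[CodeVersion] = s.trim match {
    case Pattern(ma, mi, pa) => Some(CodeVersion(ma.toInt, mi.toInt, pa.toInt))
    case _ => None
  }
}

object ModelVersionCheck {
  // Assumed policy: same major version, and the model was not
  // written by code newer than the code that is loading it.
  def isCompatible(model: CodeVersion, current: CodeVersion): Boolean =
    model.major == current.major &&
      (model.minor < current.minor ||
        (model.minor == current.minor && model.patch <= current.patch))

  // Step 2: verify on load and return a descriptive error
  // instead of failing later during scoring.
  def verify(savedVersion: Option[String], current: CodeVersion): Either[String, CodeVersion] =
    savedVersion match {
      case None => Left("Model file has no version field")
      case Some(raw) =>
        CodeVersion.parse(raw) match {
          case None => Left(s"Unparseable model version: '$raw'")
          case Some(v) if isCompatible(v, current) => Right(v)
          case Some(v) =>
            Left(s"Model version $v is not compatible with code version $current")
        }
    }
}
```

For step 1, the version string would be written as an extra field alongside the existing model JSON at save time; the loader would then pass that field to `verify` and raise an error on a `Left`.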

Alternatives N/A

shenzgang commented 5 years ago

How do I deploy my model into production as a predictive service?

tovbinm commented 5 years ago

@shenzgang it depends on how you would like to run inference. You can either use the Spark runtime or use transmogrifai-local: https://github.com/salesforce/TransmogrifAI/tree/master/local

shenzgang commented 5 years ago

Why does the data need to include the label column when using a model for prediction? The training sample contains feature columns and a label column, but the data to be predicted has no label column, so I had to add a temporary placeholder label column to get prediction to succeed.

tovbinm commented 5 years ago

How so? I don't think it does. Simply load the model with OpWorkflowModel.load("/path/to/model") and then run score(), or prepare a scoreFunction for local scoring.

shenzgang commented 5 years ago

I trained the model on the Titanic data and then loaded it to predict. When I removed the label column, prediction failed! The prediction data must contain the same number of columns as the training data. My code:

    val model = OpWorkflowModel.load(modelPath)
    val scoreFn = model.scoreFunction(spark)
    val rawData = Seq(
      // If I remove the label column 'survived', the prediction fails!
      Map("id" -> 248, "survived" -> 1, "pclass" -> 2, "name" -> "Hamalainen Mrs. William (Anna)", "sex" -> "female", "age" -> 24, "sibsp" -> 0, "parch" -> 2, "ticket" -> "250649", "fare" -> 14.5, "cabin" -> null, "embarked" -> "S"),
      Map("id" -> 249, "survived" -> 1, "pclass" -> 1, "name" -> "Beckwith Mr. Richard Leonard", "sex" -> "male", "age" -> 37, "sibsp" -> 1, "parch" -> 1, "ticket" -> "11751", "fare" -> 52.5542, "cabin" -> "D35", "embarked" -> "S"),
      Map("id" -> 250, "survived" -> 1, "pclass" -> 2, "name" -> "Carter Rev. Ernest Courtenay", "sex" -> "male", "age" -> 54, "sibsp" -> 1, "parch" -> 0, "ticket" -> "244252", "fare" -> 26, "cabin" -> null, "embarked" -> "S")
    )
    val scores = rawData.map(scoreFn)
    scores.foreach(println(_))
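For reference, the workaround mentioned earlier in the thread (adding a temporary label column to unlabeled records) can be sketched in plain Scala. The helper name `withPlaceholder` and the placeholder value are hypothetical, and this is only an illustration of the workaround, not a confirmed requirement of TransmogrifAI or a recommended fix.

```scala
// Hypothetical helper, not part of TransmogrifAI.
object PlaceholderLabel {
  // Add `labelKey -> placeholder` only when the record lacks that key;
  // records that already carry a label are returned unchanged.
  def withPlaceholder(
    record: Map[String, Any],
    labelKey: String,
    placeholder: Any
  ): Map[String, Any] =
    if (record.contains(labelKey)) record
    else record + (labelKey -> placeholder)
}
```

Applied to the records above, this would be `rawData.map(r => PlaceholderLabel.withPlaceholder(r, "survived", 0))` before mapping with the score function. The placeholder value is arbitrary and carries no meaning for the prediction.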