radanalyticsio / jiminy-predictor

a predictor service for a spark based recommendation app
Apache License 2.0

Massive performance improvement with `Model` instantiation #15

Closed ruivieira closed 6 years ago

ruivieira commented 6 years ago

Passing data to the JVM through py4j is not very efficient. Instead of passing the model store data to the Scala helper jar for model instantiation, this change creates the JavaRDDs on the Python side and passes those to the helper. Model instantiation time dropped from 200.1 seconds to 0.94 seconds.
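A rough sketch of the idea, not the actual code from this PR: the factor data is turned into RDDs with `sc.parallelize` on the Python side, and only the resulting RDD handles are given to the JVM helper. The names `factors_to_rows`, `instantiate_model`, and `jvm_loader` are hypothetical; `jvm_loader` stands in for whatever entry point the Scala helper exposes, and the `{id: [floats]}` layout of the model store data is an assumption.

```python
def factors_to_rows(factors):
    """Turn an assumed {id: [floats]} mapping from the model store into (id, features) rows."""
    return [(int(key), [float(x) for x in values]) for key, values in factors.items()]


def instantiate_model(sc, rank, user_factors, product_factors, jvm_loader):
    """Build both factor RDDs in Python, then hand only their handles to the JVM.

    sc is a SparkContext. jvm_loader is a hypothetical callable that wraps the
    Scala helper's entry point and accepts (rank, user_rdd, product_rdd).
    """
    # The raw factor data is distributed by Spark itself here, instead of
    # being sent to the helper as py4j call arguments.
    user_rdd = sc.parallelize(factors_to_rows(user_factors))
    product_rdd = sc.parallelize(factors_to_rows(product_factors))
    # Only references to the two RDDs cross over to the helper.
    return jvm_loader(rank, user_rdd, product_rdd)
```

The point of the split is that the large payload never travels as py4j arguments; the helper only receives references to data Spark already holds.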

ruivieira commented 6 years ago

@elmiko ptal

elmiko commented 6 years ago

lgtm, tested against the rest of jiminy without issue.

i think the only thing we need to be careful of is that each rev bump on the jar file requires a new command to launch the application. we probably need to figure out either a) how to inject the jar file programmatically or b) how to stabilize the name/location of that file.
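One possible take on option (a), as a sketch under assumptions rather than a tested fix: look the jar up by a version-independent pattern and set Spark's `spark.jars` property from Python, so the launch command does not change on each rev bump. The `helper` prefix, the `<prefix>-<version>.jar` naming scheme, and the function names are all hypothetical, and whether `spark.jars` set this way is honored depends on how the application is launched.

```python
import glob
import os
import re


def find_helper_jar(directory, prefix="helper"):
    """Return the path of the highest-versioned '<prefix>-<x.y.z>.jar' in directory."""
    pattern = re.compile(r"^" + re.escape(prefix) + r"-(\d+(?:\.\d+)*)\.jar$")
    candidates = []
    for path in glob.glob(os.path.join(directory, "*.jar")):
        match = pattern.match(os.path.basename(path))
        if match:
            # Compare versions numerically so that 0.10.0 sorts above 0.2.0.
            version = tuple(int(part) for part in match.group(1).split("."))
            candidates.append((version, path))
    if not candidates:
        raise FileNotFoundError("no %s-*.jar found in %s" % (prefix, directory))
    return max(candidates)[1]


def build_conf(jar_path):
    """Build a SparkConf that lists jar_path under the 'spark.jars' property."""
    # Imported lazily so the jar lookup above can be used without Spark installed.
    from pyspark import SparkConf
    return SparkConf().set("spark.jars", jar_path)
```

Option (b) would make the lookup unnecessary: if the build always wrote the jar to one fixed name and location, the launch command could reference that path directly.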