An auxiliary Encoder objects initialization took a long time, effectively becoming a bottleneck of parallel solver (based on IntelliJ profiler). The object is initialized dynamically (via spark implicits) every time the aggregation or transformation is performed on the problem data, which means several times per algorithm iteration. For a single distributed problem it is not an issue. For parallel solving of millions of small problems it is re-created unnecessarily. The solution is to recylce a small set of encoder object (via static references). Note: the encoder is not even necessary for parallel solver, but it is hard to get around the API requirements made for distributed solver, so we pass the encoder anyway. It would be nice to refactor the code in such a way that unnecessary encoder object is not created.
Attached profiler screenshot shows that call to org.apache.spark.sql.SQLImplicits.newProductEncoder(TypeTags$TypeTag) takes 97.68% of its parent call (primal variable computation).
Results:
For 200 executors, 1M dataset
New run: 1h 15m
Old run: 1d 15h 15m
The results suggest ~30X improvement, may not be directly comparable due to cluster conditions variability.
An auxiliary Encoder objects initialization took a long time, effectively becoming a bottleneck of parallel solver (based on IntelliJ profiler). The object is initialized dynamically (via spark implicits) every time the aggregation or transformation is performed on the problem data, which means several times per algorithm iteration. For a single distributed problem it is not an issue. For parallel solving of millions of small problems it is re-created unnecessarily. The solution is to recylce a small set of encoder object (via static references). Note: the encoder is not even necessary for parallel solver, but it is hard to get around the API requirements made for distributed solver, so we pass the encoder anyway. It would be nice to refactor the code in such a way that unnecessary encoder object is not created.
Attached profiler screenshot shows that call to org.apache.spark.sql.SQLImplicits.newProductEncoder(TypeTags$TypeTag) takes 97.68% of its parent call (primal variable computation).
Results: For 200 executors, 1M dataset New run: 1h 15m Old run: 1d 15h 15m
The results suggest ~30X improvement, may not be directly comparable due to cluster conditions variability.