Over the last four days, I've been playing with parallelizing the typer for Dotty.
The parallelization points that I tried are:
- type the rhs of ValDefs and DefDefs that have an explicitly written expected type asynchronously;
- type the stats of a block that are not imports, ValDefs, or DefDefs asynchronously.
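The scheme above can be sketched in plain JVM terms: independent statements are submitted to a thread pool and joined in order. All names here are hypothetical stand-ins, not Dotty's actual typer API.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStats {
    // Hypothetical stand-in for typing one statement of a block.
    static String typeStat(String stat) {
        return stat + ": Typed";
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<String> stats = List.of("valdef", "defdef", "expr");
        // Submit each independent statement as its own task...
        List<Future<String>> futures = stats.stream()
                .map(s -> pool.submit(() -> typeStat(s)))
                .toList();
        // ...then join in source order so results come back in sequence.
        for (Future<String> f : futures) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```

Joining in order keeps the result sequence deterministic even though the typing work itself runs concurrently.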
This was pretty easy to achieve and gave good speedups (7x on Typer.scala with 8 threads), but it failed 90% of the time due to multiple levels of non-thread-safe caching in Dotty.
I've started making caches thread-safe and managed to do it for ImplicitsCache, the LRU caches, and lastDenot values. The failure rate for Typer.scala is now down to 20%, but the speedup is only 2x (on 2-8 threads). Most of the time is spent waiting on reads from the classpath, which I made synchronized.
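One coarse way to make such a per-key cache thread-safe on the JVM is to replace a plain map with a ConcurrentHashMap. This is only an illustrative sketch (the key/value types and the compute function are made up, not Dotty's actual cache structures):

```java
import java.util.concurrent.ConcurrentHashMap;

public class SafeCache {
    // ConcurrentHashMap tolerates concurrent lookups and inserts
    // without corrupting the table, unlike a plain HashMap.
    private final ConcurrentHashMap<String, Integer> cache =
            new ConcurrentHashMap<>();

    int lookup(String key) {
        // computeIfAbsent is atomic per key: the compute function
        // runs at most once per missing key, even under contention.
        return cache.computeIfAbsent(key, k -> k.length());
    }
}
```

The trade-off is exactly the one observed above: safety comes from serializing contended operations, so heavily shared resources (like the synchronized classpath reads) become the bottleneck.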
The current status of the branch is that the most common failure is an infinite loop in dotty.tools.dotc.util.HashSet: the table never grows, because increments of its size field are not published to other threads.
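This failure mode is classic unsafe publication: one thread increments a plain int size field past the grow threshold, but without a memory barrier other threads may never observe the new value and keep probing a full table forever. A minimal sketch of one possible fix, with hypothetical field names rather than Dotty's actual HashSet internals:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PublishedSize {
    private static final int LIMIT = 4;
    // A plain `int used` incremented without synchronization can
    // remain invisible to other threads; AtomicInteger both makes
    // the increment atomic and publishes the new value.
    private final AtomicInteger used = new AtomicInteger(0);
    // volatile so a resized capacity is seen by all threads.
    volatile int capacity = 8;

    void addEntry() {
        if (used.incrementAndGet() > LIMIT) {
            grow();
        }
    }

    private synchronized void grow() {
        capacity *= 2; // republished via the volatile write
    }
}
```

With the size tracked atomically, the grow threshold is reliably crossed and the table is resized instead of looping on probes.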
Currently we have other problems to solve besides performance, but I believe this is a promising direction and one day we should come back to it.