Open hammy4815 opened 1 year ago
Adding multithreading support in this case should be easy. But we should do some rigorous benchmarks to...
The Python implementations vectorize the operations. This requires much more memory, but it does allow for better cache management and similar optimizations by the compiler.
Mainly, we can multithread the main for loops of assemble. There are possibly other changes that would improve performance on the Julia side, which is currently slower than it should be.
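
As a rough sketch of what threading one of those loops could look like, assuming each iteration writes only to its own output slot: `element_contribution`, `assemble_serial` and `assemble_threaded` below are hypothetical stand-ins, not the actual code in assemble.

```julia
using Base.Threads

# Hypothetical stand-in for the per-iteration work done inside `assemble`.
element_contribution(i::Int) = sin(i)^2 + cos(i)^2

# Plain serial loop, kept as the reference to compare against.
function assemble_serial(n::Int)
    out = Vector{Float64}(undef, n)
    for i in 1:n
        out[i] = element_contribution(i)
    end
    return out
end

# Same loop with the iterations split across the available threads.
# Each iteration writes only to out[i], so no locks are needed.
function assemble_threaded(n::Int)
    out = Vector{Float64}(undef, n)
    @threads for i in 1:n
        out[i] = element_contribution(i)
    end
    return out
end

serial = assemble_serial(1_000)
threaded = assemble_threaded(1_000)
println("threads available: ", nthreads())
println("results match: ", serial == threaded)
```

Julia has to be started with more than one thread (for example `julia --threads=4`, or by setting `JULIA_NUM_THREADS`) for this to have any effect. If the real loops accumulate into shared entries, they would need per-thread buffers or a reduction step instead. Any speedup should be confirmed with benchmarks against the serial version.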