[x] :warning: Concurrency across reductions is not targeted, i.e. performance is poor when computing error norms, etc. (Maybe we should fix this before merging?)
In the meshmode version, we worked around this by eagerly freezing such reductions.
Update: Added a transformation specifically for "reduce-to-scalar" operations that parallelizes the reduction using loopy transformations.
[ ] We need a better name than SplitPytatoArrayContext.
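
For reviewers unfamiliar with the "reduce-to-scalar" item above, here is a minimal pure-Python sketch of the general two-stage idea behind parallelizing such a reduction. It is not the transformation added in this PR and does not use loopy; the names `two_stage_sum`, `l2_norm`, and `chunk_size` are hypothetical and chosen only for illustration.

```python
import math


def two_stage_sum(values, chunk_size=4):
    """Sum `values` in two stages (illustrative only, not the PR's code).

    Stage 1 reduces each chunk independently; on a GPU, each chunk could
    be handled by its own work-group. Stage 2 combines the partial sums.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")

    # Stage 1: independent partial reductions, one per chunk.
    partials = [
        sum(values[start:start + chunk_size])
        for start in range(0, len(values), chunk_size)
    ]

    # Stage 2: a much smaller reduction over the partial results.
    return sum(partials)


def l2_norm(values, chunk_size=4):
    """Error-norm style reduce-to-scalar built on the two-stage sum."""
    return math.sqrt(two_stage_sum([v * v for v in values], chunk_size))
```

The point of the split is that the stage-1 partial sums have no dependencies on one another, so only the short stage-2 reduction remains serial.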