Discussed with @xgaozoyoe; here is a list of potential optimizations that we want to look at:
[ ] using a non-overflow version of non-native field multiplication
Currently we use an optimized version that handles overflow in non-native mul. This reduces the number of rows, but it also means a multiplication takes a variable number of rows, which is bad for parallelization. We want to consider a fixed-row, non-optimized mul instead; this trades MSM cost (more rows) for parallel witness generation.
[ ] customized layouter
The layouter runs in two phases: it operates over mock witnesses during the first phase and over real witnesses during the second. Although the first phase has no real witnesses, it still goes through all the operations that are performed on real witnesses. We may be able to skip these ops.
[ ] remove witness labels: #32
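To illustrate why a fixed-row mul helps parallel witness generation, here is a minimal sketch using only the Rust standard library. It is not the actual circuit code: `ROWS_PER_MUL`, `fill_mul_rows`, and `assign_parallel` are hypothetical names, a plain `u64` column stands in for the witness table, and the "limb decomposition" is a toy. The point is only that when every mul occupies the same number of rows, mul `i` starts at row `i * ROWS_PER_MUL`, so the regions are disjoint and can be filled by independent threads.

```rust
use std::thread;

/// Hypothetical fixed cost: every mul occupies exactly this many rows.
const ROWS_PER_MUL: usize = 4;

/// Toy stand-in for the witness generation of one multiplication.
/// It writes the 16-bit limbs of the (wrapping) product into its own rows.
fn fill_mul_rows(rows: &mut [u64], a: u64, b: u64) {
    let product = a.wrapping_mul(b);
    for (i, cell) in rows.iter_mut().enumerate() {
        *cell = (product >> (16 * i)) & 0xffff;
    }
}

/// Fills one column for all multiplications, splitting the work across threads.
/// Because the row count per mul is fixed, each mul's offset is known up front
/// and the column can be cut into disjoint mutable chunks.
fn assign_parallel(inputs: &[(u64, u64)], threads: usize) -> Vec<u64> {
    let mut column = vec![0u64; inputs.len() * ROWS_PER_MUL];
    let threads = threads.max(1);
    // Number of multiplications handled by each thread (at least 1).
    let per_thread = ((inputs.len() + threads - 1) / threads).max(1);

    thread::scope(|s| {
        let input_chunks = inputs.chunks(per_thread);
        let row_chunks = column.chunks_mut(per_thread * ROWS_PER_MUL);
        for (ins, rows) in input_chunks.zip(row_chunks) {
            s.spawn(move || {
                for (&(a, b), region) in ins.iter().zip(rows.chunks_mut(ROWS_PER_MUL)) {
                    fill_mul_rows(region, a, b);
                }
            });
        }
    });
    column
}
```

With a variable-length mul, the starting row of mul `i` depends on how many rows every earlier mul used, so the offsets cannot be computed without first running (or at least sizing) the earlier ones. That dependency is what the fixed-row variant removes, at the cost of a taller table.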