get_topo_order_degopt interlace

The order produced by get_topo_order_degopt, as described in #19, is not the only width-optimal topological order for degopts. Rather than working strictly row by row, we can interlace the rows with the T2k-coeffs:

 :T2k2
 :Ba2
 :Bb2
 :B2 
 :T2k3
 :Ba3_2
 :Ba3
 :Bb3_2
 :Bb3
 :B3
 :T2k4
....

which allows us to "forget" BX in the step right after it is computed.
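The listing above can be reproduced with a small sketch. The function name interlaced_order_sketch and its argument m (the last row index) are hypothetical and not part of GraphMatFun.jl, and how m relates to the k argument of get_topo_order_degopt is left open here. The node names for rows 2 and 3 are copied from the listing; for row 4 and beyond the naming (e.g. Ba4_2, Ba4_3, Ba4) is only an assumed continuation of that pattern.

```julia
# Hypothetical helper (not part of GraphMatFun.jl): builds the interlaced
# order shown in the listing for rows 2..m.
# Assumption: row j has intermediate nodes Ba<j>_2, ..., Ba<j>_<j-1> before
# Ba<j> (and likewise for Bb), as the listing shows for rows 2 and 3.
function interlaced_order_sketch(m::Integer)
    order = Symbol[]
    for j in 2:m
        # The T2k node comes first, interlaced between the rows.
        push!(order, Symbol("T2k$(j)"))
        for prefix in ("Ba", "Bb")
            # Intermediate partial sums; the range is empty for j = 2.
            for s in 2:(j - 1)
                push!(order, Symbol("$(prefix)$(j)_$(s)"))
            end
            push!(order, Symbol("$(prefix)$(j)"))
        end
        # The row's product node closes the row.
        push!(order, Symbol("B$(j)"))
    end
    return order
end
```

Calling interlaced_order_sketch(3) gives the first ten entries of the listing, from :T2k2 through :B3.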

I suggest we add a mode kwarg, so that one can call either get_topo_order_degopt(k; mode=:rowwise) or get_topo_order_degopt(k; mode=:rowwise_interlaced).

I think this variant can save one memslot in the BLAS code-generation setting.

matrixfunctions / GraphMatFun.jl