Problems with `CSR_MVUCONFIG1`

ha-zhuzhu commented 1 year ago

Hi, thanks for your brilliant work! The design of the MVU and hart control is impressive. I'm trying to get the right simulation output.hex by running fusesoc run --target=sim barvinn. But when I read the waves, I find that some signals related to the scaler, bias, or other configs that are not set in the C code are in the x state, which results in an all-zero final quant_out.

So I tried to set these configs in conv2d.c based on my understanding of the project:

    SET_CSR(CSR_MVUSCALER, 1);  // scaler_b=1
    SET_CSR(CSR_MVUUSESCALER_MEM,0);
    SET_CSR(CSR_MVUUSEBIAS_MEM,0);

Now I'm confused about CSR_MVUCONFIG1. The docs indicate that it's about Shift/accumulator load on jump select and Shift/accumulator load on jump select, and that they are 8 bits each. This does match the comment in BARVINN/deps/MVU/verification/lib/mvu/mvu_pkg.sv (branch 72b5413).

In mvutop_wrapper.sv and mvutop_wrapper.sv, however, CSR_MVUCONFIG1 seems to control shacc_load_sel and zigzag_step_sel, which are 5 bits each.
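To make the width question concrete, here is a minimal Python sketch of packing two 5-bit selectors into a single CSR word. The bit offsets (`SHACC_SHIFT`, `ZIGZAG_SHIFT`) and the helper names are assumptions made up for illustration; the real layout has to be read from the RTL or the generated headers.

```python
# Hypothetical field layout, for illustration only; NOT taken from the BARVINN RTL.
FIELD_WIDTH = 5                       # width of each selector as read from the wrapper
SHACC_SHIFT = 0                       # assumed bit offset of shacc_load_sel
ZIGZAG_SHIFT = 5                      # assumed bit offset of zigzag_step_sel
FIELD_MASK = (1 << FIELD_WIDTH) - 1   # 0b11111


def pack_config1(shacc_load_sel: int, zigzag_step_sel: int) -> int:
    """Pack both selectors into one word, rejecting values wider than 5 bits."""
    fields = (("shacc_load_sel", shacc_load_sel),
              ("zigzag_step_sel", zigzag_step_sel))
    for name, value in fields:
        if not 0 <= value <= FIELD_MASK:
            raise ValueError(f"{name}={value} does not fit in {FIELD_WIDTH} bits")
    return (shacc_load_sel << SHACC_SHIFT) | (zigzag_step_sel << ZIGZAG_SHIFT)


def unpack_config1(word: int) -> tuple[int, int]:
    """Recover (shacc_load_sel, zigzag_step_sel) from a packed word."""
    return (word >> SHACC_SHIFT) & FIELD_MASK, (word >> ZIGZAG_SHIFT) & FIELD_MASK


if __name__ == "__main__":
    word = pack_config1(0b00001, 0b00101)
    print(f"packed word: 0b{word:010b}")
    print("unpacked:", unpack_config1(word))
```

If the fields really were 8 bits wide, as the docs describe, only `FIELD_WIDTH` and the two offsets would change, which is why the discrepancy matters when writing the CSR by hand.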

These two configs seem to be important for the computation, but MVU_Code_gen doesn't export them. I also found them in some test files like MVU/c/conv2d_jumps.c or mvutop_tester.sv, but I still don't quite get the relationship between them and the other model parameters. If I just want to run the sample conv2d (1x64x32x32 input, 64x64x3x3 weight and 2-bit precision), how can I calculate shacc_load_sel and zigzag_step_sel?

I also have a question about quantized model computation. At the end of a layer's computation, the scaler module can rescale a quantized output, and the next layer's input should be quantized again. But it seems that the quantser module is not able to quantize inputs by a certain scale. So how does BARVINN deal with this process? I'd appreciate it if you could offer some examples of multi-layer quantized models!

hossein1387 commented 1 year ago

Hi @ha-zhuzhu, thank you for your interest in BARVINN! First, I have to mention that the documentation is a bit out of date and we need to update it. Regarding the CSRs, I recommend using this file as a reference instead. We use this json file to generate the configuration for the SystemVerilog and C header files. For the conv2d example, you can leave shacc_load_sel and zigzag_step_sel as they are.

For multi-layer quantized models we need communication with the host, and we are currently building a memory interface for it. However, if your model is small enough that you can store it entirely in the available memory space, you can run your model by running different kernels back to back. This is neither convenient nor realistic, but our main objective now is to provide an interface for all MVUs to communicate with the host. Please let me know if you have more questions.

ha-zhuzhu commented 1 year ago

Thank you for your detailed explanation! After some changes, I finally get exactly the same results in quant_out as in output.hex. I'm not sure whether I understand the source code correctly, so here is what I did:

I modified the SimpleConv model with zero padding for my test, because I don't know how BARVINN deals with padding. Is it done with a complex jump schedule like the one at deps/MVU/c/conv2d_jumps.c:381?
Set shacc_load_sel to b00001 and zigzag_step_sel to b00101 in order to sum up a single pixel's result if WLENGTH_1=0. (shacc_load_sel should be b00010 and zigzag_step_sel should be b00110 if WLENGTH_1!=0.)
Made some changes to MVU_CODE_GEN to get MVU-format data and weights, because:
1. In function __process_weigths, the actual ONNX weight shape is [output_channels, input_channels, height, width] instead of [input_channels, output_channels, width, height].
2. Simply flattening the tensor may not generate correct MVU-format data.
3. In function get_mvu_param:
```
ijump[0] = -iprec*(iC*(fH-1)*iW + (fW-sW-1)*iC + 1)
# why not:
# ijump[0] = -iprec*(iC*(fH-1)*iW + (fW-sW)*iC -1)
```
  In the SimpleConv model, iC=1 and sW=1, which makes ijump[0]=ijump[1], so the kernel will never move right by a stride.

Now I get the right result, but I'm still wondering how to do padding...

And about the multi-layer quantized model: since each conv layer has a different scale in common quantization strategies, I'm curious how to rescale the 32-bit conv result of the MVP (with scale1) and quantize it (with scale2) to be the input of the next layer. Should I merge scale1 and scale2 into one so that I can use the Scaler and Quantser to process it? This seems like a feasible way.
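The two candidate expressions for ijump[0] can be compared numerically with a small Python sketch. Only iC=1 and sW=1 come from the comment above; iprec=2, iW=32 and a 3x3 filter are assumptions borrowed from the sample conv2d, and the function names are invented for illustration. The sketch does not evaluate ijump[1], since its formula is not quoted here.

```python
def ijump0_current(iprec, iC, iW, fH, fW, sW):
    """ijump[0] as quoted from get_mvu_param."""
    return -iprec * (iC * (fH - 1) * iW + (fW - sW - 1) * iC + 1)


def ijump0_proposed(iprec, iC, iW, fH, fW, sW):
    """ijump[0] as suggested in the comment above."""
    return -iprec * (iC * (fH - 1) * iW + (fW - sW) * iC - 1)


# Assumed parameters: 2-bit inputs, 1 input channel, 32-pixel-wide image,
# 3x3 filter, stride 1. Only iC and sW are stated in the thread for SimpleConv.
params = dict(iprec=2, iC=1, iW=32, fH=3, fW=3, sW=1)

if __name__ == "__main__":
    cur = ijump0_current(**params)
    new = ijump0_proposed(**params)
    print("current :", cur)
    print("proposed:", new)
    print("difference (current - proposed):", cur - new)
```

Expanding the two expressions shows that they differ by -iprec*(2 - iC), so they give the same jump only when iC=2 and diverge for the single-channel SimpleConv case.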

wagnersj commented 1 year ago

@ha-zhuzhu regarding the scalers, the idea is to merge/fuse scalars together such that you only need the one scalar unit. We don't have a multiplier in the quantser module as a result. This scheme is sufficient to implement LSQ and even batch norm layers (with scalar fusing). We are working on another branch of the MVU code that will have a second scalar unit to add some flexibility.
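The fusing idea can be illustrated with a short Python sketch: rescaling the accumulator by scale1 and then quantizing with scale2 gives the same integers as a single multiplication by scale1/scale2 followed by rounding and clipping. This is a floating-point model under assumed names and example scale values, not the fixed-point arithmetic of the hardware Scaler and Quantser.

```python
def clip_unsigned(q: int, bits: int) -> int:
    """Saturate q to the unsigned range representable in `bits` bits."""
    return max(0, min(q, (1 << bits) - 1))


def requantize_two_step(acc: int, scale1: float, scale2: float, bits: int) -> int:
    """Dequantize the accumulator with scale1, then quantize with scale2."""
    real_value = acc * scale1
    return clip_unsigned(round(real_value / scale2), bits)


def requantize_fused(acc: int, fused_scale: float, bits: int) -> int:
    """One multiplication by the pre-merged scale, then round and clip."""
    return clip_unsigned(round(acc * fused_scale), bits)


if __name__ == "__main__":
    # Example values only; powers of two keep the float arithmetic exact.
    scale1 = 0.03125      # assumed scale of the conv accumulator
    scale2 = 0.5          # assumed input scale of the next layer
    fused = scale1 / scale2
    for acc in (-10, 0, 7, 25, 100):
        print(acc,
              requantize_two_step(acc, scale1, scale2, bits=2),
              requantize_fused(acc, fused, bits=2))
```

With arbitrary scales the two paths can differ by one unit at rounding boundaries because of floating-point error, so the fused scale is best computed once offline and then treated as the only scale the layer has.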

hossein1387 / BARVINN

Problems with `CSR_MVUCONFIG1` #24