pulp-platform / pulp-nn

Apache License 2.0

Size of pIm2ColBuffer in pulp_nn_conv_Co_parallel #5

Open MiguelCosta94 opened 3 years ago

MiguelCosta94 commented 3 years ago

Hi,

Can someone tell me what size I should give the pIm2ColBuffer of pulp_nn_conv_Co_parallel in order to exploit parallel execution on the Gapuino with 8 cores?

NBruschi commented 3 years ago

Hi @MiguelCosta94,

The im2col buffer stores all the pixels inside the convolution window. Since the data layout is HWC, the size of one im2col buffer (im2col_dim) is dim_ker_h × dim_ker_w × ch_in. Moreover, the PULP-NN conv (and matmul) kernels use two im2col buffers at the same time, so that the innermost loop of the matrix multiplication kernel computes 2 adjacent output pixels with 4 output channels each. For this reason, you should allocate 2 × im2col_dim × num_of_cores bytes in L1 to exploit the full parallelism of these kernels.

MiguelCosta94 commented 3 years ago

Hi @NBruschi ,

Thanks for your reply. That said, the size of pIm2ColBuffer is not the source of my problem. Can you tell me whether every buffer (including inputs, outputs, weights and bias) needs to be allocated in L1 memory for pulp_nn_conv_Co_parallel to work properly? I am storing them in L2 memory, and I am seeing some random behavior when I change the number of cores used in the parallel computation. With only one core I get the expected accuracy, but as I increase the number of cores for pulp_nn_conv_Co_parallel, the accuracy drops drastically.

NBruschi commented 3 years ago

From the cluster side, the cores can access L2 directly. The reason to avoid it is that each access takes several cycles (around 10x the latency of an L1 access), but if you are not concerned about execution time, at least at this stage, it is easily feasible. In any case, the PULP-NN kernels are optimized to work with the input, output, weight and im2col buffers in L1, but you should be able to allocate the same buffers in L2 instead. Which platform are you using for testing? Could you share your test application?