Closed by janboeye 6 years ago
The lines below tune_config contain the logic for setting the tunable parameters.
https://github.com/dmlc/tvm/blob/1d6df5a187127f514ea48f6a8a74c77ff59c89f5/topi/python/topi/mali/conv2d.py#L195-L211
https://github.com/dmlc/tvm/blob/1d6df5a187127f514ea48f6a8a74c77ff59c89f5/topi/python/topi/mali/conv2d.py#L258-L282
For spatial_pack, the tunable parameters are VH, VW, VC and num_thread. I use grid search to set them. You can also use grid search for other workloads. Some sample code can be found here. We feed in our config at L99.
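The grid search mentioned above can be sketched roughly as follows. This is only an illustration, not the code used in TOPI: the candidate values in `space` and the `fake_measure` cost function are hypothetical placeholders. In practice `measure` would build the conv2d kernel with the given VH, VW, VC and num_thread and time it on the device.

```python
import itertools


def grid_search(measure, space):
    """Evaluate every combination in `space`; return (best_config, best_cost)."""
    names = list(space)
    best_cfg, best_cost = None, float("inf")
    for values in itertools.product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        cost = measure(cfg)  # lower is better, e.g. run time in ms
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


# Hypothetical candidate values for the spatial_pack parameters.
space = {
    "VH": [1, 2, 4],
    "VW": [1, 2, 4],
    "VC": [1, 2, 4, 8],
    "num_thread": [4, 8, 16],
}


def fake_measure(cfg):
    # Placeholder cost so the sketch runs without a device.
    # Replace with: build the kernel using cfg, run it, return the time.
    return (abs(cfg["VH"] - 2) + abs(cfg["VW"] - 2)
            + abs(cfg["VC"] - 4) + abs(cfg["num_thread"] - 8))


best_cfg, best_cost = grid_search(fake_measure, space)
print(best_cfg, best_cost)
```

The search is exhaustive, so its cost grows with the product of the candidate list sizes. Keeping each list short matters when every measurement means a compile and a run on the device.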
@merrymercy If it runs on the Bifrost architecture, which parameters need to be tuned?
I do not have a Bifrost device, so I cannot help you with this problem. But maybe I will tune for a Bifrost device in the near future.
@merrymercy It looks like the conv2d implementation is not well suited to the Bifrost architecture. If I comment out @conv2d.register(["mali"]) and @generic.schedule_conv2d_nchw.register(["mali"]) so that conv2d goes to the opencl implementation, it achieves better performance.
@merrymercy Why does _schedule_im2col_conv2d use __local on the Mali architecture?
Sorry I am traveling these days and my laptop was broken. It seems that you have solved your issues.
If you remove the Mali registration, it will use cuda's schedule. I am interested in the Bifrost results. Which GPU do you use? Could you post more details about the performance?
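The fallback to cuda's schedule can be pictured with a toy dispatcher. This is a simplified model, not TVM's actual code: the `GenericFunc` class here is hypothetical, and the key list `["mali", "opencl", "gpu"]` for the Mali target and the `["cuda", "gpu"]` registration are assumptions made for illustration.

```python
class GenericFunc:
    """Toy target-keyed dispatcher: the first matching key wins."""

    def __init__(self, default):
        self.default = default
        self.table = {}

    def register(self, keys):
        def decorator(fn):
            for key in keys:
                self.table[key] = fn
            return fn
        return decorator

    def __call__(self, target_keys, *args):
        # Walk the target's keys from most to least specific.
        for key in target_keys:
            if key in self.table:
                return self.table[key](*args)
        return self.default(*args)


def _default_schedule():
    return "default schedule"


schedule_conv2d_nchw = GenericFunc(_default_schedule)


@schedule_conv2d_nchw.register(["cuda", "gpu"])
def _cuda_schedule():
    return "cuda schedule"


# Assumed key list for a Mali target (most specific first).
MALI_KEYS = ["mali", "opencl", "gpu"]

# No "mali" entry yet: dispatch falls through to the shared "gpu" key.
without_mali = schedule_conv2d_nchw(MALI_KEYS)


@schedule_conv2d_nchw.register(["mali"])
def _mali_schedule():
    return "mali schedule"


# With the registration in place, the "mali" key matches first.
with_mali = schedule_conv2d_nchw(MALI_KEYS)
print(without_mali, "|", with_mali)
```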
@merrymercy With cuda's schedule, the result is about 100 ms, down from 180 ms with mali's schedule. The unroll number is too large for the Bifrost architecture.
@merrymercy
Is there a tuning guide?
Which parameters can be tuned? Why is num_thread set to 8?
Thanks