Some questions about searching

idstcv / GPU-Efficient-Networks

Apache License 2.0


Some questions about searching #6

Closed lucienne999 closed 3 years ago

lucienne999 commented 4 years ago

Hi, first of all, thanks for this interesting work. I have some questions about the searching method and results in this paper.

1. Can the least-squares regression be understood as the function f(delta(depth), delta(width)) = delta(A)? Are the different kernel sizes and expansion ratios handled in the "Distillation" process?
2. In your settings, the search space is {3, 5} for the kernel size, {3, 6, 9} for the DW-Block expansion ratio, and {0.5, 0.25} for the BL-Block expansion ratio. However, in the resulting GENet every kernel size is 3, every DW-Block expansion ratio is 3, and every BL-Block expansion ratio is 0.25. This is really surprising, and I'm looking forward to more analysis here.

Thx.

MingLin-home commented 3 years ago

Hi LicharYuan,

Thanks for the feedback!

For Q1, yes, you are right. We treat each combination of kernel size and expansion ratio as a separate block type, which means that each one has its own least-squares regression coefficients.
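To make the Q1 answer concrete, here is a minimal sketch of fitting one least-squares model per block type. It is not the authors' code: the linear form without an intercept, the function name `fit_block_type`, the `(family, kernel size, ratio)` tuple layout, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_block_type(delta_depth, delta_width, delta_acc):
    """Fit delta_acc ~ a * delta_depth + b * delta_width by least squares.

    Hypothetical helper: the linear, no-intercept form is an assumption.
    """
    X = np.column_stack([delta_depth, delta_width]).astype(float)
    y = np.asarray(delta_acc, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # array([a, b])

# Hypothetical block types: (family, kernel size, expansion ratio).
block_types = [("DW", 3, 3), ("DW", 5, 6), ("BL", 3, 0.25)]

# Synthetic, noise-free data; real inputs would be measured accuracy changes.
rng = np.random.default_rng(0)
true_coef = {bt: np.array([0.5 + i, 0.01 * (i + 1)])
             for i, bt in enumerate(block_types)}

coefficients = {}
for bt in block_types:
    d = rng.integers(-2, 3, size=20).astype(float)    # change in depth
    w = rng.integers(-16, 17, size=20).astype(float)  # change in width
    acc = true_coef[bt][0] * d + true_coef[bt][1] * w
    # Each block type gets its own, independent set of coefficients.
    coefficients[bt] = fit_block_type(d, w, acc)

for bt, coef in coefficients.items():
    print(bt, coef)
```

Keeping the fits in a dictionary keyed by block type mirrors the point of the reply: no coefficients are shared between kernel sizes or expansion ratios.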

For Q2, we agree: the search results surprised us too. Our conjecture is that k=3 is better optimized in CUDA. For comparison, we trained many k=5 networks but did not obtain better results. For the BL-Block, the expansion ratios are all 0.25. We believe there is some connection between our search results and manually designed networks, most of which also use 0.25. In unpublished experiments, we manually set ratio=1/6 in BL-Blocks but never obtained better results. So this phenomenon is not simply due to search bias; some deeper, unknown reason seems to make 0.25 a preferred value.

lucienne999 commented 3 years ago

Yes, all k=3 is reasonable. I think the result may depend on your search method. Have you ever tried manually changing the parameters of one or two blocks? For example, change the kernel size of the first XX-Block to 5 and its channel count to 40 to compensate for the latency. Is that result still worse?
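The trade suggested here (a larger kernel paid for with fewer channels) can be sketched with a rough cost model. This is only an illustration under stated assumptions: multiply-accumulate (MAC) count stands in for latency, the convolution is a plain one with equal input and output channels, and the baseline of 64 channels on a 56x56 feature map is hypothetical. Measured GPU latency may differ from MAC count, which is exactly what the k=3 conjecture earlier in the thread is about.

```python
def conv_macs(c_in, c_out, k, h, w):
    """MACs of a plain k x k convolution producing an h x w output map."""
    return c_in * c_out * k * k * h * w

def matched_channels(c, k_old, k_new):
    """Channel count c' with c'^2 * k_new^2 == c^2 * k_old^2.

    Assumes equal input and output channels (a simplification).
    """
    return c * k_old / k_new

# Hypothetical baseline: 64 channels, k=3, 56x56 output.
base = conv_macs(64, 64, 3, 56, 56)

# Channel count that keeps MACs roughly constant after switching to k=5.
c_new = round(matched_channels(64, 3, 5))
alt = conv_macs(c_new, c_new, 5, 56, 56)

print(c_new, alt / base)
```

Under this proxy, moving from k=3 to k=5 at a fixed MAC budget costs a factor of 3/5 in channel count, so any accuracy gain from the larger kernel has to outweigh the narrower block.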

MingLin-home commented 3 years ago

Have you ever tried manually changing the parameters of one or two blocks?

We did not manually change one or two blocks in our structures. We usually change all blocks in some principled way, which is easier to explore.

lucienne999 commented 3 years ago

OK. It sounds like the search method may not be stable when the search space becomes larger.
Actually, in my experiments, a search space built with prior knowledge does help the search. Looking forward to your future work!