idstcv / ZenNAS


Search in RegNet Space #9

Closed: soyebn closed this issue 2 years ago

soyebn commented 2 years ago

I was experimenting with a RegNet-like search space, but with a slightly different group_size. My hand-designed network (model_hand) reaches 74.05% top-1 accuracy at latency L. My modest objective is to find a model (model_zennas) with the same latency L that is at least as accurate as the hand-designed network. I kept the training schedule exactly the same for model_hand and model_zennas, but model_zennas ends up 1.4% lower. The ZenScore of model_hand is 112, whereas that of model_zennas is 136.0. So the problem does not seem to be that model_hand lies outside the search space; rather, ZenScore somehow assigns a lower score to the more accurate model.

So my question is: is there some tweak I can make to the way ZenScore is defined so that model_hand gets a score similar to model_zennas? If that happens, I would expect model_zennas to also reach accuracy similar to model_hand.
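The thread does not spell out how ZenScore is computed, so for reference here is a minimal pure-Python sketch of the definition as described in the Zen-NAS paper: with Gaussian-initialised weights and no training, take the log of the expected change in the output features when a Gaussian input x is perturbed to x + α·ε, then add, over all batch-norm layers, log σ̄, where σ̄ is the square root of the mean per-channel batch variance. The toy MLP, the L1 norm for the feature change, α = 0.01, and the name `zen_score_sketch` are assumptions of this sketch; the ZenNAS repository works on CNN feature maps and may differ in details. Splitting out the two terms is one way to see which of them separates model_hand from model_zennas.

```python
import math
import random


def zen_score_sketch(widths, batch_size=16, alpha=0.01, seed=0):
    """Simplified Zen-score for a toy MLP; NOT the ZenNAS implementation.

    widths: e.g. [8, 16, 16] means input dim 8 followed by two hidden layers.
    Each layer is Linear -> BatchNorm (batch statistics, no affine) -> ReLU.
    """
    rng = random.Random(seed)
    # Gaussian random initialisation; the network is never trained.
    weights = [
        [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(widths[:-1], widths[1:])
    ]

    def forward(batch):
        log_bn = 0.0  # running sum of log(sigma_bar) over BN layers
        h = batch
        for w in weights:
            z = [[sum(wij * xj for wij, xj in zip(row, x)) for row in w] for x in h]
            n = len(z)
            means = [sum(s[j] for s in z) / n for j in range(len(w))]
            variances = [sum((s[j] - means[j]) ** 2 for s in z) / n for j in range(len(w))]
            # log(sigma_bar) = 0.5 * log(mean per-channel batch variance)
            log_bn += 0.5 * math.log(sum(variances) / len(w))
            h = [[max(0.0, (s[j] - means[j]) / math.sqrt(variances[j] + 1e-5))
                  for j in range(len(w))] for s in z]
        return h, log_bn

    x = [[rng.gauss(0.0, 1.0) for _ in range(widths[0])] for _ in range(batch_size)]
    noise = [[rng.gauss(0.0, 1.0) for _ in range(widths[0])] for _ in range(batch_size)]
    x_pert = [[a + alpha * e for a, e in zip(xr, er)] for xr, er in zip(x, noise)]

    out, log_bn = forward(x)
    out_pert, _ = forward(x_pert)
    # Delta: batch-averaged L1 change of the output features.
    delta = sum(sum(abs(a - b) for a, b in zip(o, p))
                for o, p in zip(out, out_pert)) / batch_size
    return math.log(delta) + log_bn
```

Because every extra batch-normalised layer adds its own log σ̄ term, a deeper toy network such as `zen_score_sketch([8, 16, 16, 16, 16])` typically scores higher than `zen_score_sketch([8, 16])`.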

I really like your way of searching because it does not need training, so I am exploring whether it can accomplish the above. Do you think it is appropriate for this kind of objective?
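One way to check whether a training-free proxy suits a given search space is to compare its ranking with measured accuracy over several fully trained models, for example with Kendall's rank correlation. The helper below is a minimal sketch (tau-a, ties count as neither concordant nor discordant); `kendall_tau` is a hypothetical name, not part of ZenNAS.

```python
from itertools import combinations


def kendall_tau(scores, accuracies):
    """Kendall rank correlation (tau-a) between proxy scores and accuracies."""
    if len(scores) != len(accuracies) or len(scores) < 2:
        raise ValueError("need two equally long lists with at least 2 entries")
    concordant = discordant = 0
    for i, j in combinations(range(len(scores)), 2):
        # Same sign: the proxy orders this pair like the accuracies do.
        sign = (scores[i] - scores[j]) * (accuracies[i] - accuracies[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(scores) * (len(scores) - 1) // 2
    return (concordant - discordant) / n_pairs
```

With the two models from this issue, `kendall_tau([112.0, 136.0], [74.05, 72.65])` returns `-1.0`: the single pair is ranked the wrong way round. Two models give only one pair, so more trained points from the same search space would be needed before drawing a conclusion about the proxy.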

MingLin-home commented 2 years ago

Dear Soyebn,

Thank you for the update! Based on our experience, there are a few things you can tweak:

  1. The maximal depth of the network is important. You can try to find a reasonable value by bisection search. When the depth falls into the feasible range, the accuracy is usually similar; however, if the model is too deep or too shallow, you will get much worse results.
  2. From your score of 136.0, I feel the model might be too shallow. Also, please make sure you set the number of evolutionary iterations to 480k to ensure convergence; the 48k used in the paper might be a very tight budget.
  3. Teacher-student training is critical for ZenNets, similar to ViT training. Since the searched model is much more expressive, it can easily overfit or converge to a bad local optimum. We are still working on understanding this and on a better training strategy to overcome the drawback. For now, the verified workaround is teacher-student distillation with inner feature maps included as well.
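Point 1 suggests bisection over the maximal depth. One way to read it, assuming accuracy first rises and then falls with depth as point 1 describes, is a binary search for the peak that compares adjacent depths. `bisect_best_depth` and its `evaluate` callback are hypothetical: `evaluate(depth)` might run the search under that depth cap and train the result, which is expensive, so results are cached.

```python
def bisect_best_depth(evaluate, lo, hi):
    """Binary-search the depth maximising `evaluate`, assuming unimodality.

    evaluate(depth) -> float, higher is better (hypothetical callback).
    lo, hi: inclusive integer bounds on the maximal depth.
    """
    cache = {}

    def f(d):
        # Each evaluation may mean a full search + training run: cache it.
        if d not in cache:
            cache[d] = evaluate(d)
        return cache[d]

    while lo < hi:
        mid = (lo + hi) // 2
        if f(mid) < f(mid + 1):
            lo = mid + 1  # still climbing: the peak is deeper than mid
        else:
            hi = mid      # at or past the peak: it is at mid or shallower
    return lo
```

It needs about 2·log2(hi − lo) evaluations. If accuracy is flat across the feasible range, it returns the shallowest depth on the top plateau, provided accuracy strictly increases before that plateau; noisy accuracy measurements can mislead the comparison.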
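Point 2 concerns the iteration budget of the evolutionary search. The loop below is a generic sketch of that kind of training-free search, not ZenNAS's own evolution code: pick a random parent, mutate it, keep the child if it meets the budget (for example latency ≤ L), and evict the lowest-scoring member once the population is full. The toy search space, mutation, score, and budget are hypothetical stand-ins.

```python
import random


def evolutionary_search(init_arch, mutate, score, is_feasible,
                        iterations, population_size=256, seed=0):
    """Generic evolutionary loop for training-free NAS (illustrative sketch).

    mutate(arch, rng) -> arch, score(arch) -> float (e.g. a zero-shot proxy),
    is_feasible(arch) -> bool (e.g. latency <= L) are caller-supplied.
    Returns (best_score, best_arch).
    """
    rng = random.Random(seed)
    population = [(score(init_arch), init_arch)]
    for _ in range(iterations):
        _, parent = rng.choice(population)
        child = mutate(parent, rng)
        if not is_feasible(child):
            continue  # reject candidates that violate the budget
        population.append((score(child), child))
        if len(population) > population_size:
            # keep the population bounded by evicting the lowest score
            population.remove(min(population, key=lambda p: p[0]))
    return max(population, key=lambda p: p[0])


# Hypothetical toy search space: a tuple of four stage widths.
def toy_mutate(arch, rng):
    """Shift one randomly chosen width by +/-8, clamped to [8, 64]."""
    i = rng.randrange(len(arch))
    new = list(arch)
    new[i] = min(64, max(8, new[i] + rng.choice([-8, 8])))
    return tuple(new)


def toy_score(arch):
    """Stand-in proxy: favour balanced widths."""
    return min(arch)


def toy_feasible(arch):
    """Stand-in for a latency budget: total width at most 128."""
    return sum(arch) <= 128
```

For example, `evolutionary_search((8, 8, 8, 8), toy_mutate, toy_score, toy_feasible, iterations=2000, population_size=16)`. With a fixed seed, a longer run can only keep or raise the best score found, since the first iterations are identical and the best member is never evicted. That is consistent with the advice to give the search a larger budget (480k rather than 48k) so it has time to converge.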
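Point 3 names teacher-student distillation with inner feature maps, but does not give the exact losses, weights, or matched layers. The sketch below assumes one common form: a Hinton-style KL term on temperature-softened logits (scaled by T²) plus a mean-squared error between matched intermediate feature maps. `temperature`, `feat_weight`, and the assumption that paired feature maps already have the same shape are hypothetical; in practice a projection layer may be needed to align channels.

```python
import math


def softmax(logits, temperature):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits,
                      student_feats, teacher_feats,
                      temperature=4.0, feat_weight=1.0):
    """Logit KD plus MSE on matched inner feature maps (sketch).

    student_feats / teacher_feats: lists of flattened feature maps,
    one pair per matched stage, assumed to have equal lengths.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL(teacher || student), scaled by T^2 to keep gradient magnitudes stable
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    feat_term = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        # per-stage mean squared error between student and teacher features
        feat_term += sum((a - b) ** 2 for a, b in zip(fs, ft)) / len(fs)
    return temperature ** 2 * kl + feat_weight * feat_term
```

When the student reproduces the teacher's logits and feature maps exactly, both terms vanish and the loss is 0.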

Thank you again for the update! I hope you find this helpful!

dovedx commented 2 years ago

This is an automatic vacation reply from QQ Mail. I have received your email and will handle it and reply as soon as possible. Thank you!