idstcv / ZenNAS


is it possible for you to release the retrain log file? #17

Closed aptsunny closed 2 years ago

aptsunny commented 2 years ago

hi, @MingLin-home , in the paper and the official code, retraining the searched model with the feature loss costs a lot of compute, and the 1.2ms-latency model is very difficult to train fully end to end. Could the training log files for this series of models be released?

MingLin-home commented 2 years ago

Hi aptsunny, Thank you for the feedback! I cannot find the log file now. Sorry for the bad news :(

aptsunny commented 2 years ago

ok, first of all, thank you for the fast response. By the way, I noticed the teacher_arch model comes from a semi-supervised learning approach. I wonder why you use a teacher like "geffnet_tf_efficientnet_b3_ns" to distill the student model? Doesn't that make the comparison unfair to the baseline models? Did you compare other teachers for distilling the searched model?

thank you in advance.

MingLin-home commented 2 years ago

Hi aptsunny, Sorry for the late feedback. The teacher network is randomly chosen. We did not test other teacher networks.
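For readers who want to reproduce this setup, a minimal sketch of loading such a teacher is below. It assumes the timm package; the ZenNAS code itself may load the model through a different wrapper (the name geffnet_tf_efficientnet_b3_ns suggests the geffnet package), so treat the exact call as illustrative.

```python
# Minimal sketch (assumption): loading a teacher such as tf_efficientnet_b3_ns
# with the timm library. ZenNAS may wrap this differently (e.g., via geffnet),
# so the exact call is illustrative, not taken from the repo.
import timm
import torch

teacher = timm.create_model('tf_efficientnet_b3_ns', pretrained=True)
teacher.eval()                      # the teacher stays frozen during distillation
for p in teacher.parameters():
    p.requires_grad_(False)

with torch.no_grad():
    logits = teacher(torch.randn(1, 3, 300, 300))   # EfficientNet-B3 expects ~300x300 input
print(logits.shape)                 # torch.Size([1, 1000]) for ImageNet classes
```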

Using a teacher network is important in NAS. NAS-designed networks are usually deeper and narrower, which makes them more difficult to train. With teacher-student (TS) distillation, the improvement on a NAS-designed network is often much larger than on a manually designed one: TS usually gives a 3%~4% accuracy improvement on NAS-designed networks, but merely 0.5%~1% on manually designed networks.
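A minimal sketch of the teacher-student (TS) loss described above, assuming standard logit distillation (cross-entropy on ground-truth labels plus KL divergence on temperature-softened teacher logits). The temperature T and weight alpha are illustrative, not the values used in the paper.

```python
# Minimal sketch (assumption) of a teacher-student distillation loss:
# hard-label cross-entropy plus soft-label KL on temperature-scaled logits.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Hard-label cross-entropy against the ground truth
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label KL divergence between temperature-scaled distributions
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction='batchmean',
    ) * (T * T)   # rescale gradients to be independent of T
    return alpha * ce + (1.0 - alpha) * kd
```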

Back to ZenNAS. Since ZenNAS always gives us maximal-capacity structures, it is no surprise that with insufficient training data its generalization error will suffer (due to the curse of dimensionality). Once it is given sufficiently large training data, it can fit much better than other models.

I hope the above answers your concerns! Please let us know if you have more questions!

Best Regards, Ming Lin

MingLin-home commented 1 year ago

Distillation is a common practice in the NAS literature. We found that NAS architectures are difficult to train because they are deeper and narrower, so distillation has a larger impact on NAS networks. The improvement from distillation on manually designed networks such as ResNet is usually very small, around 0.5%, but for NAS networks the improvement can be up to 4.5%.

The teacher network was chosen at random. We simply picked one with about 84% accuracy that was not too large for us. We did not try other teacher networks, but we believe the results should be similar.
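For the "feature loss" mentioned in the original question, the sketch below shows one common way to distill at the feature level: project a student feature map to the teacher's channel width and penalize the mean-squared error between the maps. The layer choice, adapter, and weighting are assumptions for illustration, not taken from the ZenNAS training scripts.

```python
# Minimal sketch (assumption) of feature-level distillation: a 1x1 projection
# aligns student and teacher channel counts before an MSE penalty. Not the
# exact loss used in the ZenNAS retraining code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAdapter(nn.Module):
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        s = self.proj(student_feat)
        # Resize if the spatial sizes disagree, then penalize the MSE between maps.
        if s.shape[-2:] != teacher_feat.shape[-2:]:
            s = F.interpolate(s, size=teacher_feat.shape[-2:],
                              mode='bilinear', align_corners=False)
        return F.mse_loss(s, teacher_feat)
```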
