romulus0914 / MixNet-PyTorch

A PyTorch Implementation of MixNet

About Hyperparameters #5

Closed — huangdi95 closed this issue 5 years ago

huangdi95 commented 5 years ago

Thanks for your release! I have a question about the hyperparameters used in training.

According to the original TF implementation, MixNet can be trained with the same config as MnasNet by running mnasnet_main.py. However, some of the hyperparameters there differ from your implementation:
https://github.com/tensorflow/tpu/blob/master/models/official/mnasnet/configs/mnasnet_config.py

For example, the dropout rate in mnasnet_config.py is fixed to 0.2, while yours is 0.25 for MixNet-M. Although your implementation matches mixnet_model.py, the `override_params` mechanism in mnasnet_main.py means the model's defaults get overridden by the values from mnasnet_config.py:
https://github.com/tensorflow/tpu/blob/master/models/official/mnasnet/mixnet/mixnet_model.py
https://github.com/tensorflow/tpu/blob/master/models/official/mnasnet/mnasnet_main.py

The same question applies to the batch-norm momentum and epsilon. It is confusing. What's your suggestion?
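To make the override pattern concrete, here is a minimal Python sketch (not the actual TPU code) of the behavior being described: the model file ships defaults, the training entry point collects config values into an `override_params`-style dict, and those values win. The dropout values come from the discussion above; the batch-norm numbers are illustrative placeholders, not values taken from either repo.

```python
# Defaults as hard-coded in a model file (mixnet_model.py-style).
model_defaults = {
    'dropout_rate': 0.25,          # MixNet-M default in this repo
    'batch_norm_momentum': 0.99,   # illustrative placeholder
    'batch_norm_epsilon': 1e-3,    # illustrative placeholder
}

# Training-config values (mnasnet_config.py-style), gathered into overrides.
override_params = {
    'dropout_rate': 0.2,           # fixed to 0.2 in mnasnet_config.py
}

def effective_params(defaults, overrides):
    """Overrides replace defaults key by key; the config, not the model
    file, decides the final value."""
    params = dict(defaults)
    params.update(overrides or {})
    return params

print(effective_params(model_defaults, override_params))
# -> dropout_rate comes out as 0.2, not the model file's 0.25
```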

romulus0914 commented 5 years ago

Hi,

It's absolutely fine to write another config file to set up the hyper-parameters. However, in my repo, I follow the hyper-parameters reported in the MixNet repo. After all, they're just hyper-parameters.
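A hedged sketch of what "just hyper-parameters" means in practice on the PyTorch side: keep them in one config dict so either set of values can be dropped in without touching the model code. The `ClassifierHead` class, the 1536 feature width, and the constructor signature below are illustrative stand-ins, not this repo's actual modules.

```python
import torch.nn as nn

# Swap the commented value in to train with the mnasnet_config.py setting.
MIXNET_M_CONFIG = {
    'dropout_rate': 0.25,   # value reported in the MixNet repo, used here
    # 'dropout_rate': 0.2,  # value enforced by mnasnet_config.py
}

class ClassifierHead(nn.Module):
    """Tiny stand-in for the network head, showing where the knob lands."""
    def __init__(self, in_features, num_classes, dropout_rate):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout_rate)
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, x):
        return self.fc(self.dropout(x))

# 1536 and 1000 are illustrative (head width, ImageNet classes).
head = ClassifierHead(1536, 1000, **MIXNET_M_CONFIG)
```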

Thanks.

huangdi95 commented 5 years ago

Thanks for your reply!