In the final phase, where you choose the best architecture based on its reward, the reward for PTB is set to c/ppl + (entropy term), while the reward for CIFAR-10 is just the accuracy. Why did you use the entropy term for architecture selection on PTB but not on CIFAR-10?