mit-han-lab / hardware-aware-transformers

[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
https://hat.mit.edu

questions about the search & training process #4

Closed: huchinlp closed this issue 4 years ago

huchinlp commented 4 years ago

Hi, I tried to run an evolutionary search on the IWSLT14 de-en dataset with a 1080 Ti GPU.

I changed the latency constraint from 200 ms to 150 ms, since the 1080 Ti is faster than the Titan Xp.

However, the best architecture (143 ms) did not change after ten epochs, even though the maximum number of search iterations is 30.

I then trained the searched architecture with the same configuration file but got only 33.77 BLEU (normal).

My questions are:

  1. Is this behavior normal? Does it mean the search has gotten stuck in a local optimum?
  2. How can I get scores comparable to those reported in your results if I use other GPUs with similar latency?

Here is the search log: iwlst.evo.gpu.log

Hanrui-Wang commented 4 years ago

Hi Huchi,

Thanks for your questions!

  1. Typically the evolutionary search converges after about 20 epochs. You can increase the mutation rate to see whether that gives better results (see the first sketch after this list).
  2. As mentioned in Section 3.2 of our paper, IWSLT testing uses the average of the last ten checkpoints, which gives a higher BLEU score. You can use configs/iwslt14.de-en/average_checkpoint.sh to perform the averaging (see the second sketch below).
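
For reference, here is a minimal sketch of what one mutation step in an evolutionary architecture search looks like and why a higher mutation rate helps escape a plateau. The field names, value choices, and `mutation_prob` parameter below are illustrative only and do not reflect HAT's actual search-space definition or flags:

```python
import random

# Illustrative (not HAT's real) search-space encoding: each gene is a
# dict mapping an architecture field to one of its allowed values.
SEARCH_SPACE = {
    "encoder-embed-dim": [512, 640],
    "decoder-embed-dim": [512, 640],
    "encoder-ffn-dim":   [1024, 2048, 3072],
    "decoder-layer-num": [1, 2, 3, 4, 5, 6],
}

def mutate(gene, mutation_prob=0.3):
    """Resample each field with probability `mutation_prob`.

    Raising `mutation_prob` makes the search explore more aggressively,
    which can help when the best candidate stops improving early.
    """
    child = dict(gene)
    for name, choices in SEARCH_SPACE.items():
        if random.random() < mutation_prob:
            child[name] = random.choice(choices)
    return child

# Example: mutate a random parent with a higher-than-default rate.
parent = {name: random.choice(choices) for name, choices in SEARCH_SPACE.items()}
print(mutate(parent, mutation_prob=0.5))
```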
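
And here is a rough Python sketch of what the checkpoint averaging does, assuming fairseq-style checkpoint files that store the weights under a "model" key; the paths are illustrative, and in practice you should simply run the average_checkpoint.sh script mentioned above:

```python
import collections
import torch

def average_checkpoints(paths):
    """Element-wise average of the model weights in several checkpoints."""
    avg = collections.OrderedDict()
    for path in paths:
        state = torch.load(path, map_location="cpu")["model"]
        for key, value in state.items():
            if key not in avg:
                avg[key] = value.clone().float()
            else:
                avg[key] += value.float()
    for key in avg:
        avg[key] /= len(paths)
    return avg

# Illustrative paths: the last ten epoch checkpoints of the trained sub-network.
paths = [f"checkpoints/checkpoint{i}.pt" for i in range(41, 51)]

# Reuse the metadata of the last checkpoint and replace its weights
# with the averaged ones, so the result stays loadable for generation.
new_state = torch.load(paths[-1], map_location="cpu")
new_state["model"] = average_checkpoints(paths)
torch.save(new_state, "checkpoints/checkpoint_averaged.pt")
```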