mit-han-lab / hardware-aware-transformers

[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
https://hat.mit.edu

questions about the search & training process #4

Closed: huchinlp closed this issue 4 years ago

huchinlp commented 4 years ago

Hi, I tried to run an evolutionary search on the IWSLT14 de-en dataset with a 1080 Ti GPU.

I changed the latency constraint from 200 ms to 150 ms, since the 1080 Ti is faster than the Titan Xp.

However, the best architecture (143 ms) did not change after ten epochs, even though the maximum number of search iterations is 30.

I then trained the searched architecture with the same configuration file but got only 33.77 BLEU (normal).

My questions are:

  1. Is this behavior normal? Does it mean the search has gotten stuck in a local optimum?
  2. How can I get scores comparable to those reported in your results if I use other GPUs with similar latency?

Here is the search log: iwlst.evo.gpu.log

Hanrui-Wang commented 4 years ago

Hi Huchi,

Thanks for your questions!

  1. Typically the evolutionary search converges after about 20 epochs. You can increase the mutation rate to see whether that gives better results (see the first sketch after this list).
  2. As mentioned in Section 3.2 of our paper, IWSLT testing uses the average of the last ten checkpoints, which gives a higher BLEU score. You can use configs/iwslt14.de-en/average_checkpoint.sh to perform the averaging (see the second sketch below).
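
For reference, here is a minimal sketch of what one mutation step in an evolutionary architecture search looks like and why a higher mutation rate helps escape a plateau. The field names, value choices, and `mutation_prob` parameter below are illustrative only and do not reflect HAT's actual search-space definition or flags:

```python
import random

# Illustrative (not HAT's real) search-space encoding: each gene is a
# dict mapping an architecture field to one of its allowed values.
SEARCH_SPACE = {
    "encoder-embed-dim": [512, 640],
    "decoder-embed-dim": [512, 640],
    "encoder-ffn-dim":   [1024, 2048, 3072],
    "decoder-layer-num": [1, 2, 3, 4, 5, 6],
}

def mutate(gene, mutation_prob=0.3):
    """Resample each field with probability `mutation_prob`.

    Raising `mutation_prob` makes the search explore more aggressively,
    which can help when the best candidate stops improving early.
    """
    child = dict(gene)
    for name, choices in SEARCH_SPACE.items():
        if random.random() < mutation_prob:
            child[name] = random.choice(choices)
    return child

# Example: mutate a random parent with a higher-than-default rate.
parent = {name: random.choice(choices) for name, choices in SEARCH_SPACE.items()}
print(mutate(parent, mutation_prob=0.5))
```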
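
And here is a rough Python sketch of what the checkpoint averaging does, assuming fairseq-style checkpoint files that store the weights under a "model" key; the paths are illustrative, and in practice you should simply run the average_checkpoint.sh script mentioned above:

```python
import collections
import torch

def average_checkpoints(paths):
    """Element-wise average of the model weights in several checkpoints."""
    avg = collections.OrderedDict()
    for path in paths:
        state = torch.load(path, map_location="cpu")["model"]
        for key, value in state.items():
            if key not in avg:
                avg[key] = value.clone().float()
            else:
                avg[key] += value.float()
    for key in avg:
        avg[key] /= len(paths)
    return avg

# Illustrative paths: the last ten epoch checkpoints of the trained sub-network.
paths = [f"checkpoints/checkpoint{i}.pt" for i in range(41, 51)]

# Reuse the metadata of the last checkpoint and replace its weights
# with the averaged ones, so the result stays loadable for generation.
new_state = torch.load(paths[-1], map_location="cpu")
new_state["model"] = average_checkpoints(paths)
torch.save(new_state, "checkpoints/checkpoint_averaged.pt")
```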