Open BruceLee66 opened 5 years ago
DecAtt is very difficult to train. I tried many ways to make it work, including gradient clipping, length sorting, etc. People have previously used length sorting to speed up training and convergence, since the input lengths within a batch then don't vary much.
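To make the length-sorting idea concrete, here is a minimal sketch in plain Python. It is not this repository's implementation: the function names `length_sorted_batches` and `padding_cost` are made up for illustration, and the default batch size of 32 is only an assumption taken from the number mentioned in this thread. It sorts example indices by sequence length, cuts them into consecutive batches, and counts how many padding tokens each batching scheme would need.

```python
from typing import List, Sequence


def length_sorted_batches(sequences: Sequence[Sequence[int]],
                          batch_size: int = 32) -> List[List[int]]:
    """Return batches of example indices, grouped so that lengths are similar.

    Hypothetical helper: sorts indices by sequence length, then slices the
    sorted order into consecutive chunks of at most `batch_size` indices.
    """
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    return [order[start:start + batch_size]
            for start in range(0, len(order), batch_size)]


def padding_cost(sequences: Sequence[Sequence[int]],
                 batches: List[List[int]]) -> int:
    """Count the padding tokens needed to pad each batch to its longest item."""
    total = 0
    for batch in batches:
        longest = max(len(sequences[i]) for i in batch)
        total += sum(longest - len(sequences[i]) for i in batch)
    return total


# Toy data (made up): 100 sequences with lengths between 1 and 50.
toy = [[0] * ((i * 7) % 50 + 1) for i in range(100)]

sorted_batches = length_sorted_batches(toy, batch_size=32)
naive_batches = [list(range(s, min(s + 32, len(toy))))
                 for s in range(0, len(toy), 32)]

print("padding with length sorting:", padding_cost(toy, sorted_batches))
print("padding without sorting:    ", padding_cost(toy, naive_batches))
```

In this sketch the last batch can be smaller than `batch_size` when the number of examples is not a multiple of it. In practice the order of the batches (not their contents) is often shuffled each epoch so the model does not always see short inputs first.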
When I used this model for the WikiQA task, I found the batch list hard to understand. Why should we re-sort by length? Also, the interval of batch_list is not 32.