jihunchoi / recurrent-batch-normalization-pytorch

PyTorch implementation of recurrent batch normalization

Why do you NOT use packed padding? #14

Open brando90 opened 5 years ago

brando90 commented 5 years ago

Why do you NOT use packed padding, but instead use masks?

jihunchoi commented 5 years ago

Hi, to my understanding, the most common use of a packed sequence is as input to the pre-defined RNN modules (e.g. torch.nn.LSTM, torch.nn.GRU, ...). However, a batch-normalized RNN requires modifying the computation of the recurrent components, so the time-step loop has to be written by hand anyway, and there is no advantage to using packed sequences over masks. Also, at the time of implementation, pack_padded_sequence could not accept unsorted sequences (i.e. the input had to be sorted by length), which would have introduced another complication.
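For illustration, here is a minimal sketch of the mask-based approach in a hand-written time-step loop. The helper name `run_rnn_with_mask` and the use of `torch.nn.LSTMCell` as a stand-in for a batch-normalized cell are assumptions for the example, not code from this repository; any cell with the signature `cell(x_t, (h, c)) -> (h, c)` would work the same way.

```python
import torch

def run_rnn_with_mask(cell, inputs, lengths, h0, c0):
    """Unroll `cell` over time, masking out padded steps.

    inputs:  (seq_len, batch, input_size)
    lengths: (batch,) true length of each sequence
    cell:    any recurrent cell with signature cell(x_t, (h, c)) -> (h, c)
    """
    seq_len, batch, _ = inputs.size()
    h, c = h0, c0
    outputs = []
    for t in range(seq_len):
        h_next, c_next = cell(inputs[t], (h, c))
        # mask[b] is 1 while step t is inside sequence b, 0 on padding
        mask = (t < lengths).float().unsqueeze(1)
        # carry the previous state forward unchanged for finished sequences
        h = h_next * mask + h * (1 - mask)
        c = c_next * mask + c * (1 - mask)
        outputs.append(h)
    return torch.stack(outputs, 0), (h, c)

# Usage: a plain LSTMCell stands in for a custom batch-normalized cell.
cell = torch.nn.LSTMCell(10, 20)
inputs = torch.randn(5, 3, 10)          # seq_len=5, batch=3
lengths = torch.tensor([5, 3, 2])       # unsorted lengths are fine here
h0 = torch.zeros(3, 20)
c0 = torch.zeros(3, 20)
out, (h, c) = run_rnn_with_mask(cell, inputs, lengths, h0, c0)
```

Note that later PyTorch releases do accept unsorted input via `pack_padded_sequence(..., enforce_sorted=False)`, but packing still only pays off with the built-in RNN modules; with a custom cell, the mask approach stays simpler.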