neulab / awesome-align

A neural word aligner based on multilingual BERT
https://arxiv.org/abs/2101.08231
BSD 3-Clause "New" or "Revised" License
325 stars 47 forks

Move collate() out of word_align() #24

Closed quocthang0507 closed 3 years ago

quocthang0507 commented 3 years ago

Move collate() out of word_align() and change tokenizer.pad_token_id to modeling.PAD_ID. Additionally, I reformatted the code with the default code style.

zdou0830 commented 3 years ago

Hi, thanks a lot for the contribution. I may not merge this, though: run_train.py needs separate collate functions for training and testing, so for consistency it might be better to keep collate() inside word_align() here as well.
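
For readers following the thread, the two layouts under discussion can be sketched roughly as below. This is only an illustration, not the repository's code: `pad_batch`, the example token ids, and the value `PAD_ID = 0` are assumptions made for the sketch, and plain Python lists stand in for tensors. The module-level `collate()` relies on a shared constant, while the nested one closes over whatever padding id its enclosing function receives, which is what makes it easy to keep a different collate function per entry point.

```python
from typing import List, Sequence, Tuple

# Assumed padding id, standing in for a module-level constant such as modeling.PAD_ID.
PAD_ID = 0

Pair = Tuple[Sequence[int], Sequence[int]]


def pad_batch(seqs: Sequence[Sequence[int]], pad_id: int) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one (hypothetical helper)."""
    max_len = max((len(s) for s in seqs), default=0)
    return [list(s) + [pad_id] * (max_len - len(s)) for s in seqs]


def collate(examples: Sequence[Pair]):
    """Module-level variant: pads with the shared PAD_ID constant."""
    src, tgt = zip(*examples)
    return pad_batch(src, PAD_ID), pad_batch(tgt, PAD_ID)


def word_align(examples: Sequence[Pair], pad_token_id: int):
    """Nested variant: the local collate() closes over pad_token_id."""

    def collate(batch: Sequence[Pair]):
        src, tgt = zip(*batch)
        return pad_batch(src, pad_token_id), pad_batch(tgt, pad_token_id)

    return collate(examples)


# Two (source ids, target ids) pairs of different lengths; the ids are made up.
batch = [([101, 7, 102], [101, 8, 9, 10, 102]), ([101, 102], [101, 102])]
module_level = collate(batch)
nested = word_align(batch, pad_token_id=PAD_ID)
print(module_level == nested)
```

Both variants pad identically when given the same padding id, so the choice between them is mostly about consistency across scripts rather than behavior.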