princeton-nlp / TRIME

[EMNLP 2022] Training Language Models with Memory Augmentation https://arxiv.org/abs/2205.12674

preprocessing script missing #4

Closed Hannibal046 closed 1 year ago

Hannibal046 commented 1 year ago

Hi, tokenization seems to be skipped here. After unzipping, there is only a wiki.train.raw file: https://github.com/princeton-nlp/TRIME/blob/2dfbdbd8fad0fe2fd54cf1232b8cdec1bf700ed7/get_data.sh#L12-L21
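For context, the two standard WikiText-103 archives differ in whether tokenization has already been applied. A minimal sketch of what each typically unzips to (file and directory names follow the commonly distributed archives, not get_data.sh itself):

```sh
# "Raw" (untokenized) archive -> *.raw files
# wikitext-103-raw-v1.zip
#   wikitext-103-raw/wiki.train.raw
#   wikitext-103-raw/wiki.valid.raw
#   wikitext-103-raw/wiki.test.raw

# Word-level (pre-tokenized) archive -> *.tokens files
# wikitext-103-v1.zip
#   wikitext-103/wiki.train.tokens
#   wikitext-103/wiki.valid.tokens
#   wikitext-103/wiki.test.tokens
```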

Hannibal046 commented 1 year ago

Sorry for the bother; I mistakenly downloaded the raw version.
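For anyone hitting the same issue, a hedged sketch of the fix: fetch the pre-tokenized (word-level) archive instead of the raw one, then binarize it with the standard fairseq language-modeling recipe (TRIME builds on fairseq, but its exact preprocessing flags may differ; the URL below is the commonly used WikiText-103 mirror, not necessarily the one in get_data.sh):

```sh
# Download and unzip the word-level archive (already tokenized).
wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip
unzip wikitext-103-v1.zip   # yields wikitext-103/wiki.{train,valid,test}.tokens

# Binarize for fairseq-based training (standard fairseq LM preprocessing).
fairseq-preprocess \
    --only-source \
    --trainpref wikitext-103/wiki.train.tokens \
    --validpref wikitext-103/wiki.valid.tokens \
    --testpref  wikitext-103/wiki.test.tokens \
    --destdir   data-bin/wikitext-103 \
    --workers 20
```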