vietai / ViT5

MIT License

Could I get the filtered data, "71GB of long paragraphs for 1024-length model"? #8

Closed: tiendung closed this issue 1 year ago

justinphan3110 commented 1 year ago

You can download the original file from CC100 and filter it by whatever length you choose. We didn't do any extra preprocessing for ViT5.

!wget http://data.statmt.org/cc-100/vi.txt.xz
!unxz vi.txt.xz
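
For the filtering step, here is a minimal sketch rather than the script we actually used. It assumes each line of the CC100 file is one paragraph and uses a whitespace word count as a rough proxy for length. The names `filter_long_paragraphs`, `filter_file`, and `min_words` are hypothetical, and the threshold is yours to pick.

```python
import lzma
from typing import Iterable, Iterator


def filter_long_paragraphs(lines: Iterable[str], min_words: int) -> Iterator[str]:
    """Yield non-empty lines with at least `min_words` whitespace-separated words."""
    for line in lines:
        text = line.strip()
        # Blank lines (document separators) are skipped.
        if text and len(text.split()) >= min_words:
            yield text


def filter_file(src_path: str, dst_path: str, min_words: int) -> int:
    """Stream `src_path` (plain text or .xz) into `dst_path`; return the number of lines kept."""
    # Assumption: a ".xz" suffix means the file is still compressed.
    opener = lzma.open if src_path.endswith(".xz") else open
    kept = 0
    with opener(src_path, "rt", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for paragraph in filter_long_paragraphs(src, min_words):
            dst.write(paragraph + "\n")
            kept += 1
    return kept
```

For example, `filter_file("vi.txt", "vi_long.txt", min_words=512)` would keep only lines of 512 or more words; 512 is only an illustrative value. Because the file is streamed line by line, it never has to fit in memory. If you need lengths in model tokens rather than words, swap the `split()` count for a tokenizer call.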