Sun-Happy-YKX opened this issue 1 year ago
Did you manage to solve this, bro? Could we discuss it further?
I have the same question. Please provide a Linux script if you can.
The link is dead: it just returns 403 Forbidden, so no wonder running the code fails too. The server is simply down.
First, download the dataset to your own computer:
wget "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/training.tar.gz"
wget "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/validation.tar.gz"
wget "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/mmt_task1_test2016.tar.gz"
Then extract them into any directory, for example ~/Python/DATASETS/Multi30k/ (they are .tar.gz archives, so use tar -xzf rather than unzip).
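For those asking for a script: below is a minimal Python sketch of the download-and-extract steps above, using only the standard library. It assumes the neychev/small_DL_repo mirror is still reachable (it is a third-party mirror and could disappear as well). The helpers download, extract and fetch_multi30k are hypothetical names, not part of torchtext.

```python
import os
import tarfile
import urllib.request

# Mirror URLs from the comment above (third-party mirror, may move or vanish).
BASE = "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/"
ARCHIVES = ["training.tar.gz", "validation.tar.gz", "mmt_task1_test2016.tar.gz"]


def download(url: str, dest: str) -> str:
    """Hypothetical helper: fetch url into dest unless dest already exists."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest


def extract(archive: str, root: str) -> list:
    """Hypothetical helper: unpack a .tar.gz into root, return member names."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter rejects absolute paths and "..".
            tar.extractall(root, filter="data")
        else:
            tar.extractall(root)
    return names


def fetch_multi30k(root: str = "~/Python/DATASETS/Multi30k/") -> str:
    """Hypothetical helper: download and extract all three archives into root."""
    root = os.path.expanduser(root)
    os.makedirs(root, exist_ok=True)
    for name in ARCHIVES:
        archive = download(BASE + name, os.path.join(root, name))
        extract(archive, root)
    return root
```

Calling fetch_multi30k() once leaves the extracted files in the folder you then pass as path to TranslationDataset.splits. Archives that are already on disk are not downloaded again, so rerunning it only re-extracts.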
Next, use the TranslationDataset class to load the data and split it:
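# torchtext < 0.9; on 0.9-0.11 import from torchtext.legacy.data / torchtext.legacy.datasets instead
from torchtext.data import Field
from torchtext.datasets import TranslationDataset
# Folder holding the extracted train.*, val.* and test2016.* files ('~' is expanded by TranslationDataset)
ROOT = '~/Python/DATASETS/Multi30k/'
# Example Field settings (adjust to your model): source = English, target = German
srcfield = Field(init_token='<sos>', eos_token='<eos>', lower=True)
tgtfield = Field(init_token='<sos>', eos_token='<eos>', lower=True)
# No Multi30k.download(ROOT) here: it would try to fetch from the original (dead) URLs again
(trnset, valset, testset) = TranslationDataset.splits(
    path=ROOT,
    exts=('.en', '.de'),
    fields=[('src', srcfield), ('trg', tgtfield)],
    test='test2016',
)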
ref: https://github.com/pytorch/text/issues/312#issuecomment-406092660
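If your torchtext release no longer provides TranslationDataset (the legacy data API was moved to torchtext.legacy in 0.9 and later removed), the reading step can be reproduced without it. This sketch mirrors what the legacy class does: it pairs line i of the .en file with line i of the .de file, strips whitespace, and skips pairs where either side is empty. The split names train, val and test2016 are assumed from the call above (val is TranslationDataset's default validation name); read_parallel and load_multi30k are hypothetical helpers.

```python
import io
import os


def read_parallel(prefix: str, exts=(".en", ".de")) -> list:
    """Hypothetical helper: pair line i of prefix+exts[0] with line i of prefix+exts[1]."""
    src_path, trg_path = (os.path.expanduser(prefix + ext) for ext in exts)
    pairs = []
    with io.open(src_path, encoding="utf-8") as src, io.open(trg_path, encoding="utf-8") as trg:
        for s, t in zip(src, trg):
            s, t = s.strip(), t.strip()
            # Like the legacy loader, drop pairs where either side is empty.
            if s and t:
                pairs.append((s, t))
    return pairs


def load_multi30k(root: str, exts=(".en", ".de"), splits=("train", "val", "test2016")):
    """Hypothetical helper: return (train, valid, test) lists of (src, trg) string pairs."""
    return tuple(read_parallel(os.path.join(root, name), exts) for name in splits)
```

For example, train, valid, test = load_multi30k('~/Python/DATASETS/Multi30k/') returns plain string pairs; tokenization and vocabulary building are left to whatever pipeline you use.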
I'm new to Transformers and don't know how to get the dataset for this project. Please provide a Linux script if you can.