gordicaleksa / pytorch-original-transformer

My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise seemingly hard concepts. IWSLT pretrained models are currently included.
https://youtube.com/c/TheAIEpiphany
MIT License

Issue when running command: python training_script.py --batch_size 2 --dataset_name IWSLT --language_direction G2E #4

Open adamas-v opened 3 years ago

adamas-v commented 3 years ago

downloading de-en.tgz

File "training_script.py", line 103, in train_transformer
    train_token_ids_loader, val_token_ids_loader, src_field_processor, trg_field_processor = get_data_loaders(

tarfile.ReadError: not a gzip file

JingshuaiLiu commented 3 years ago

Use Multi30k instead of IWSLT.

Thanks!

xueshengke commented 3 years ago

Use Multi30k instead of IWSLT.

Thanks!

I ran into the same problem:

` $ export CUDA_VISIBLE_DEVICES=3 && python training_script.py --batch_size 1500 --dataset_name IWSLT --language_direction G2E

downloading de-en.tgz
de-en.tgz: 96.9kB [00:00, 12.0MB/s]
Traceback (most recent call last):
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/tarfile.py", line 1643, in gzopen
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/tarfile.py", line 1619, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/tarfile.py", line 1482, in __init__
    self.firstmember = self.next()
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/tarfile.py", line 2297, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/tarfile.py", line 1092, in fromtarfile
    buf = tarfile.fileobj.read(BLOCKSIZE)
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/home/xueshengke/anaconda3/envs/transformer_pytorch/lib/python3.6/gzip.py", line 411, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'<!')

tarfile.ReadError: not a gzip file `

What do you mean by "Use Multi30k"? This code currently supports only 'IWSLT' and 'WMT14'.