Open rfernand2 opened 5 years ago
I noticed today that some downloads (such as the MultiNLI dataset from https://www.nyu.edu/projects/bowman/multinli/multinli_1.0.zip) require the following additional header, which Chrome also sends:
('Accept', "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8")
So, the "opener.addheaders=..." line in the workaround should be replaced with:
opener.addheaders = [
    ('User-Agent',
     "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"),
    ('Accept',
     "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"),
]
This has been fixed in the following pull request: https://github.com/tensorflow/tensor2tensor/pull/1235
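For anyone applying the header change by hand, here is a minimal, self-contained sketch of how such headers might be installed process-wide. It assumes Python 3's `urllib.request`; the helper name `install_browser_like_opener` and the constant `BROWSER_HEADERS` are made up for illustration and are not part of tensor2tensor. The header values are the ones quoted above.

```python
import urllib.request

# Header values copied from the comment above. The names BROWSER_HEADERS and
# install_browser_like_opener are hypothetical, not tensor2tensor APIs.
BROWSER_HEADERS = [
    ("User-Agent",
     "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
     "(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"),
    ("Accept",
     "text/html,application/xhtml+xml,application/xml;q=0.9,"
     "image/webp,image/apng,*/*;q=0.8"),
]


def install_browser_like_opener(headers=BROWSER_HEADERS):
    """Build an opener that sends the given headers and install it globally."""
    opener = urllib.request.build_opener()
    # Replace the default "Python-urllib/x.y" User-Agent with the list above.
    opener.addheaders = list(headers)
    # After this call, urllib.request.urlopen() uses this opener.
    urllib.request.install_opener(opener)
    return opener
```

Calling `install_browser_like_opener()` once before the download step should be enough for later `urllib.request.urlopen` calls in the same process to send these headers. Whether a given server then accepts the request still depends on that server.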
Description
I consistently get an "HTTP Error 403: Forbidden" error when trying to download the "babi" dataset. It happens with both "t2t-datagen" and "t2t-trainer". A workaround is provided at the end of this issue.
Environment information
For bugs: reproduction and error logs
Error logs:
Issue Workaround
I found that the following code, replacing line 113 in the file "lib\site-packages\tensor2tensor\data_generators\babi_qa.py", fixed the problem: