Open Evernow opened 1 year ago
urlretrieve is a legacy interface that is not recommended as per docs. Regarding the issue, it seems a request object is constructed without Host header and hence the parsed host value from url is used. In the next loop since Host is already set the value you add in addheaders is skipped. You might want to try something modified from this page like below constructing your own request object with appropriate headers : https://docs.python.org/3/howto/urllib2.html?highlight=urllib2#fetching-urls
https://docs.python.org/3/library/urllib.request.html?highlight=urlretrieve#legacy-interface
# https://docs.python.org/3/howto/urllib2.html?highlight=urllib2#fetching-urls
import shutil
import tempfile
import urllib.request
def download_helper2(url, fname):
HOST = "us.download.nvidia.com"
headers = dict(
[
(
"User-agent",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0",
),
(
"Referer",
"https://www.amd.com/en/support/graphics/amd-radeon-6000-series/amd-radeon-6700-series/amd-radeon-rx-6700-xt",
),
("Host", HOST),
("test", "test"),
]
)
request = urllib.request.Request(url=url, headers=headers)
with urllib.request.urlopen(request) as response:
with open(fname, "wb") as file_:
shutil.copyfileobj(response, file_)
download_helper2("http://localhost:8000/test", "/tmp/test")
Bug report
I have been having issues confirming this, but urlretrieve seems to ignore the provided Host header even if it's added. It seems to correctly look at User-agent and Referer. I have two functions doing the same download, one with urlretrieve and one with requests. The requests one works as expected and fails in the same way urlretrieve fails if I remove the Host header.
Your environment