jaluecht opened 6 years ago
You are right that the default code in the Scrape and Parse Text From Websites exercise doesn't work without some modifications. The website is likely blocking plain urllib
calls that don't set a common user agent.
I get the following error when using the default code from RealPython, as you have above:
HTTPError: HTTP Error 403: Forbidden
If anyone else is having problems here, this should work (if you want to continue using urllib
as part of the exercise):
from urllib.request import Request, urlopen
mozilla_request = Request('https://realpython.com/practice/aphrodite.html',
headers={'User-Agent': 'Mozilla/5.0'})
html_page = urlopen(mozilla_request)
html_text = html_page.read().decode('utf-8')
print(html_text)
NOTE: I need to credit this StackOverflow answer for the method above
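As a quick offline sanity check, the Request object can be inspected before opening the URL to confirm the header is actually attached (note that urllib stores header names in capitalized form internally):

```python
from urllib.request import Request

req = Request(
    'https://realpython.com/practice/aphrodite.html',
    headers={'User-Agent': 'Mozilla/5.0'},
)

# urllib capitalizes header names on storage, so the lookup key
# is 'User-agent' rather than 'User-Agent'
print(req.get_header('User-agent'))  # Mozilla/5.0
```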
Using the requests
package works fine, too. It isn't part of the standard library, so a pip install requests
or pipenv install requests
would need to be run beforehand:
import requests
r = requests.get("https://realpython.com/practice/aphrodite.html")
print(r.text)
I am following your documentation to obtain the Aphrodite web page with the following code:
from urllib.request import urlopen
my_address = "https://realpython.com/practice/aphrodite.html"
html_page = urlopen(my_address)
html_text = html_page.read().decode('utf-8')
print(html_text)
I am getting SSL errors. When I add the cafile option, I get an invalid certificate error. How can I make this work?
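One common fix (a sketch of what often resolves this, not a guaranteed answer for every environment) is to pass an explicit SSL context to urlopen, combined with the User-Agent header from the earlier comment so the request isn't rejected with a 403. On macOS, running the "Install Certificates.command" script bundled with the Python installer also frequently resolves CERTIFICATE_VERIFY_FAILED errors:

```python
import ssl
from urllib.request import Request, urlopen

# Build a default SSL context from the system's trusted CA store.
# If that store is incomplete, certifi's bundle can be used instead:
# ssl.create_default_context(cafile=certifi.where())  # pip install certifi
context = ssl.create_default_context()

# Keep the User-Agent header so the site doesn't return 403 Forbidden
request = Request(
    "https://realpython.com/practice/aphrodite.html",
    headers={"User-Agent": "Mozilla/5.0"},
)

html_text = urlopen(request, context=context).read().decode("utf-8")
print(html_text[:60])
```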