Closed sanju9522 closed 7 years ago
@sanju9522 I would advise using the requests library (included in Anaconda):
# Third-party imports
from bs4 import BeautifulSoup
import requests

# Specify the URL
url = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"

# Query the website and store the response in the variable 'r'
r = requests.get(url)

# Parse the HTML in the response and store it in Beautiful Soup format
soup = BeautifulSoup(r.content, "html.parser")
print(soup)
Hi goanpeca, I ran the code with the requests library as you suggested, but I am still getting the same error.
Error Log:
raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='en.wikipedia.org', port=443): Max retries exceeded with url: /wiki/List_of_state_and_union_territory_capitals_in_India (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x0000000009970E80>: Failed to establish a new connection: [WinError 10051] A socket operation was attempted to an unreachable network',))
@sanju9522 it seems you have network issues: [WinError 10051] means the network is unreachable from your machine. It is not a Spyder-related problem. See https://stackoverflow.com/questions/23013220/max-retries-exceeded-with-url
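To confirm that the problem lies in the network rather than in Spyder or the scraping code, a plain TCP connection test can help. The following is a minimal sketch using only the standard library; the helper name `can_reach` is hypothetical and not part of any library.

```python
import socket
from urllib.parse import urlsplit


def can_reach(url, timeout=5.0):
    """Try a plain TCP connection to the host and port implied by the URL.

    Returns (True, None) on success or (False, error) on failure.
    `can_reach` is a hypothetical helper written for this illustration.
    """
    parts = urlsplit(url)
    host = parts.hostname
    # Fall back to the default port of the scheme when none is given
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as err:
        # Covers DNS failures, refused connections and unreachable networks
        return False, err
```

Calling `can_reach("https://en.wikipedia.org/")` outside Spyder (for example from a plain Python prompt) and getting `False` with the same WinError 10051 would suggest the machine itself cannot reach the site.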
Your network doesn't allow you to use a proxy.
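If the machine can only reach the internet through a proxy, urllib can be pointed at it explicitly. This is a rough sketch under that assumption; the helper name `build_proxy_opener` and the proxy address in the usage note are hypothetical placeholders, and the real address would have to come from the network administrator.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url.

    `build_proxy_opener` is a hypothetical helper written for this illustration.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

With a placeholder address, usage would look like `opener = build_proxy_opener("http://proxy.example.com:8080")` followed by `page = opener.open(wiki)`, where `wiki` is the URL from the original question.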
Description of your problem
Hi Experts,
I am trying to scrape web information using Spyder, but I am getting a urlopen error.
Below is my code:

import urllib.request

# specify the url
wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"

# Query the website and return the html to the variable 'page'
page = urllib.request.urlopen(wiki)

# import the Beautiful soup functions to parse the data returned from the website
from bs4 import BeautifulSoup

# Parse the html in the 'page' variable, and store it in Beautiful Soup format
soup = BeautifulSoup(page)
print(soup.prettify())
What is the expected output? What do you see instead?
After printing, I should get the HTML tags of the web page for scraping.
Below is the Error Log:
  File "", line 7, in
    page = urllib.request.urlopen(wiki)
  File "C:\Program Files\Anaconda3\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\Anaconda3\lib\urllib\request.py", line 466, in open
    response = self._open(req, data)
  File "C:\Program Files\Anaconda3\lib\urllib\request.py", line 484, in _open
    '_open', req)
  File "C:\Program Files\Anaconda3\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "C:\Program Files\Anaconda3\lib\urllib\request.py", line 1297, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Program Files\Anaconda3\lib\urllib\request.py", line 1256, in do_open
    raise URLError(err)
URLError: <urlopen error [WinError 10051] A socket operation was attempted to an unreachable network>
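The traceback above ends in a `URLError`, which the script could catch to report the underlying cause instead of crashing. A minimal sketch, assuming only the standard library; the helper name `fetch_html` is hypothetical.

```python
import urllib.request
from urllib.error import URLError


def fetch_html(url, timeout=10):
    """Return (html_bytes, None) on success or (None, reason) on failure.

    `fetch_html` is a hypothetical helper written for this illustration.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as page:
            return page.read(), None
    except URLError as err:
        # err.reason holds the underlying cause, e.g. the OS-level socket error
        return None, err.reason
```

A caller can then check whether the first element is `None` and print the reason, which makes it easier to tell a network failure apart from a parsing problem.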
Versions and main components
Thanks & Regards, Sanjay