sanju9522 commented 7 years ago

Description of your problem

Hi Experts,

I am trying to scrape web data in Spyder, but I am getting a urlopen error.

Below is my code:

import urllib.request

# specify the url
wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"

# Query the website and return the html to the variable 'page'
page = urllib.request.urlopen(wiki)

# import the Beautiful soup functions to parse the data returned from the website
from bs4 import BeautifulSoup

# Parse the html in the 'page' variable, and store it in Beautiful Soup format
soup = BeautifulSoup(page)

print(soup.prettify())

What is the expected output? What do you see instead?

After printing it, I should get the HTML tags for scraping the web page.

Below is the Error Log:

File "", line 7, in
    page = urllib.request.urlopen(wiki)
File "C:\Program Files\Anaconda3\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
File "C:\Program Files\Anaconda3\lib\urllib\request.py", line 466, in open
    response = self._open(req, data)
File "C:\Program Files\Anaconda3\lib\urllib\request.py", line 484, in _open
    '_open', req)
File "C:\Program Files\Anaconda3\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
File "C:\Program Files\Anaconda3\lib\urllib\request.py", line 1297, in https_open
    context=self._context, check_hostname=self._check_hostname)
File "C:\Program Files\Anaconda3\lib\urllib\request.py", line 1256, in do_open
    raise URLError(err)

URLError: <urlopen error [WinError 10051] A socket operation was attempted to an unreachable network>

Versions and main components

Spyder Version:
Python Version: 3.5
Qt Version:
PyQt Version:
Operating system: Windows

Thanks & Regards, Sanjay

goanpeca commented 7 years ago

@sanju9522 I would advise using the requests library (included in Anaconda):


# Third party imports
from bs4 import BeautifulSoup
import requests

# Specify the url
url = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"

# Query the website and store the response in the variable 'r'
r = requests.get(url)

# Parse the html in 'r.content' and store it in Beautiful Soup format
# (naming the parser explicitly avoids the "no parser specified" warning)
soup = BeautifulSoup(r.content, "html.parser")
print(soup)

sanju9522 commented 7 years ago

Hi goanpeca, I ran the code with the requests library as you suggested, but I am still getting the same error.

Error Log:

raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='en.wikipedia.org', port=443): Max retries exceeded with url: /wiki/List_of_state_and_union_territory_capitals_in_India (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x0000000009970E80>: Failed to establish a new connection: [WinError 10051] A socket operation was attempted to an unreachable network',))

goanpeca commented 7 years ago

@sanju9522 it seems you have network issues. This is not a Spyder-related problem. See https://stackoverflow.com/questions/23013220/max-retries-exceeded-with-url
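
One way to confirm that the problem sits below both urllib and requests is to try a plain TCP connection to the same host and port. The snippet below is only a rough sketch: `check_reachable` is a hypothetical helper name, and the default port and timeout are assumptions.

```python
import socket


def check_reachable(host, port=443, timeout=5.0):
    """Try a plain TCP connection and report whether it succeeded.

    Returns a (reachable, detail) tuple; on failure, detail holds the OS error text.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as err:
        # OSError covers DNS failures, timeouts, refused connections,
        # and "unreachable network" errors such as WinError 10051.
        return False, str(err)
```

Calling `check_reachable("en.wikipedia.org")` from the same console should return `(False, ...)` with a similar message if the machine cannot reach the host at all, which would point to the network, firewall, or proxy setup and not to the scraping code.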

SaadTazroute commented 9 months ago

Your network doesn't allow you to use a proxy.
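
If a proxy is involved, it may help to look at which proxy settings Python picks up and, where the network requires one, to pass it explicitly. This is a minimal sketch using only the standard library; `make_proxy_opener` is a hypothetical helper and the proxy address is a placeholder, not a real server.

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Build a urllib opener that sends HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Proxies Python detects from environment variables or system settings, if any.
detected = urllib.request.getproxies()
print("Detected proxies:", detected)

# Placeholder address (an assumption): replace it with the proxy your network uses.
opener = make_proxy_opener("http://proxy.example.com:8080")
# page = opener.open(wiki, timeout=10)  # uncomment once the address is real
```

An empty dictionary from `getproxies()` means Python found no proxy configuration, so requests go out directly. With requests, the same mapping can be passed as the `proxies` argument of `requests.get`.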

spyder-ide / spyder

urlopen error [WinError 10051] A socket operation was attempted to an unreachable network #4942