shekharbiswas / ScrapeAmazon


Issues with product.py #1

Open reikairen opened 3 years ago

reikairen commented 3 years ago

PS C:\Users\reika\Desktop\ScrapeAmazon> C:\Users\reika\AppData\Local\Programs\Python\Python38\python.exe .\product.py
Error occurred during loading data. Trying to use cache server https://fake-useragent.herokuapp.com/browsers/0.1.11
Traceback (most recent call last):
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 1319, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 1230, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 1276, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 1225, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 1004, in _send_output
    self.send(msg)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 944, in send
    self.connect()
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 915, in connect
    self.sock = self._create_connection(
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\socket.py", line 808, in create_connection
    raise err
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\socket.py", line 796, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\fake_useragent\utils.py", line 64, in get
    with contextlib.closing(urlopen(
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 1348, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 1322, in do_open
    raise URLError(err)
urllib.error.URLError:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\fake_useragent\utils.py", line 164, in load
    browsers_dict[browser_key] = get_browser_versions(
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\fake_useragent\utils.py", line 120, in get_browser_versions
    html = get(
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\fake_useragent\utils.py", line 84, in get
    raise FakeUserAgentError('Maximum amount of retries reached')
fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached
Downloading https://www.amazon.com/Python-Crash-Course-2nd-Edition/dp/1593279280/ref=sr_1_1?crid=199UWV7418DTJ&dchild=1&keywords=no+starch+press+python&qid=1597177840&refinements=p_72%3A1250221011%2Cp_85%3A2470955011%2Cp_n_condition-type%3A1294423011&rnid=1294421011&rps=1&s=books&sprefix=no+starch+press%2Caps%2C294&sr=1-1
Traceback (most recent call last):
  File ".\product.py", line 46, in
    data = scrape(url)
  File ".\product.py", line 41, in scrape
    return e.extract(r.text)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\selectorlib\selectorlib.py", line 74, in extract
    fields_data[selector_name] = self._extract_selector(selector_config, sel)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\selectorlib\selectorlib.py", line 85, in _extract_selector
    elements = parent_parser.css(field_config['css'])
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\parsel\selector.py", line 282, in css
    return self.xpath(self._css2xpath(query))
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\parsel\selector.py", line 285, in _css2xpath
    return self._csstranslator.css_to_xpath(query)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\parsel\csstranslator.py", line 107, in css_to_xpath
    return super(HTMLTranslator, self).css_to_xpath(css, prefix)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\cssselect\xpath.py", line 192, in css_to_xpath
    for selector in parse(css))
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\cssselect\parser.py", line 415, in parse
    return list(parse_selector_group(stream))
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\cssselect\parser.py", line 428, in parse_selector_group
    yield Selector(parse_selector(stream))
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\cssselect\parser.py", line 436, in parse_selector
    result, pseudo_element = parse_simple_selector(stream)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\cssselect\parser.py", line 544, in parse_simple_selector
    raise SelectorSyntaxError(
cssselect.parser.SelectorSyntaxError: Expected selector, got <DELIM '#' at 0>

PS C:\Users\reika\Desktop\ScrapeAmazon> C:\Users\reika\AppData\Local\Programs\Python\Python38\python.exe .\product.py |clip
Traceback (most recent call last):
  File ".\product.py", line 46, in
    data = scrape(url)
  File ".\product.py", line 41, in scrape
    return e.extract(r.text)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\selectorlib\selectorlib.py", line 74, in extract
    fields_data[selector_name] = self._extract_selector(selector_config, sel)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\selectorlib\selectorlib.py", line 85, in _extract_selector
    elements = parent_parser.css(field_config['css'])
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\parsel\selector.py", line 282, in css
    return self.xpath(self._css2xpath(query))
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\parsel\selector.py", line 285, in _css2xpath
    return self._csstranslator.css_to_xpath(query)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\parsel\csstranslator.py", line 107, in css_to_xpath
    return super(HTMLTranslator, self).css_to_xpath(css, prefix)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\cssselect\xpath.py", line 192, in css_to_xpath
    for selector in parse(css))
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\cssselect\parser.py", line 415, in parse
    return list(parse_selector_group(stream))
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\cssselect\parser.py", line 428, in parse_selector_group
    yield Selector(parse_selector(stream))
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\cssselect\parser.py", line 436, in parse_selector
    result, pseudo_element = parse_simple_selector(stream)
  File "C:\Users\reika\AppData\Local\Programs\Python\Python38\lib\site-packages\cssselect\parser.py", line 544, in parse_simple_selector
    raise SelectorSyntaxError(
cssselect.parser.SelectorSyntaxError: Expected selector, got <DELIM '#' at 0>

shekharbiswas commented 3 years ago

It looks like the issue is with the fake user agent: its requests are timing out. It would help if you gave some context. Did you try to run this on Heroku, your own Python Flask server, etc.?

Thanks
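
One way to keep the script running when the fake user agent lookup times out is to fall back to a fixed User-Agent string. The sketch below is only an illustration, not code from product.py: `pick_user_agent`, its `factory` parameter, and `FALLBACK_USER_AGENT` are hypothetical names, and the fallback string is an arbitrary browser-like value. The only library call it assumes is `fake_useragent.UserAgent().random`.

```python
# Hypothetical fallback value; any realistic browser User-Agent would do.
FALLBACK_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/84.0.4147.105 Safari/537.36"
)


def pick_user_agent(factory=None):
    """Return a User-Agent string, or the fixed fallback if the lookup fails.

    `factory` is a zero-argument callable that returns a User-Agent string.
    When it is omitted, fake_useragent is tried.
    """
    if factory is None:
        def factory():
            # Imported lazily so a missing or broken install also
            # ends up in the fallback branch below.
            from fake_useragent import UserAgent
            return UserAgent().random

    try:
        return factory()
    except Exception:
        # Covers timeouts, FakeUserAgentError, and import problems alike.
        return FALLBACK_USER_AGENT


# Example: build request headers without depending on the remote lookup.
headers = {"User-Agent": pick_user_agent(lambda: FALLBACK_USER_AGENT)}
```

This only addresses the User-Agent timeout. The run in the traceback continues past that error and stops at the later `cssselect.parser.SelectorSyntaxError`, which is a separate failure.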