Closed: leorocher closed this issue 7 years ago
Hi @leorocher, is this still happening for you? The slybot tests run fine on the CI server.
I tried again today after upgrading to slybot 0.13.0b19 and I still have the same issue.
FYI here is what I have installed:
pip freeze
Pillow==3.2.0
PyDispatcher==2.0.5
PyYAML==3.11
Scrapy==1.1.0
Twisted==16.1.1
adblockparser==0.5
autobahn==0.10.4
bcrypt==2.0.0
cffi==1.5.2
characteristic==14.3.0
chardet==2.3.0
cryptography==1.3.1
cssselect==0.9.1
dateparser==0.2.0
dulwich==0.9.7
enum34==1.1.2
funcparserlib==0.3.6
functools32==3.2.3.post2
idna==2.1
ipaddress==1.0.16
jdatetime==1.7.4
jsonschema==2.4.0
loginform==1.0
lupa==1.3
lxml==3.4.1
monotonic==0.3
mysql-connector-python==1.2.3
ndg-httpsclient==0.4.0
numpy==1.11.0
page-finder==0.1.1
parse==1.6.6
parsel==1.0.2
psutil==4.1.0
psycopg2==2.6.1
pyOpenSSL==16.0.0
pyasn1==0.1.9
pyasn1-modules==0.0.8
pycparser==2.14
pysqlite==2.8.2
python-dateutil==2.4.2
pytz==2016.3
qt5reactor==0.3
queuelib==1.4.2
re2==0.2.23
regex==2016.4.3
requests==2.7.0
scrapely==0.12.0
scrapyjs==0.1.1
service-identity==14.0.0
six==1.10.0
slybot==0.13.0b19
slyd==0.0.0
splash==2.1
txaio==2.2.2
umalqurra==0.2
w3lib==1.14.2
wsgiref==0.1.2
xvfbwrapper==0.2.8
zope.interface==4.1.3
@leorocher: Can you try again with slybot 0.13.0b20?
MBS:portia samirfor$ git log -1
commit 0526b932479cd9fe28ce587c676ec27719345d3d
Author: Ruairi Fahy <ruairifahy91@gmail.com>
Date: Wed Oct 26 14:37:12 2016 +0100
Release slybot 0.13.0b26
Fix issue with empty css selectors
Change IblItem to be a subclass of scrapy.Item
...
MBS:portia samirfor$ docker run -itp 9001:9001 --rm -v $PWD/data:/app/slyd/slyd/data:rw --name portia portia
2016-11-01 02:50:27+0000 [-] Log opened.
2016-11-01 02:50:27.947515 [-] Splash version: 2.2.1
2016-11-01 02:50:27.950249 [-] WARNING: Lua scripting is not available because 'lupa' Python package is not installed
2016-11-01 02:50:27.952117 [-] Qt 5.5.1, PyQt 5.5.1, WebKit 538.1, sip 4.17, Twisted 15.4.0
2016-11-01 02:50:27.954090 [-] Python 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2]
2016-11-01 02:50:27.955168 [-] Open files limit: 1048576
2016-11-01 02:50:27.956392 [-] Can't bump open files limit
2016-11-01 02:50:28.434613 [-] Xvfb is started: ['Xvfb', ':1', '-screen', '0', '1024x768x24']
2016-11-01 02:50:30.906415 [-] /app/slyd/slyd/bot.py:25: scrapy.exceptions.ScrapyDeprecationWarning: Module `scrapy.log` has been deprecated, Scrapy now relies on the builtin Python library for logging. Read the updated logging entry in the documentation to learn more.
2016-11-01 02:50:30.925776 [-] /app/slyd/slyd/bot.py:32: scrapy.exceptions.ScrapyDeprecationWarning: Module `scrapy.spider` is deprecated, use `scrapy.spiders` instead
2016-11-01 02:50:32.917288 [-] /app/slyd/slyd/bot.py:60: scrapy.exceptions.ScrapyDeprecationWarning: log.msg has been deprecated, create a python logger and log through it instead
2016-11-01 02:50:32.939310 [-] Site starting on 9002
2016-11-01 02:50:32.941136 [-] Starting factory <slyd.server.Site instance at 0x7f87c15cb128>
...
QNetworkReplyImplPrivate::error: Internal problem, this method must only be called once.
2016-11-01 02:50:16.956547 [-] "127.0.0.1" - - [01/Nov/2016:02:50:16 +0000] "GET /proxy?url=https%3A%2F%2Fwww.bilheteriavirtual.com.br%2Fimg%2Fheader-bg.png&tabid=139648766211024&referer=www.bilheteriavirtual.com.br HTTP/1.0" 200 48394 "http://192.168.99.100:9001/proxy?url=https%3A%2F%2Fwww.bilheteriavirtual.com.br%2Fcss%2Fstyle.css&tabid=139648766211024&referer=www.bilheteriavirtual.com.br" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"
Traceback (most recent call last):
File "/app/slyd/slyd/splash/ferry.py", line 165, in sendMessage
self.protocol.sendMessage(metadata(self.protocol))
File "/app/slyd/slyd/splash/commands.py", line 100, in metadata
res.update(extract(socket))
File "/app/slyd/slyd/splash/commands.py", line 116, in extract
js_items, js_links = extract_data(url, html, socket.spider, templates)
File "/app/slyd/slyd/splash/utils.py", line 26, in extract_data
for value in spider.parse(page(url, html)):
File "/app/slybot/slybot/spider.py", line 228, in _handle
for item_or_request in itertools.chain(*generators):
File "/app/slybot/slybot/plugins/scrapely_annotations/annotations.py", line 121, in handle_html
htmlpage = htmlpage_from_response(response, _add_tagids=True)
File "/app/slybot/slybot/utils.py", line 103, in htmlpage_from_response
encoding=response.encoding)
File "/usr/local/lib/python2.7/dist-packages/scrapely/htmlpage.py", line 78, in __init__
assert isinstance(body, unicode), "unicode expected, got: %s" % type(body).__name__
AssertionError: unicode expected, got: str
Aborted
This should be fixed with the latest scrapely.
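For anyone who cannot upgrade scrapely right away, here is a minimal sketch of the kind of guard that avoids the `unicode expected, got: str` assertion from the traceback: decode a byte body to text before it reaches `HtmlPage`. The helper name `ensure_text` and the UTF-8 fallback are assumptions made for illustration. They are not part of slybot or scrapely, and this is not necessarily how the upstream fix works.

```python
def ensure_text(body, encoding=None):
    """Return body as text, decoding it first if it is a byte string.

    Hypothetical helper for illustration; not part of slybot or scrapely.
    """
    if isinstance(body, bytes):  # on Python 2, `str` is the bytes type
        # Fall back to UTF-8 (an assumption) and replace undecodable bytes
        # instead of raising, so extraction can still proceed.
        return body.decode(encoding or "utf-8", "replace")
    return body


# Example: a byte body as it might arrive from a response
raw = b"<html><body>caf\xc3\xa9</body></html>"
text = ensure_text(raw, "utf-8")
```

The decoded `text` could then be passed as the `body` argument of `HtmlPage` in place of the raw byte string, with the response encoding supplied where it is known.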
Hi,
I get an encoding error when running some of the slybot tests. The error seems to be linked to the use of the scrapely library.
Here is, for example, what I get for test_spider.SpiderTest:
I am using Python 2.7, Portia 16.06.1, slybot 0.13.0b18 and scrapely 0.12.0. Any clues?
Note that this error happens not only in the tests but also at run time, when running the scrapely extraction on some webpages.