niklasb / dryscrape

[not actively maintained] A lightweight Python library that uses Webkit to enable easy scraping of dynamic, Javascript-heavy web pages
http://dryscrape.readthedocs.io/
MIT License

EndOfStreamError when visiting sites (Yelp, google business) remotely #57

Open replyprobadler opened 8 years ago

replyprobadler commented 8 years ago

When trying to visit 'https://www.Yelp.com/login' I get the error:

webkit_server.EndOfStreamError: Unexpected end of file

My current code can visit plenty of other URLs; however, visiting the Yelp login page always produces this error.

The program works perfectly when run locally. The issue only occurs when running remotely on a Linode server.

This issue also happens on 'https://www.google.com/business'.

The code:

import dryscrape
url = 'https://www.yelp.com/login'
dryscrape.start_xvfb()
session = dryscrape.Session(base_url=url)
session.visit('')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/opt/virtualenvs/monkytrends/lib/python3.5/site-packages/dryscrape/session.py", line 33, in visit
    return self.driver.visit(self.complete_url(url))
  File "/opt/virtualenvs/monkytrends/lib/python3.5/site-packages/webkit_server.py", line 235, in visit
    self.conn.issue_command("Visit", url)
  File "/opt/virtualenvs/monkytrends/lib/python3.5/site-packages/webkit_server.py", line 520, in issue_command
    return self._read_response()
  File "/opt/virtualenvs/monkytrends/lib/python3.5/site-packages/webkit_server.py", line 524, in _read_response
    result = self.buf.read_line().decode("utf-8")
  File "/opt/virtualenvs/monkytrends/lib/python3.5/site-packages/webkit_server.py", line 485, in read_line
    raise EndOfStreamError()
webkit_server.EndOfStreamError: Unexpected end of file

Thank you for the awesome work!

niklasb commented 8 years ago

Seems like webkit_server crashes. Do you have a chance to debug the webkit_server process?

replyprobadler commented 8 years ago

Yes, not sure where to start with it, but I'm willing to give it a go.

niklasb commented 8 years ago

You should be able to add a sleep to your Python script before it visits the site, and attach gdb to the webkit_server process during that pause.
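
One way to set up that pause, as a rough sketch: after creating the session, list the children of the Python process, print the PID of anything that looks like webkit_server, and sleep so gdb can be attached with `gdb -p <pid>`. The helper names `find_child_pids` and `pause_for_debugger` are made up for illustration, and the sketch assumes a Linux `ps` and that webkit_server has already been spawned as a direct child of the script by the time it is called.

```python
import os
import subprocess
import time


def find_child_pids(ps_output, parent_pid, name="webkit_server"):
    """Return PIDs whose parent is parent_pid and whose command mentions name.

    ps_output is expected to look like the output of `ps -eo pid,ppid,args`.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header row and anything that does not start with two numbers.
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        pid, ppid, args = int(parts[0]), int(parts[1]), parts[2]
        if ppid == parent_pid and name in args:
            pids.append(pid)
    return pids


def pause_for_debugger(seconds=60):
    """Print a gdb command for each webkit_server child, then sleep."""
    out = subprocess.check_output(["ps", "-eo", "pid,ppid,args"]).decode()
    for pid in find_child_pids(out, os.getpid()):
        print("attach with: gdb -p {}".format(pid))
    time.sleep(seconds)
```

Calling `pause_for_debugger()` between `dryscrape.Session(...)` and `session.visit('')` leaves time to run the printed gdb command from another shell and type `c` to continue before the visit happens.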

replyprobadler commented 8 years ago

I'm not sure what you want me to do with gdb. Here's what I did:


setsid python dryscrape_test.py
ps -aef | grep webkit

zenmonk  19340 16362  0 20:08 ?        00:00:00 [webkit_server] <defunct>
zenmonk  29000 28988  0 23:53 ?        00:00:00 /opt/virtualenvs/monkytrends/lib/python3.5/site-packages/webkit_server
zenmonk  29006 26625  0 23:53 pts/3    00:00:00 grep --color=auto webkit

gdb attach 28988

GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
attach: No such file or directory.
Attaching to process 28988
Reading symbols from /opt/virtualenvs/monkytrends/bin/python3.5...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libpthread-2.19.so...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2
Reading symbols from /lib/x86_64-linux-gnu/libutil.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libutil-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libutil.so.1
Reading symbols from /lib/x86_64-linux-gnu/libexpat.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libexpat.so.1
Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libz.so.1
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libm.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.19.so...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /opt/virtualenvs/monkytrends/lib/python3.5/site-packages/lxml/etree.cpython-35m-x86_64-linux-gnu.so...done.
Loaded symbols for /opt/virtualenvs/monkytrends/lib/python3.5/site-packages/lxml/etree.cpython-35m-x86_64-linux-gnu.so
Reading symbols from /usr/lib/x86_64-linux-gnu/libxslt.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libxslt.so.1
Reading symbols from /usr/lib/x86_64-linux-gnu/libexslt.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libexslt.so.0
Reading symbols from /usr/lib/x86_64-linux-gnu/libxml2.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libxml2.so.2
Reading symbols from /lib/x86_64-linux-gnu/libgcrypt.so.11...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libgcrypt.so.11
Reading symbols from /lib/x86_64-linux-gnu/liblzma.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/liblzma.so.5
Reading symbols from /lib/x86_64-linux-gnu/libgpg-error.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libgpg-error.so.0
Reading symbols from /opt/virtualenvs/monkytrends/lib/python3.5/lib-dynload/_opcode.cpython-35m-x86_64-linux-gnu.so...(no debugging symbols found)...done.
Loaded symbols for /opt/virtualenvs/monkytrends/lib/python3.5/lib-dynload/_opcode.cpython-35m-x86_64-linux-gnu.so
Reading symbols from /opt/virtualenvs/monkytrends/lib/python3.5/lib-dynload/_json.cpython-35m-x86_64-linux-gnu.so...(no debugging symbols found)...done.
Loaded symbols for /opt/virtualenvs/monkytrends/lib/python3.5/lib-dynload/_json.cpython-35m-x86_64-linux-gnu.so
Reading symbols from /opt/virtualenvs/monkytrends/lib/python3.5/lib-dynload/_bz2.cpython-35m-x86_64-linux-gnu.so...(no debugging symbols found)...done.
Loaded symbols for /opt/virtualenvs/monkytrends/lib/python3.5/lib-dynload/_bz2.cpython-35m-x86_64-linux-gnu.so
Reading symbols from /lib/x86_64-linux-gnu/libbz2.so.1.0...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libbz2.so.1.0
Reading symbols from /opt/virtualenvs/monkytrends/lib/python3.5/lib-dynload/_lzma.cpython-35m-x86_64-linux-gnu.so...(no debugging symbols found)...done.
Loaded symbols for /opt/virtualenvs/monkytrends/lib/python3.5/lib-dynload/_lzma.cpython-35m-x86_64-linux-gnu.so
Reading symbols from /opt/virtualenvs/monkytrends/lib/python3.5/lib-dynload/_hashlib.cpython-35m-x86_64-linux-gnu.so...(no debugging symbols found)...done.
Loaded symbols for /opt/virtualenvs/monkytrends/lib/python3.5/lib-dynload/_hashlib.cpython-35m-x86_64-linux-gnu.so
Reading symbols from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
0x00007fa65e1f0c33 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
81  ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) c
Continuing.
Traceback (most recent call last):
  File "dryscrape_test.py", line 7, in <module>
    session.visit('')
  File "/opt/virtualenvs/monkytrends/lib/python3.5/site-packages/dryscrape/session.py", line 33, in visit
    return self.driver.visit(self.complete_url(url))
  File "/opt/virtualenvs/monkytrends/lib/python3.5/site-packages/webkit_server.py", line 235, in visit
    self.conn.issue_command("Visit", url)
  File "/opt/virtualenvs/monkytrends/lib/python3.5/site-packages/webkit_server.py", line 520, in issue_command
    return self._read_response()
  File "/opt/virtualenvs/monkytrends/lib/python3.5/site-packages/webkit_server.py", line 524, in _read_response
    result = self.buf.read_line().decode("utf-8")
  File "/opt/virtualenvs/monkytrends/lib/python3.5/site-packages/webkit_server.py", line 485, in read_line
    raise EndOfStreamError()
webkit_server.EndOfStreamError: Unexpected end of file
[Inferior 1 (process 28988) exited with code 01]

niklasb commented 8 years ago

Well, my conjecture was that webkit_server crashes and hence you get the error. But apparently that is not the case.
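
A note for anyone rereading the `ps -aef` listing in this thread: the columns are UID, PID, PPID, C, STIME, TTY, TIME, CMD, so the second number on a row is the parent PID. On the webkit_server row that makes 29000 the process itself and 28988 its parent, which fits the gdb log reading symbols from python3.5. The sketch below splits such a row into named fields; `PsRow` and `parse_ps_ef_line` are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Field order follows the standard `ps -ef` header.
PsRow = namedtuple("PsRow", "uid pid ppid c stime tty time cmd")


def parse_ps_ef_line(line):
    """Split one `ps -ef` data row into named fields."""
    fields = line.split(None, 7)  # CMD may contain spaces, so cap the splits
    if len(fields) < 8:
        raise ValueError("not a ps -ef data row: {!r}".format(line))
    uid, pid, ppid, c, stime, tty, time_, cmd = fields
    return PsRow(uid, int(pid), int(ppid), int(c), stime, tty, time_, cmd)


row = parse_ps_ef_line(
    "zenmonk  29000 28988  0 23:53 ?        00:00:00 "
    "/opt/virtualenvs/monkytrends/lib/python3.5/site-packages/webkit_server"
)
print(row.pid, row.ppid)  # prints: 29000 28988
```

To debug webkit_server itself, the PID to pass would be `row.pid`, for example `gdb -p 29000` in the session above.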