niklasb / dryscrape

[not actively maintained] A lightweight Python library that uses Webkit to enable easy scraping of dynamic, Javascript-heavy web pages
http://dryscrape.readthedocs.io/
MIT License

Dryscrape to login via Facebook. #37

Open noppanit opened 9 years ago

noppanit commented 9 years ago

I'm trying to use Dryscrape to log in via Facebook, but I get this error.

Traceback (most recent call last):
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server.py", line 420, in __init__
    self._port = int(re.search(b"port: (\d+)", output).group(1))
AttributeError: 'NoneType' object has no attribute 'group'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "facebook_scraper.py", line 40, in <module>
    sess = dryscrape.Session(base_url = 'https://www.facebook.com')
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/dryscrape/session.py", line 22, in __init__
    self.driver = driver or DefaultDriver()
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/dryscrape/driver/webkit.py", line 30, in __init__
    super(Driver, self).__init__(**kw)
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server.py", line 230, in __init__
    self.conn = connection or ServerConnection()
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server.py", line 507, in __init__
    self._sock = (server or get_default_server()).connect()
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server.py", line 450, in get_default_server
    _default_server = Server()
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server.py", line 427, in __init__
    raise WebkitServerError("webkit-server failed to start. Output:\n" + err)
webkit_server.WebkitServerError: webkit-server failed to start. Output:
dyld: Library not loaded: @rpath/./libQtWebKit.4.dylib
  Referenced from: /Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server
  Reason: image not found

Here's the code I'm using.

import dryscrape

# make sure you have xvfb installed
dryscrape.start_xvfb()

# set up a web scraping session
sess = dryscrape.Session(base_url = 'https://www.facebook.com')

# we don't need images
sess.set_attribute('auto_load_images', False)

# visit the homepage and fill in the login form
sess.visit('/')
q = sess.at_xpath('//*[@id="email"]')
q.set('email')
q = sess.at_xpath('//*[@id="pass"]')
q.set("password")
login_button = sess.at_xpath('//*[@id="u_0_x"]')
login_button.click()

# save a screenshot of the web page
sess.render('facebook.png')
print("Screenshot written to 'facebook.png'")

trendsetter37 commented 9 years ago

The login button has an id of u_0_v, not u_0_x as in your code.

Try this: login_button = sess.at_xpath('//*[@id="u_0_v"]')

niklasb commented 9 years ago

It sounds like you have a problem with your Qt installation. I just used Homebrew (brew install qt) and it worked. You may have to reinstall dryscrape after this.

What @trendsetter37 said might also be a problem, especially if the ID is randomized, in which case you need a different XPath expression to select the button.
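To illustrate the point about randomized IDs, here is a minimal sketch of an XPath expression that selects the submit control by its role inside the form rather than by its id. The markup is a hypothetical, heavily simplified stand-in (not Facebook's real page), and the standard library's ElementTree is used only so the idea can be tried without a browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified login markup (not Facebook's real page).
PAGE_TEMPLATE = """
<html><body>
  <form id="login_form">
    <input id="email" type="text"/>
    <input id="pass" type="password"/>
    <input id="{button_id}" type="submit" value="Log In"/>
  </form>
</body></html>
"""

# Select the submit control by its role inside the form, not by its id.
SUBMIT_XPATH = ".//form//input[@type='submit']"


def find_submit(markup):
    """Return the first submit input inside a form, or None."""
    return ET.fromstring(markup).find(SUBMIT_XPATH)


# The same expression matches whatever id the page generates.
for button_id in ("u_0_v", "u_0_x", "u_0_l"):
    button = find_submit(PAGE_TEMPLATE.format(button_id=button_id))
    print(button_id, "->", button.get("value"))
```

Assuming the real page has a similar structure, the equivalent dryscrape call would be something like sess.at_xpath('//form//input[@type="submit"]'); inspect the actual markup before relying on it.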

trendsetter37 commented 9 years ago

@niklasb Yeah, I have run into that before with randomized IDs on login pages. However, I checked Facebook's login button and it was static as far as I could tell.

noppanit commented 9 years ago

@trendsetter37 Thanks for the reply. I uninstalled dryscrape using pip uninstall dryscrape, ran brew install qt, and reinstalled dryscrape, but I still see the same issue.

niklasb commented 9 years ago

Hi @noppanit. I haven't seen this error before. Which version of Mac OS X are you using? And which version of Qt was installed by Homebrew?

noppanit commented 9 years ago

@niklasb My OSX version is 10.10.5 and Qt is qt-4.8.7_1

ghost commented 8 years ago

Hi @noppanit, were you able to solve this? I'm having the same issue as well.

noppanit commented 8 years ago

@KickingHorse No, I wasn't able to solve it. I just gave up. It looks like Facebook doesn't like scraping.

niklasb commented 8 years ago

Does the libQtWebKit.4.dylib file even exist on your computers @KickingHorse @noppanit ?

KidDisco commented 8 years ago

It does. The problem I was having was that my code would work from the command line, but the IDEs I was using (PyCharm and Spyder) kept reporting the error above:

dyld: Library not loaded: @rpath/./libQtWebKit.4.dylib
  Referenced from: /Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server
  Reason: image not found 

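It took me forever to figure out, but eventually I had to do something like this:

import os
# Must run before dryscrape launches webkit_server.
# Assigning through os.environ also exports the variable to child processes.
os.environ['DYLD_FALLBACK_LIBRARY_PATH'] = '/Users/name/anaconda/lib/'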

This ultimately fixed the problem for me.

That directory is where the file libQtWebKit.4.dylib was actually stored after installing from PyCharm. I'm not sure where brew install qt put these files, as I was never able to find them.

dcguim commented 8 years ago

I just tried to run a simple dryscrape.Session(url) and got the same kind of error. Like @noppanit, I got a dyld message:

dyld: Library not loaded: /usr/local/opt/qt/lib/QtWebKit.framework/Versions/4/QtWebKit

I am on a Mac OS X version similar to @noppanit's, but I don't have Homebrew installed; I am trying to use MacPorts only. When qt4-mac is installed from MacPorts, Qt does not end up under /usr/local/opt but under /opt/local/libexec, so dryscrape cannot find the library because the path it expects does not exist. I therefore created a symlink in /usr/local/opt pointing to the MacPorts directory:

ln -s /opt/local/libexec/qt4 /usr/local/opt/qt

Once I had created the symlink, I searched for the missing /usr/local/opt/qt/lib/QtWebKit.framework/Versions/4/QtWebKit, but it is still not there:

$ ls /usr/local/opt/qt/lib | grep 'QtWebKit'
libQtWebKit.4.9.7.dylib
libQtWebKit.4.9.dylib
libQtWebKit.4.dylib
libQtWebKit.dylib
libQtWebKit.prl

The file dryscrape is looking for is installed by MacPorts at a different location:

$ port contents qt4-mac | grep 'QtWebKit.framework.*/4/QtWebKit'
  /opt/local/libexec/qt4/Library/Frameworks/QtWebKit.framework/Versions/4/QtWebKit

That's why the symlink won't work. Any ideas on how to solve this Qt/dryscrape relationship?

dcguim commented 8 years ago

Actually, there is a way to symlink it correctly but it is necessary to symlink each Qt 4 framework separately.

/usr/local/opt/qt/lib $ sudo ln -s /opt/local/libexec/qt4/Library/Frameworks/QtCore.framework/ QtCore.framework
/usr/local/opt/qt/lib $ sudo ln -s /opt/local/libexec/qt4/Library/Frameworks/QtGui.framework/ QtGui.framework
/usr/local/opt/qt/lib $ sudo ln -s /opt/local/libexec/qt4/Library/Frameworks/QtNetwork.framework/ QtNetwork.framework
/usr/local/opt/qt/lib $ sudo ln -s /opt/local/libexec/qt4/Library/Frameworks/QtWebKit.framework/ QtWebKit.framework

This way it's not necessary to install Homebrew, which is nice because some people use Fink or MacPorts and would rather avoid installing several package managers.

hehez commented 7 years ago

Currently, the login button is login_button = sess.at_xpath('//*[@id="u_0_l"]'), but I think you could try q.form().submit() instead of finding the login button.
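A minimal sketch of that suggestion, written as a helper so the button id never appears. The function name login is hypothetical, sess is assumed to behave like a dryscrape.Session, and the field ids are the ones from the original script, which may no longer match the live page.

```python
def login(sess, email, password):
    """Fill in the login form and submit it without locating the button.

    `sess` is expected to behave like a dryscrape.Session. The field ids
    ("email", "pass") are the ones used earlier in this thread and may change.
    """
    sess.visit('/')
    sess.at_xpath('//*[@id="email"]').set(email)
    password_field = sess.at_xpath('//*[@id="pass"]')
    password_field.set(password)
    # form() is expected to return the enclosing form node; submit() submits it.
    password_field.form().submit()
```

With a working Qt installation, this would be called as login(dryscrape.Session(base_url='https://www.facebook.com'), 'email', 'password').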