niklasb / dryscrape

[not actively maintained] A lightweight Python library that uses Webkit to enable easy scraping of dynamic, Javascript-heavy web pages
http://dryscrape.readthedocs.io/
MIT License

Both cookies and javascript enabled to login? #30

Closed: ericmachine88 closed this issue 9 years ago

ericmachine88 commented 9 years ago

Hi there,

I'm having trouble logging into a website that requires cookies and JavaScript to be enabled.

See the screenshot below:

http://i.imgur.com/dl5wFzl.jpg

Here's my code:

import time
import dryscrape

username = '123456'
password = 'mypassword'

# set up a web scraping session
sess = dryscrape.Session(base_url='https://www.somewebsitedomain.com/Pages/Login/Login.aspx')

# we don't need images
sess.set_attribute('auto_load_images', False)

# visit homepage and log in
print("Logging in...")
sess.visit('/')

username_field = sess.at_css("#txtCustomer")
password_field = sess.at_css("#passwd")
btnlogin_field = sess.at_css("#btnLogin")

username_field.set(username)
password_field.set(password)

# username_field.form().submit() is not working here
# btnlogin_field.form().submit() doesn't work either

btnlogin_field.click()  # this works, but the site then says cookies and JavaScript must be enabled (see screenshot)

print("Taking snapshot")
sess.render('website.png')

Is there any way to make this work? I tested it with Python Selenium and Firefox, and it worked. But I need to run this on a server, and Python Selenium with PhantomJS shows the same issue. I was wondering whether dryscrape could help.

I need to run this script on an Ubuntu 14.04 LTS server (not a desktop).

Any help? Thanks.

niklasb commented 9 years ago

Hi,

From your code alone, it's not apparent what the problem is. dryscrape supports both JavaScript and cookies. Maybe the website uses an unusual authentication mechanism, such as local storage, which dryscrape does not support?

Also, you should probably wait for some UI change after logging in; otherwise, you will just get a screenshot of the login form.

Best, Niklas

ericmachine88 commented 9 years ago

Hi Niklas,

Thanks for coming back.

How should I code the "waiting" part? Any tips?

Or is it just time.sleep(5)?

As for the database, I believe they are using SQL Server, since the site runs on the ASP.NET platform.

Any help? Thanks.

niklasb commented 9 years ago

I'm not talking about server-side database technology. I was referring to what client-side mechanisms the web page uses for authentication. Maybe it's not just cookies, maybe it's local storage or something else entirely.

As for waiting, you can use the methods defined in WaitMixin to wait for DOM elements.
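The WaitMixin approach polls a condition at a fixed interval until it holds or a timeout expires, which is more robust than a blind `time.sleep(5)`: it returns as soon as the page is ready and fails loudly if it never becomes ready. The sketch below reproduces that polling idea in plain Python so it runs without WebKit. The names `wait_for` and `wait_while` mirror WaitMixin's, but this is an illustration, not dryscrape's code, and the default interval and timeout values are assumptions.

```python
import time


class WaitTimeoutError(Exception):
    """Raised when a condition does not become true before the timeout."""


def wait_for(condition, interval=0.5, timeout=10.0):
    """Call `condition` every `interval` seconds until it returns a truthy
    value, then return that value. Raise WaitTimeoutError after `timeout` s."""
    start = time.time()
    while True:
        result = condition()  # always check at least once
        if result:
            return result
        if time.time() - start > timeout:
            raise WaitTimeoutError("condition not met within %.2f s" % timeout)
        time.sleep(interval)


def wait_while(condition, interval=0.5, timeout=10.0):
    """Poll until `condition` becomes false (e.g. the login form disappears)."""
    return wait_for(lambda: not condition(), interval, timeout)


# Demo without a browser: a fake "page" whose post-login element
# shows up on the third check.
checks = {"n": 0}


def logged_in_marker_visible():
    checks["n"] += 1
    return checks["n"] >= 3


print(wait_for(logged_in_marker_visible, interval=0.01, timeout=1.0))  # prints True
```

With a real dryscrape session, the condition would inspect the page instead. For example, calling `sess.wait_for(lambda: sess.at_css('#logout'))` after `btnlogin_field.click()` and before `sess.render(...)` should block until the logged-in page appears. Here `#logout` is a hypothetical selector for any element that only exists after a successful login, and the exact WaitMixin signature should be checked against the installed dryscrape version.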

ericmachine88 commented 9 years ago

Anyway, I figured out a way to make this work with Selenium and Firefox: http://scraping.pro/use-headless-firefox-scraping-linux/

Thanks anyway :)