Closed SKART1 closed 8 years ago
Are you sure that is not just an HTTP redirect? You can use a proxy such as Burp Suite or Fiddler to check the exact response.
On Feb 10, 2016 5:00 PM, "Art" notifications@github.com wrote:
I use dryscrape v1.0 as a tool to download stack traces from Google Play. I have downloaded the whole crash report for some period of time, and wanted to download the page with stack traces.
I ran into strange behavior. When the URL is:
it opens https://play.google.com/apps/publish/?dev_acc=18149679673077794436#AppListPlace instead of the URL above.
But if the URL is:
(it differs only by a slash before the question mark), everything works normally.
The code is:
import time

def downloadStackTraceByLink(link, session, i):
    # some black magic
    # if link.find("publish/") == -1:
    #     link = link.replace("publish", "publish/")
    session.visit(link)
    # sleep a bit to leave the mail a chance to open.
    # This is ugly, it would be better to find something
    # on the resulting page that we can wait for
    time.sleep(10)
    # compare the URL we ended up on with the one we asked for
    if link != session.url():
        print("WTF DUDE! Current link is: " + session.url() + "\n but was " + link)
    else:
        print("Ok " + str(i))
    session.driver.render('screenshot ' + str(i) + '.jpg')
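The comment in the function above already notes that a fixed `time.sleep(10)` is ugly. A minimal sketch of a polling alternative, using only the standard library, could look like the following. `wait_until` and its parameters are hypothetical names and not part of dryscrape; what exactly to wait for on the page is left to the caller.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper, not part of dryscrape.
    Returns the truthy value, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        # pause briefly so the loop does not spin at full speed
        time.sleep(interval)
```

With a dryscrape session this might be called as `wait_until(lambda: session.at_xpath(some_xpath))`, where `some_xpath` is a placeholder for an element that only exists on the stack trace page, and assuming `at_xpath` returns a falsy value while the element is absent.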
The login code is:
from dryscrape import dryscrape

class SessionGoogle:
    def __init__(self, url_login, login, passwd):
        self.ses = dryscrape.Session()
        self.ses.visit(url_login)
        # fill in the credentials and submit the login form
        self.ses.at_xpath('//*[@id="Email"]').set(login)
        self.ses.at_xpath('//*[@id="Passwd"]').set(passwd)
        self.ses.at_xpath('//*[@id="signIn"]').click()
        self.ses.driver.render('login_result.png')

    def getSes(self):
        return self.ses
— Reply to this email directly or view it on GitHub https://github.com/niklasb/dryscrape/issues/47.
I will check and report back, but adding just that one slash makes all the difference.
Yes, but that might be due to what the web server does with your request. It doesn't look like a bug in dryscrape.
I have used the same URL in "adult" browsers, Firefox and Chromium, and everything works; no redirect was detected (visually, I stayed on the desired page)...
Maybe this is not dryscrape's fault, but let me test with a proxy.
Yes, you were right, it is a redirect. The response body is:
<HTML>
<HEAD>
<TITLE>Moved Temporarily</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<H1>Moved Temporarily</H1>
The document has moved
<A HREF="https://play.google.com/apps/publish/?dev_acc=18149679673077794436">here</A>
.</BODY>
</HTML>
But it is very strange that after the redirect I receive a 403 Forbidden error! I can send you the proxy dumps if you are interested in this problem.
If I go directly to the redirect target, everything is OK...
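For anyone without a proxy at hand, the raw response can also be checked from Python, since `http.client` never follows redirects on its own. The following is only a sketch: `fetch_without_redirect` is a hypothetical helper, and it sends no cookies unless they are passed in, so for a page behind a login (like the one in this issue) the result may differ from what a logged-in session sees.

```python
import http.client
from urllib.parse import urlsplit


def fetch_without_redirect(url, headers=None):
    """Issue a single GET and return (status, Location header) without following it.

    Hypothetical helper for illustration; `headers` is an optional dict,
    e.g. for a Cookie header copied from a logged-in session.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        # rebuild the request target; the fragment is never sent to the server
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target, headers=headers or {})
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

A status of 302 together with a `Location` value would correspond to the "Moved Temporarily" page shown above; a second call on that `Location` would then show whether the 403 comes from the redirect target itself.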
Maybe you are not logged in properly? The page you linked does not seem to be publicly accessible. Anyway, since this has nothing to do with dryscrape, I'm closing this issue.
No, I am logged in, because I can open the same URL with the one-slash difference.
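As a possible workaround, the slash can be added before calling `session.visit`, which is what the commented-out `link.replace("publish", "publish/")` line in the download function was aiming at. A slightly more targeted sketch, assuming the only difference needed is a trailing slash on the path; `ensure_trailing_slash` is a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_trailing_slash(url):
    """Return `url` with a slash appended to its path if it lacks one.

    The query string and fragment are left untouched.
    Hypothetical helper for illustration.
    """
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Unlike a plain string replace, this touches only the path, so a "publish" that happens to appear in the query string or fragment is not modified, and a URL that already has the slash is returned unchanged.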
The login URL passed to the login code is:
url_login = "https://accounts.google.com/ServiceLogin"