roadlabs / cefpython

Automatically exported from code.google.com/p/cefpython

wxpython-response.py example: final url is wrong when doing redirects #173

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Hello,

I need to get the status code of the initial url loaded in the browser, and 
also the final url the initial one might have redirected to. So for instance if 
I load http://httpbin.org/redirect-to?url=/status/418 I expect to get status 
code = 302 and final url = http://httpbin.org/status/418. 

This doesn't work in the wxpython-response example, though. Only one
ResourceHandler (along with the other objects in the chain) is created, and the
status code associated with its response is 418. Also, if at any point after
OnLoadEnd I call browser.GetUrl(), or even run JavaScript to read
window.location.href, I get the initial url. I thought it had something to do
with the unused redirectUrlOut parameter of ResourceHandler.GetResponseHeaders,
but the only time this method is called the headers I get are those of the
final url, never the initial ones, which contain the Location key.

To reproduce, simply change the navigateUrl parameter on line 362 of
wxpython-response.py to "http://httpbin.org/redirect-to?url=/status/418".
Opening the initial url in a regular Chrome browser yields the expected
behavior.

Original issue reported on code.google.com by luiz.ge...@sieve.com.br on 16 Feb 2015 at 2:13

GoogleCodeExporter commented 8 years ago
Can you provide some debug logs of what exactly is happening? Which callbacks 
are being called and in which order? Please copy all callbacks from 
LoadHandler/RequestHandler from the wxpython.py example and add them in the 
wxpython-response.py example. This will provide detailed logs.

Original comment by czarek.t...@gmail.com on 16 Feb 2015 at 2:49

GoogleCodeExporter commented 8 years ago
There is a callback named OnResourceRedirect in RequestHandler. It would be a
good idea to check how it relates to the resource handler, if at all.
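
A minimal sketch of such a check, assuming the callback signature used in the
wxpython.py example of this era (oldUrl as a string, newUrlOut as a one-element
list acting as an out-parameter). The class name RedirectLoggingHandler and its
redirects attribute are hypothetical, made up for illustration; the handler
would be registered the same way the example registers its other client
handlers.

```python
class RedirectLoggingHandler(object):
    """Hypothetical client handler that records every redirect it is told about."""

    def __init__(self):
        # (old_url, new_url) tuples in the order the redirects happened
        self.redirects = []

    def OnResourceRedirect(self, browser, frame, oldUrl, newUrlOut):
        # Assumption: newUrlOut is a one-element list holding the redirect target.
        newUrl = newUrlOut[0]
        self.redirects.append((oldUrl, newUrl))
        print("RequestHandler::OnResourceRedirect()")
        print("    old url = %s" % oldUrl)
        print("    new url = %s" % newUrl)
```

If the callback never fires while the ResourceHandler is active, handler.redirects
stays empty, which makes it easy to compare the two example scripts.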

Original comment by czarek.t...@gmail.com on 16 Feb 2015 at 2:51

GoogleCodeExporter commented 8 years ago
I attached the wxpython-response.py script with the changes you requested. This 
is its log output:

{{{
[0216/135809:INFO:gpu_info_collector_x11.cc(80)] NVCtrl extension does not 
exist.
wx.version=2.8.12.1 (gtk2-unicode)
[wxpython.py] RequestHandler::OnBeforeBrowse()
    url = http://httpbin.org/redirect-to?url=/status/418
[wxpython.py] RequestHandler::OnBeforeResourceLoad()
    url = http://httpbin.org/redirect-to?url=/status/418
GetResourceHandler(): url = http://httpbin.org/redirect-to?url=/status/418
[wxpython.py] RequestHandler::GetCookieManager(): created cookie manager
ProcessRequest()
OnRequestComplete()
status = Unknown
error code = 0
_OnResourceResponse()
data length = 135
GetResponseHeaders()
headers: 
{'Content-Length': '135', 'x-more-info': 'http://tools.ietf.org/html/rfc2324', 
'Server': 'nginx', 'Connection': 'keep-alive', 
'Access-Control-Allow-Credentials': 'true', 'Date': 'Mon, 16 Feb 2015 15:58:10 
GMT', 'Access-Control-Allow-Origin': '*'}
[wxpython.py] LoadHandler::OnLoadStart()
    frame url = http://httpbin.org/redirect-to?url=/status/418
[wxpython.py] LoadHandler::OnLoadingStateChange()
    isLoading = False, canGoBack = False, canGoForward = False
[wxpython.py] LoadHandler::OnLoadEnd()
    frame url = http://httpbin.org/redirect-to?url=/status/418
    http status code = 418
}}}

OnResourceRedirect doesn't seem to be getting called.

Original comment by luiz.ge...@sieve.com.br on 16 Feb 2015 at 4:04

GoogleCodeExporter commented 8 years ago
This is what the wxpython.py example script outputs when I only change its
navigateUrl parameter. This is the desired behavior, so I'll try to see which
difference between the two scripts is causing the issue. I'm new to these
interfaces, though, so I'd appreciate any help from someone more experienced.

[wxpython.py] wx.version=2.8.12.1 (gtk2-unicode)
[CEF Python] Initialize() called
[CEF Python] CefExecuteProcess(): exitCode = -1
[CEF Python] CefInitialize()
[CEF Python] App_OnBeforeCommandLineProcessing_BrowserProcess()
[CEF Python] Command line string for the browser process:  
--browser-subprocess-path=/usr/lib/pymodules/python2.7/cefpython3/subprocess 
--lang=en-US --log-file=/tmp/debug.log --log-severity=info 
--enable-release-dcheck 
--resources-dir-path=/usr/lib/pymodules/python2.7/cefpython3 
--locales-dir-path=/usr/lib/pymodules/python2.7/cefpython3/locales 
--remote-debugging-port=52766 --no-sandbox
[CEF Python] BrowserProcessHandler_OnBeforeChildProcessLaunch()
[0216/140943:INFO:gpu_info_collector_x11.cc(80)] NVCtrl extension does not 
exist.
[CEF Python] WindowUtils::IsWindowHandle() not implemented (always True)
[CEF Python] CreateBrowserSync() called
[CEF Python] navigateUrl: http://httpbin.org/redirect-to?url=/status/418
[CEF Python] CefBrowser::CreateBrowserSync()
[CEF Python] GetPyBrowser(): creating new PyBrowser, browserId=1
[wxpython.py] LifespanHandler::_OnAfterCreated()
    browserId=1
[CEF Python] BrowserProcessHandler_OnBeforeChildProcessLaunch()
[CEF Python] CefBrowser::CreateBrowserSync() succeeded
[CEF Python] SendProcessMessage(): message=DoJavascriptBindings, arguments 
size=1
[CEF Python] Command line string for the zygote process: 
/usr/lib/pymodules/python2.7/cefpython3/subprocess --type=zygote --no-sandbox 
--enable-release-dcheck --lang=en-US 
--locales-dir-path=/usr/lib/pymodules/python2.7/cefpython3/locales 
--log-file=/tmp/debug.log --log-severity=info 
--resources-dir-path=/usr/lib/pymodules/python2.7/cefpython3
[CEF Python] Renderer: OnProcessMessageReceived(): DoJavascriptBindings
[wxpython.py] RequestHandler::OnBeforeBrowse()
    url = http://httpbin.org/redirect-to?url=/status/418
[wxpython.py] RequestHandler::OnBeforeResourceLoad()
    url = http://httpbin.org/redirect-to?url=/status/418
[wxpython.py] RequestHandler::GetCookieManager(): created cookie manager
[CEF Python] Renderer: OnContextCreated()
[CEF Python] Renderer: DoJavascriptBindingsForFrame(): bindings are set
[CEF Python] Browser: OnProcessMessageReceived(): OnContextCreated
[CEF Python] V8ContextHandler_OnContextCreated()
[CEF Python] Renderer: DoJavascriptBindingsForFrame(): bindings are set
[wxpython.py] RequestHandler::OnResourceRedirect()
    old url = http://httpbin.org/redirect-to?url=/status/418
    new url = http://httpbin.org/status/418
[wxpython.py] RequestHandler::OnBeforeBrowse()
    url = http://httpbin.org/redirect-to?url=/status/418
[wxpython.py] RequestHandler::OnBeforeResourceLoad()
    url = http://httpbin.org/status/418
[CEF Python] BrowserProcessHandler_OnBeforeChildProcessLaunch()
[wxpython.py] LoadHandler::OnLoadStart()
    frame url = http://httpbin.org/status/418
[wxpython.py] DisplayHandler::OnAddressChange()
    url = http://httpbin.org/status/418
[wxpython.py] LoadHandler::OnLoadingStateChange()
    isLoading = False, canGoBack = False, canGoForward = False
[wxpython.py] DisplayHandler::OnTitleChange()
    title = httpbin.org/status/418
[wxpython.py] LoadHandler::OnLoadEnd()
    frame url = http://httpbin.org/status/418
    http status code = 418

Original comment by luiz.ge...@sieve.com.br on 16 Feb 2015 at 4:15

GoogleCodeExporter commented 8 years ago
So, as I understand it, the redirect takes place and works fine. You are seeing
the same content in the browser as in Google Chrome. The only problem is that
the wrong url is reported at the end of the request? Why is that a problem in
your case?

What are the values of webRequest.GetRequestStatus() and webRequest.GetUrl() in 
OnRequestComplete?

It looks to me like we need to set |redirectUrlOut| in GetResponseHeaders(). But
I don't see a way to do that when using CEFUrlRequest (named WebRequest in
cefpython).

Found this issue in CEF, which is to provide a way to detect redirects when 
using CEFUrlRequest:
https://code.google.com/p/chromiumembedded/issues/detail?id=1329
(and a corresponding CEF topic: 
http://www.magpcss.org/ceforum/viewtopic.php?f=6&t=11899)

It is not required to use WebRequest (CEFUrlRequest) to make a request. It is
just a helper API exposed by CEF. You could just as well use Python's
urllib.request or similar to perform requests (though you might need to take
care of the user-agent and other headers to simulate Google Chrome's behavior).
I'm just noting this in case you don't want to wait for a fix in upstream CEF.
In that case, handle redirects by setting redirectUrlOut in GetResponseHeaders.
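
A rough sketch of the urllib.request route, assuming Python 3. It fetches only
the first response, without following redirects, so the 302 and its Location
header stay visible. The names NoRedirectHandler, absolute_location and probe
are made up for illustration, and the default user-agent string is only a
placeholder, not what Chrome actually sends.

```python
import urllib.error
import urllib.parse
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following 3xx responses automatically."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def absolute_location(request_url, location):
    """Resolve a possibly relative Location header against the request url."""
    return urllib.parse.urljoin(request_url, location)


def probe(url, user_agent="Mozilla/5.0"):
    """Return (status_code, redirect_url or None) for the first response only."""
    opener = urllib.request.build_opener(NoRedirectHandler())
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request, timeout=10) as response:
            return response.status, None
    except urllib.error.HTTPError as error:
        # 3xx, 4xx and 5xx responses all arrive here as HTTPError.
        location = error.headers.get("Location")
        code = error.code
        error.close()
        if location is None:
            return code, None
        return code, absolute_location(url, location)
```

Calling probe("http://httpbin.org/redirect-to?url=/status/418") would be expected
to report the redirect status together with the resolved target. Inside
GetResponseHeaders that target could then be assigned to redirectUrlOut[0],
assuming the out-parameter is a one-element list like the other *Out arguments.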

Original comment by czarek.t...@gmail.com on 16 Feb 2015 at 5:07

GoogleCodeExporter commented 8 years ago
Yes, that's a problem for me because I'm doing web scraping and I need to know
the final url. Also, I can't make another request using urllib because it's not
uncommon for websites to do redirects using JavaScript or meta tags.

Thanks for investigating the issue. Fortunately, while going over the
differences between the two scripts, I found that OnLoadEnd gives me the status
code without needing to go through the whole ResourceHandler/WebRequest flow,
which is much simpler and doesn't break the final location in the browser.
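
Roughly what I mean, as a sketch that assumes the OnLoadEnd signature from the
wxpython.py example (browser, frame, httpStatusCode). FinalUrlRecorder and its
attributes are names I made up for illustration. Note that the status code
reported here is that of the final page (418 in the logs above), not the 302 of
the initial response.

```python
class FinalUrlRecorder(object):
    """Hypothetical client handler remembering where the main frame ended up."""

    def __init__(self):
        self.final_url = None
        self.status_code = None

    def OnLoadEnd(self, browser, frame, httpStatusCode):
        # Ignore iframes; only the main frame reflects the page's final location.
        if not frame.IsMain():
            return
        self.final_url = frame.GetUrl()
        self.status_code = httpStatusCode
        print("LoadHandler::OnLoadEnd()")
        print("    frame url = %s" % self.final_url)
        print("    http status code = %s" % self.status_code)
```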

Original comment by luiz.ge...@sieve.com.br on 16 Feb 2015 at 6:21

GoogleCodeExporter commented 8 years ago

Original comment by czarek.t...@gmail.com on 16 Feb 2015 at 6:30