marty90 / PyChromeDevTools

PyChromeDevTools is a python module that allows one to interact with Google Chrome using Chrome DevTools Protocol within a Python script.
Apache License 2.0

How to get the HTML source code? Thanks #15

Open TimeAshore opened 5 years ago

marty90 commented 5 years ago

This is not related to this tool; you should read the documentation of the Chrome DevTools Protocol. However, you can get the body of an HTTP response using Network.getResponseBody. You must pass a requestId to this function. To get a requestId, wait for a Network.responseReceived event.
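As a rough illustration of the two pieces involved, the sketch below pulls the requestId out of a Network.responseReceived event and decodes a Network.getResponseBody result. The `body` and `base64Encoded` fields come from the protocol; the helper names `request_id_of` and `decode_body` are hypothetical, and how PyChromeDevTools wraps the reply around that result dict is not shown here, so the sample dicts are hand-written stand-ins.

```python
import base64


def request_id_of(event):
    """Return the requestId carried by a Network.* event dict."""
    return event["params"]["requestId"]


def decode_body(result):
    """Turn a Network.getResponseBody result dict into text.

    The protocol sets 'base64Encoded' to True for binary payloads,
    in which case 'body' has to be base64-decoded first.
    """
    body = result["body"]
    if result.get("base64Encoded"):
        return base64.b64decode(body).decode("utf-8", errors="replace")
    return body


# Hand-written stand-ins for what the browser would send back.
sample_event = {
    "method": "Network.responseReceived",
    "params": {"requestId": "1000029019.143"},
}
sample_plain = {"body": "<html></html>", "base64Encoded": False}
sample_b64 = {
    "body": base64.b64encode(b"<html></html>").decode("ascii"),
    "base64Encoded": True,
}

print(request_id_of(sample_event))
print(decode_body(sample_plain))
print(decode_body(sample_b64))
```

Note that the first Network.responseReceived event after a navigation is usually, but not necessarily, the one for the main document.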

Another way is to get the DOM of the page, which is the HTML loaded by the browser, possibly modified by scripts (see the DOM domain in the protocol documentation). For example, use:

chrome.DOM.getDocument(depth=-1)

This returns the whole DOM in JSON format.
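If plain markup is wanted rather than the JSON tree, the tree can be serialized back to HTML. What follows is a minimal sketch, not part of PyChromeDevTools: `node_to_html` is a hypothetical helper, and it assumes the node dicts carry the protocol's DOM.Node fields (`nodeType`, `nodeName`, `localName`, `nodeValue`, a flat `attributes` list of alternating names and values, and `children`). The sample tree is hand-written.

```python
from html import escape

# Elements that have no closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose text content must not be entity-escaped.
RAW_TEXT_TAGS = {"script", "style"}


def node_to_html(node, raw_text=False):
    """Serialize a DOM.Node dict (and its children) to an HTML string."""
    node_type = node.get("nodeType")
    children = node.get("children", [])
    if node_type == 3:  # text node
        text = node.get("nodeValue", "")
        return text if raw_text else escape(text, quote=False)
    if node_type == 8:  # comment node
        return "<!--{}-->".format(node.get("nodeValue", ""))
    if node_type == 10:  # doctype node
        return "<!DOCTYPE {}>".format(node["nodeName"])
    if node_type == 1:  # element node
        tag = node.get("localName") or node["nodeName"].lower()
        flat = node.get("attributes", [])
        # 'attributes' is flat: [name1, value1, name2, value2, ...]
        attrs = "".join(' {}="{}"'.format(name, escape(value))
                        for name, value in zip(flat[0::2], flat[1::2]))
        if tag in VOID_TAGS:
            return "<{}{}>".format(tag, attrs)
        inner = "".join(node_to_html(c, raw_text=tag in RAW_TEXT_TAGS)
                        for c in children)
        return "<{}{}>{}</{}>".format(tag, attrs, inner, tag)
    # Document node (type 9) and anything else: emit the children only.
    return "".join(node_to_html(c) for c in children)


sample = {"nodeType": 9, "nodeName": "#document", "children": [
    {"nodeType": 1, "nodeName": "HTML", "localName": "html",
     "attributes": [], "children": [
         {"nodeType": 1, "nodeName": "BODY", "localName": "body",
          "attributes": ["class", "main"], "children": [
              {"nodeType": 3, "nodeName": "#text", "nodeValue": "a < b"},
          ]},
     ]},
]}

print(node_to_html(sample))  # <html><body class="main">a &lt; b</body></html>
```

The protocol also offers DOM.getOuterHTML, which takes a nodeId and returns the markup as serialized by the browser itself; that is likely the simpler route when a live browser session is available.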

TimeAshore commented 5 years ago

Example:

import PyChromeDevTools

chrome = PyChromeDevTools.ChromeInterface()
chrome.Network.enable()
chrome.Page.enable()

chrome.Page.navigate(url="http://www.jianshu.com")
event, messages = chrome.wait_event("Page.frameStoppedLoading", timeout=60)
value = chrome.wait_event("Network.responseReceived", timeout=60)
print(value)

reqid = value[0]['params']['requestId']
print("reqid: ", reqid)
print(chrome.Network.getResponseBody(reqid))

But it reports an error about the parameters:

({'method': 'Network.responseReceived', 'params': {'requestId': '1000029019.143', 'loaderId': ......
reqid: 1000029019.143
Traceback (most recent call last):
  File "/home/ldy/workspace/chromedevtols/demo.py", line 14, in <module>
    print(chrome.Network.getResponseBody(reqid))
TypeError: generic_function() takes 0 positional arguments but 1 was given

And when I use chrome.DOM.getDocument as suggested, the HTML comes back split into nodes. How do I get the original HTML, like this:

<html>
  <head>
  ...
  </head>
  <body>
  ...
  </body>
</html>

Thanks!

marty90 commented 5 years ago

Network.getResponseBody(reqid) is wrong. reqid must be passed as the keyword argument requestId, not as a positional argument. For example:

Network.getResponseBody(requestId=reqid)
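
The TypeError in the traceback is what Python raises when a function that only accepts keyword arguments is called positionally. The sketch below reproduces this with a stand-in; this `generic_function` only borrows the name from the traceback and is not the library's actual implementation.

```python
def generic_function(**params):
    """Stand-in that accepts keyword arguments only."""
    # The keyword names become the parameters of the protocol call.
    return {"method": "Network.getResponseBody", "params": params}


reqid = "1000029019.143"

# Positional call: fails, because there is no positional parameter.
try:
    generic_function(reqid)
except TypeError as exc:
    error = str(exc)
    print(error)

# Keyword call: the name 'requestId' is forwarded as a parameter.
call = generic_function(requestId=reqid)
print(call)
```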

TimeAshore commented 5 years ago

I got it, thank you very much.