Closed gamorav closed 1 year ago
Hi, I have been working on this. I am now recording the traffic to a HAR file.
The code:
from playwright.sync_api import sync_playwright
from pprintpp import pprint as pp

url = 'https://www.investing.com/commodities/crude-oil-historical-data'

def intercept_response(response):
    # we can extract details from background requests
    if response.request.resource_type == "xhr":
        if "/api/financialdata/historical/" in response.url and response.status == 200:
            pp(response.url)
    return response

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    # record only the matching API requests into example.har
    context = browser.new_context(record_har_path="example.har", record_har_url_filter="**/api/financialdata/historical/**")
    page = context.new_page()
    # page.on("response", intercept_response)
    page.goto(url)
    page.wait_for_timeout(2000)
    context.close()  # the HAR file is saved when the context is closed
    browser.close()
With headless=False, the response content is recorded correctly:
"content": {
    "size": 9835,
    "mimeType": "application/json",
    "compression": 7312,
    "text": "{\"data\":[{\"direction_color\":\"re......
But with headless=True, it is not recorded:
"response": {
    "status": -1,
    "statusText": "",
    "httpVersion": "HTTP/1.1",
    "cookies": [],
    "headers": [],
    "content": {
        "size": -1,
        "mimeType": "x-unknown"
How can I fix that?
Thanks!
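To compare the headed and headless recordings, it can help to list which HAR entries actually contain a body. The following is a minimal sketch that relies only on the standard HAR layout (`log.entries[].request` and `.response`); the helper names `summarize_har` and `load_har` are made up for illustration, and `example.har` is the file name used in the code above.

```python
import json

def summarize_har(har: dict) -> list:
    """Return one row per HAR entry: URL, status, and whether a body was stored."""
    rows = []
    for entry in har["log"]["entries"]:
        response = entry["response"]
        content = response.get("content", {})
        rows.append({
            "url": entry["request"]["url"],
            "status": response.get("status"),
            # a recorded body appears under content.text
            "has_body": "text" in content,
        })
    return rows

def load_har(path: str = "example.har") -> dict:
    """Read a HAR file from disk (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Calling `summarize_har(load_har())` after each run should show status 200 with a body for the headed recording, and status -1 without a body for the headless one described above.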
The reason it is recorded in headed mode but not in headless mode is most likely that the site you are automating uses bot protection to guard against scrapers.
Coming back to your original question of how to get the request body, you can do it like this:
route.request.post_data
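For context, `route` is the object passed to a handler registered with `page.route`. A minimal sketch of how that could look is below; the function names, the `**/api/**` pattern, and the 2-second wait are assumptions for illustration, not something from this thread. Note that `post_data` is `None` for requests without a body, such as a plain GET.

```python
def log_post_data(route, seen: list) -> None:
    """Route handler: remember the request URL and body, then let it through."""
    seen.append((route.request.url, route.request.post_data))
    route.continue_()  # forward the request unchanged

def record_request_bodies(url: str, pattern: str = "**/api/**") -> list:
    """Open `url` and collect (url, post_data) for requests matching `pattern`."""
    from playwright.sync_api import sync_playwright  # imported lazily

    seen = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route(pattern, lambda route: log_post_data(route, seen))
        page.goto(url)
        page.wait_for_timeout(2000)  # give background requests time to fire
        browser.close()
    return seen
```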
If you just want to sniff the traffic, see here: https://playwright.dev/python/docs/network#network-events
And see here for the available methods: https://playwright.dev/python/docs/api/class-response
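As a rough illustration of the response methods linked above, the sketch below waits for the API response with `page.expect_response` and reads its body with `response.json()`. The URL fragment comes from the code earlier in this thread; the function names and the `headless` default are assumptions, and whether the site serves the data at all may depend on the bot protection mentioned above.

```python
API_FRAGMENT = "/api/financialdata/historical/"  # URL fragment from this thread

def is_target(response) -> bool:
    """True for a successful response from the historical-data API."""
    return API_FRAGMENT in response.url and response.status == 200

def fetch_historical(url: str, headless: bool = False):
    """Open `url` and return the parsed JSON body of the first matching response."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        page = browser.new_page()
        # the with-block exits once a response satisfying is_target arrives
        with page.expect_response(is_target) as response_info:
            page.goto(url)
        data = response_info.value.json()  # response.text() gives the raw string
        browser.close()
    return data
```

Calling `fetch_historical('https://www.investing.com/commodities/crude-oil-historical-data')` would run it against the page from this issue; `expect_response` raises a timeout error if no matching response arrives within the default timeout.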
Closing as part of the triage process since it seemed stale. Please create a new issue with a detailed reproducible or feature request if you still face issues.
Thanks for the help and explanation.
Your question
Hi, I have this code:
My problem is that I only get the URL, method, and headers of this API request:
https://api.investing.com/api/financialdata/historical/...
But I want to get the response content:
{"data":[{"direction_color":"redFont","rowDate":....
How can I do that with Playwright? Is it possible?
Thanks!