unclecode / crawl4ai

🔥🕷️ Crawl4AI: Crawl Smarter, Faster, Freely. For AI.
https://crawl4ai.com
Apache License 2.0

Created a new hook called "on_page_created" which allows the user to inspect raw HTTP requests/responses, and more #109

Open ifaddict1 opened 2 months ago

ifaddict1 commented 2 months ago

Hi all, while using your library I needed access to the raw requests sent while a page is crawled (for instance, requests made to other APIs or for files), along with their responses and associated headers. The easiest way I found was to leverage Playwright's existing event hooks on "Page" objects via "Page.on()", so I added a hook to the AsyncCrawlerStrategy class to do that.
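A minimal sketch of what such a hook could do, using Playwright's documented "request" and "response" events on a Page. The hook name mirrors the one proposed in this PR, but its signature, the TrafficLog collector and main() are hypothetical illustrations, not crawl4ai's actual API. The hook only calls page.on, so it can be exercised without a browser.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TrafficLog:
    """Hypothetical collector for request/response metadata seen on a page."""
    requests: list[dict[str, Any]] = field(default_factory=list)
    responses: list[dict[str, Any]] = field(default_factory=list)


async def on_page_created(page: Any, log: TrafficLog) -> None:
    """Hypothetical hook body: attach listeners to a newly created page.

    `page` is expected to be a playwright.async_api.Page; only `page.on` is used.
    """

    def handle_request(request: Any) -> None:
        # Request.headers has lower-cased names and omits security-related
        # headers; `await request.all_headers()` returns the full set.
        log.requests.append(
            {"method": request.method, "url": request.url, "headers": dict(request.headers)}
        )

    def handle_response(response: Any) -> None:
        log.responses.append(
            {"status": response.status, "url": response.url, "headers": dict(response.headers)}
        )

    # Playwright emits "request" for every request the page issues (documents,
    # XHR/fetch calls, images, scripts, ...) and "response" when headers arrive.
    page.on("request", handle_request)
    page.on("response", handle_response)


async def main(url: str) -> TrafficLog:
    # Imported lazily so the hook above can be used without Playwright installed.
    from playwright.async_api import async_playwright

    log = TrafficLog()
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await on_page_created(page, log)  # attach listeners before navigating
        await page.goto(url)
        await browser.close()
    return log


# Usage (requires Playwright and its browsers to be installed):
#   traffic = asyncio.run(main("https://example.com"))
#   for entry in traffic.responses:
#       print(entry["status"], entry["url"])
```

Response bodies are not read in this sketch; from an async handler, `await response.body()` returns the raw bytes, although it may raise for redirect responses.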

Please let me know if this is unnecessary and there is an easier or built-in way to do this with your library; I haven't found one.

Cheers

oscarnevarezleal commented 2 weeks ago

I'd love to see this implemented. I also intend to read the HTTP requests made by the page.