Closed BZBY closed 2 weeks ago
Hi @BZBY, thank you so much for the suggestion, I really appreciate it. I've implemented the changes - see the code below. There's a new flag that lets the crawler take control of the user's own Chromium browser, similar to your suggestion. It also accepts a specific user data directory, which opens up a lot of possibilities. For example, you can copy part of your current user data into this folder and then pass it. I've tested this on a few websites that were previously challenging to crawl, and they are now easier to handle.
This makes sense, as it's the user's own browser. Additionally, you can set the browser type, and the system automatically detects whether you're running on a Mac, Windows, or Linux. Since I'm on a Mac, I haven't tested on Windows yet - maybe you can try that.
These changes are in the 0.3.73 branch, and I'll push them to the new version soon. Thanks again for the suggestion. I wouldn't have thought of it. If you'd like, I'd love to invite you to our Discord to work with us and suggest more improvements for the engine.
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler(
        headless=True,  # Set to False to see what is happening
        use_managed_browser=True,
        browser_type="chromium",
    ) as crawler:
        result = await crawler.arun(
            url="https://crawl4ai.com",
            bypass_cache=True,
        )
        print(result.markdown)

if __name__ == "__main__":
    asyncio.run(main())
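As a rough illustration of the platform detection mentioned above, here is a minimal sketch of how a tool might pick a browser executable per operating system. It is not crawl4ai's actual implementation: the function name `guess_browser_path` and the paths in `DEFAULT_CHROME_PATHS` are assumptions (common default install locations), so adjust them for your machine.

```python
import platform
from typing import Optional

# Assumed default Chrome locations per OS; these are common defaults,
# not values taken from crawl4ai, and may differ on your system.
DEFAULT_CHROME_PATHS = {
    "Darwin": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "Windows": r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    "Linux": "google-chrome",
}


def guess_browser_path(system: Optional[str] = None) -> str:
    """Return an assumed browser executable for the given OS name.

    `system` uses the values returned by platform.system()
    ("Darwin", "Windows", "Linux"); when omitted, the current OS is used.
    """
    system = system or platform.system()
    try:
        return DEFAULT_CHROME_PATHS[system]
    except KeyError:
        # Fail loudly instead of guessing on an unknown platform.
        raise RuntimeError(f"Unsupported platform: {system}") from None
```

Calling `guess_browser_path()` with no argument resolves the entry for the machine it runs on, while passing a name such as `"Windows"` makes it easy to check the mapping for a platform you cannot test locally.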
@unclecode Thanks for the update! I will try out the 0.3.73 branch and provide feedback. I'm also interested in joining the Discord!
@BZBY You are most welcome. Please share your email address and I'll send you the invitation link. Perhaps you can start helping by testing these new features.
@unclecode OK, I'm already testing the new features. My email: bzbyAi@protonmail.com
@BZBY already sent, welcome :)
Some websites present a CAPTCHA repeatedly. Playwright already has a feature to take control of the user's own browser, e.g., by launching the Chrome browser from the command line in a CMD terminal as follows:
Can crawl4ai support this? This would significantly reduce the need to write code to handle CAPTCHAs, and it would be great if this functionality could be implemented!