kksasa closed this issue 1 week ago
I found the solution by adding my login code into `async_crawler_strategy.py`, inside `async def crawl(self, url: str, **kwargs) -> AsyncCrawlResponse:`, right after `page = await context.new_page()`:

```python
await page.goto(url)
# Fill in the login form and submit it
await page.fill('input[name="USER"]', 'nnn')
await page.fill('input[name="PASSWORD"]', 'xx')
await page.click('input[type="submit"]')
# Confirm the login worked by checking the resulting page title
title = await page.title()
print(f"PAGE title: {title}")
print("login pass")
```
@kksasa Thx for using Crawl4ai. Here's a condensed version of the message:

I believe you're using our library with Playwright in an unintended way. Our library is designed to simplify tasks, and you can achieve most of what you're doing without patching the Playwright code directly. We have features like hooks, page access, and JavaScript execution that can help.

For example, you can use our "Managed Browsers" feature (coming in a new version) to create a browser session with a pre-logged-in user. Alternatively, you can use our hooks to run code before crawling, such as filling out a login form and waiting for elements to load (see the sketch after this comment).

I'd be happy to provide a simple example of how to use our library effectively. I will add creating a demo for this (and perhaps the content side as well) to my backlog, and explain how you can do that. Please stay tuned.
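Until that demo is ready, here is a minimal sketch of the hook approach described above. It is an assumption on my part that your Crawl4ai version exposes `AsyncPlaywrightCrawlerStrategy.set_hook()` with a `before_goto` hook that receives the Playwright page; hook names and signatures have changed across versions, and the login URL and form selectors below are placeholders for your own site.

```python
# Minimal sketch: log in via a Crawl4ai hook instead of patching the library.
# Assumes set_hook() and a "before_goto" hook that receives the Playwright page;
# hook signatures vary by version, hence the *args/**kwargs catch-all.
import asyncio
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy

async def login(page, *args, **kwargs):
    # Navigate to the login page and submit credentials before the real crawl.
    # URL and selectors are placeholders.
    await page.goto("https://example.com/login")
    await page.fill('input[name="USER"]', 'nnn')
    await page.fill('input[name="PASSWORD"]', 'xx')
    await page.click('input[type="submit"]')
    await page.wait_for_load_state("networkidle")
    return page

async def main():
    strategy = AsyncPlaywrightCrawlerStrategy(verbose=True)
    strategy.set_hook("before_goto", login)
    async with AsyncWebCrawler(crawler_strategy=strategy) as crawler:
        result = await crawler.arun(url="https://example.com/protected-page")
        print(result.markdown[:300])

asyncio.run(main())
```

Because the login happens in the same browser context, the session cookies persist when the crawler then navigates to the target URL.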
Can we pass cookies through to the crawl REST API endpoint so it can be logged in?
@rdvo are you referring to when you are using Crawl4ai from the running Docker server? Is that what you mean by the 'REST API endpoint'? If yes, the answer is yes, you can definitely pass the cookies. For example, you can pass them like this:

```python
request = {
    "urls": "https://www.nbcnews.com/business",
    "priority": 8,
    "js_code": [
        "const loadMoreButton = Array.from(document.querySelectorAll('button')).find(button => button.textContent.includes('Load More')); loadMoreButton && loadMoreButton.click();"
    ],
    "wait_for": "article.tease-card:nth-child(10)",
    "crawler_params": {
        "headless": True,
        "cookies": [{...}, ...]
    }
}
```
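On the format question below: as a rough illustration (this is an assumption, not confirmed in this thread), the entries would follow Playwright's `context.add_cookies()` shape, since the crawler drives a Playwright browser. All names and values here are placeholders:

```python
# Hypothetical cookie entry in Playwright's add_cookies() shape; whether
# crawler_params forwards these unchanged depends on your Crawl4ai version.
cookies = [
    {
        "name": "sessionid",       # cookie name the site sets after login
        "value": "abc123",         # session token copied from your browser
        "domain": ".example.com",  # alternatively, supply "url" instead of domain/path
        "path": "/",
    }
]
```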
Yes, what are the params there? Is it in JSON format like the EditThisCookie Chrome addon exports? What format do we pass them in? Do they get passed as crawler_params options?

Thanks!
Hello,

I can't find the right way to handle this case: the site I need to crawl jumps to a login page first, and only after logging in does it redirect to the content page. Here is my test code, but it doesn't work. Would you give me a hand?