unclecode / crawl4ai

🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper
Apache License 2.0

Can't get screenshot working #142

Closed · pleomax0730 closed this issue 2 weeks ago

pleomax0730 commented 2 weeks ago

Environment

System: Windows 11
Python version: 3.10.15
crawl4ai version: 0.3.5

Code to reproduce

from crawl4ai import AsyncWebCrawler
from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy
import base64
import asyncio

crawler_strategy = AsyncPlaywrightCrawlerStrategy(
    verbose=True,
    headless=True,
)

async def main():
    async with AsyncWebCrawler(verbose=True, crawler_strategy=crawler_strategy) as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business", bypass_cache=True, screenshot=True
        )
        print(result.markdown)
        # Save the screenshot to a file
        with open("screenshot.png", "wb") as f:
            f.write(base64.b64decode(result.screenshot))

        print("Screenshot saved to 'screenshot.png'!")

if __name__ == "__main__":
    asyncio.run(main())

Error

Traceback (most recent call last):
  File "C:\Users\User\Desktop\crawl_test\mycrawl.py", line 28, in <module>
    asyncio.run(main())
  File "C:\Users\User\miniconda3\envs\crawl\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Users\User\miniconda3\envs\crawl\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "C:\Users\User\Desktop\crawl_test\mycrawl.py", line 21, in main
    f.write(base64.b64decode(result.screenshot))
  File "C:\Users\User\miniconda3\envs\crawl\lib\base64.py", line 80, in b64decode
    s = _bytes_from_decode_data(s)
  File "C:\Users\User\miniconda3\envs\crawl\lib\base64.py", line 45, in _bytes_from_decode_data
    raise TypeError("argument should be a bytes-like object or ASCII "
TypeError: argument should be a bytes-like object or ASCII string, not 'NoneType'
pleomax0730 commented 2 weeks ago

After reviewing the source code, I noticed that the take_screenshot function is not called when screenshot=True is set in arun.

However, it is possible to take a screenshot manually by calling screenshot = await crawler.crawler_strategy.take_screenshot(url=url); see the sketch below.
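For anyone hitting the same problem on 0.3.5, a minimal sketch of that workaround might look like the following. It assumes, as described above, that take_screenshot returns a base64-encoded string; the file name screenshot.png is just an example.

from crawl4ai import AsyncWebCrawler
import asyncio
import base64

async def main():
    async with AsyncWebCrawler(verbose=True) as crawler:
        url = "https://www.nbcnews.com/business"
        # Crawl as usual; on 0.3.5 result.screenshot stays None even with screenshot=True
        result = await crawler.arun(url=url, bypass_cache=True)
        print(result.markdown)

        # Workaround: call the strategy's take_screenshot directly
        screenshot = await crawler.crawler_strategy.take_screenshot(url=url)
        with open("screenshot.png", "wb") as f:
            f.write(base64.b64decode(screenshot))

if __name__ == "__main__":
    asyncio.run(main())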

As a feature request, could you add an option to insert a wait (asyncio.sleep()) after the await goto call inside the take_screenshot function? Some websites have animations or other content that takes time to load, and without a delay the screenshot may not capture the fully rendered page. A rough sketch of the idea follows.
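For illustration only, here is a rough sketch of that idea using plain Playwright rather than crawl4ai's actual internals; the function name and the delay parameter are hypothetical.

import asyncio
import base64
from playwright.async_api import async_playwright

async def take_screenshot_with_delay(url: str, delay: float = 2.0) -> str:
    # Hypothetical sketch: wait `delay` seconds after navigation so animations
    # and late-loading content have time to render before the capture.
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        await asyncio.sleep(delay)  # the requested wait
        png_bytes = await page.screenshot(full_page=True)
        await browser.close()
        return base64.b64encode(png_bytes).decode("utf-8")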

unclecode commented 2 weeks ago

Hi @pleomax0730, you are absolutely right, such a funny thing we just missed. I have updated the library and will soon release version 0.3.6, where this will definitely be implemented. I also added your suggestion for a delay, which I really appreciate. You may also check the "0.3.6" branch if you are willing to give it a try. Thank you for supporting the library.
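If you want to try the "0.3.6" branch before the release, one possible way (assuming the branch lives in the main repository) is to install it straight from GitHub:

pip install git+https://github.com/unclecode/crawl4ai.git@0.3.6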