unclecode / crawl4ai

🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scrapper
Apache License 2.0
16.63k stars, 1.23k forks

This code file: crawl4ai/web_crawler.py needs minor changes #133

Closed: vignesh1507 closed this issue 1 month ago

vignesh1507 commented 1 month ago
  1. Redundant kwargs in fetch_pages: The kwargs passed through executor.map appear redundant, because the same kwargs are unpacked identically for every call. You can simplify this by passing **kwargs directly to fetch_page_wrapper.

  2. Potential None values in process_html: If html is None when process_html is called (for instance, because the crawl failed), processing may break. Check that html is valid (not None) before passing it to process_html.

  3. Missing import json: You call json.dumps in your code but haven't imported the json module. Add this import at the top of the file:

import json
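A minimal sketch of how the three suggestions could fit together. The thread does not show the actual code in crawl4ai/web_crawler.py, so fetch_page_wrapper, process_html and fetch_pages below are hypothetical, simplified stand-ins; their signatures are assumptions, and the fail_on parameter exists only to simulate a failed crawl.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Any, Optional


# Hypothetical stand-in: the real crawl4ai function may have a different signature.
def fetch_page_wrapper(url: str, **kwargs: Any) -> Optional[str]:
    """Pretend to fetch a page; return None to simulate a failed crawl."""
    if kwargs.get("fail_on") == url:
        return None
    return f"<html><body>{url}</body></html>"


# Hypothetical stand-in: assumes html is always a string, never None.
def process_html(url: str, html: str) -> dict:
    """Pretend extraction step that records the page length."""
    return {"url": url, "length": len(html)}


def fetch_pages(urls: list[str], max_workers: int = 4, **kwargs: Any) -> list[dict]:
    # Suggestion 1: bind the shared kwargs once with functools.partial instead of
    # building an identical kwargs copy for every URL passed to executor.map.
    fetch = partial(fetch_page_wrapper, **kwargs)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as urls.
        pages = list(executor.map(fetch, urls))

    results = []
    for url, html in zip(urls, pages):
        # Suggestion 2: never hand None to process_html; record the failure instead.
        if html is None:
            results.append({"url": url, "error": "fetch failed"})
            continue
        results.append(process_html(url, html))
    return results


if __name__ == "__main__":
    out = fetch_pages(
        ["https://a.example", "https://b.example"], fail_on="https://b.example"
    )
    # Suggestion 3: json.dumps only works because json is imported at the top.
    print(json.dumps(out, indent=2))
```

Binding the shared options with functools.partial keeps executor.map iterating over URLs only, and the explicit None check turns a failed fetch into a recorded error rather than a crash inside process_html.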

vignesh1507 commented 1 month ago

#134 fixed the code.

unclecode commented 1 month ago

@vignesh1507 Thank you so much for your constructive points. Some of them have already been resolved in the new version we're releasing soon (0.3.6). Because of our move to async, we haven't done much work on our synchronous version, and we may stop maintaining that part altogether. Most of your comments concern that part. Anyway, we appreciate your input, and we will consider your suggestions in future releases. Thank you for your interest in our library.