indrajithi / tiny-web-crawler

A simple and easy to use web crawler for Python
MIT License

Fix `crawl_result` type hint #21

Open Mews opened 1 week ago

Mews commented 1 week ago

Because of #19, the type hint for `Spider.crawl_result` broke and was temporarily replaced with `Dict[str, Dict[str, Any]]`. It should be fixed to reflect the actual contents of `crawl_result`, which has the following format:

crawl_result = {
    "url1": {
        "urls": ["some url", "some other url", ...],
        "body": "the html of the page"
    },
    "url2": {
        "urls": ["some url", "some other url", ...],
        "body": "the html of the page"
    },
}

Here `body` is present only if the `include_body` argument is set to `True`, so it cannot be treated as a required key. See #19 for the previous discussion about this. You can verify that the new type hint works by confirming that the mypy checks pass.
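One possible shape for the fix, as a minimal sketch rather than the project's actual code: a `TypedDict` with a required `urls` key and an optional `body` key. The names `_PageLinks`, `PageResult`, `CrawlResult`, and `get_body` are hypothetical and do not come from the repository. The two-class split (a base class plus a `total=False` subclass) is used so the sketch works without `typing.NotRequired`, which is only in the standard library from Python 3.11.

```python
from typing import Dict, List, Optional, TypedDict


class _PageLinks(TypedDict):
    # Always present: the links found on the crawled page.
    urls: List[str]


class PageResult(_PageLinks, total=False):
    # Hypothetical name. "body" is optional because it is only stored
    # when include_body=True.
    body: str


# Hypothetical alias that could replace Dict[str, Dict[str, Any]].
CrawlResult = Dict[str, PageResult]


def get_body(result: CrawlResult, url: str) -> Optional[str]:
    """Return the stored HTML for url, or None if the page or body is absent."""
    page = result.get(url)
    if page is None:
        return None
    # .get() is used because "body" may be missing from the entry.
    return page.get("body")


example: CrawlResult = {
    "url1": {"urls": ["some url", "some other url"], "body": "<html></html>"},
    "url2": {"urls": ["some url"]},  # crawled without include_body
}
```

With a definition like this, mypy should accept entries that omit `body` while still flagging a missing `urls` key or a value of the wrong type.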

Mews commented 1 week ago

@indrajithi I think this might be a good first issue due to how well documented it is.