indrajithi / tiny-web-crawler

A simple and easy to use web crawler for Python
MIT License

Added option to include page body in crawl results #19

Closed Mews closed 1 week ago

Mews commented 1 week ago

Closes #8

Changes

Right now the body of the page is added regardless of whether it finds links inside it or not! This just felt like the most expected behavior, but let me know if I should change it. Also, there are no verbose prints. I didn't find them necessary, but let me know if I should add some 👍

Mews commented 1 week ago

Oh right I guess this new "body" field doesn't match the type hint for crawl_result :P Should it be Dict[str, Dict[str, Union[List[str], str]]]?

Mews commented 1 week ago

@indrajithi I'm not quite sure how to do the type hints for the crawl_result variable now that it has this new body field :/

indrajithi commented 1 week ago

> @indrajithi I'm not quite sure how to do the type hints for the crawl_result variable now that it has this new body field :/

self.crawl_result: Dict[str, Dict[str, Union[List[str], str]]] = {}

Does this not work?

indrajithi commented 1 week ago

If the type hint for crawl_result is not working, we can just set it to a basic dict or override/suppress checking that case and move on.

Mews commented 1 week ago

Alright, I'm on my phone right now, but I'll get to it when I get home 👍

Mews commented 1 week ago

> > @indrajithi I'm not quite sure how to do the type hints for the crawl_result variable now that it has this new body field :/
>
> self.crawl_result: Dict[str, Dict[str, Union[List[str], str]]] = {}
>
> Does this not work?

Nope, that's what raised the error in CI. I'll open an issue about it so that it can be dealt with later.
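
The CI log is not reproduced in this thread, so the exact error is unknown. One plausible cause, sketched here purely as an assumption, is that a static checker such as mypy rejects list operations on a value typed as Union[List[str], str], since the value might be a str. The key name "urls", the CrawlResult alias, and the add_link helper are hypothetical and are not taken from the project.

```python
from typing import Dict, List, Union

# Hypothetical alias mirroring the hint discussed above.
CrawlResult = Dict[str, Dict[str, Union[List[str], str]]]


def add_link(result: CrawlResult, page: str, link: str) -> None:
    """Record a link for a page (illustrative helper, not project code)."""
    # "urls" and "body" are assumed field names for this sketch.
    entry = result.setdefault(page, {"urls": [], "body": ""})
    urls = entry["urls"]
    # Without this isinstance check, a static checker would typically flag
    # urls.append(...) because the value could also be a str.
    if isinstance(urls, list):
        urls.append(link)


result: CrawlResult = {}
add_link(result, "https://example.com", "https://example.com/about")
```

Narrowing with isinstance at every access keeps the Union hint, but it adds noise to each place the links list is touched.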

Mews commented 1 week ago

@indrajithi OK, I just introduced a temporary fix: I set the type hint to Dict[str, Any], so you can rerun the CI and merge if everything passes. I'll open the issue now.
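
For the follow-up issue, a minimal sketch of how the temporary Dict[str, Any] hint compares with one possible stricter alternative, a TypedDict that gives each field its own type. The PageResult class and the "urls" key are assumptions made for illustration; nothing in this thread confirms the project adopted them.

```python
from typing import Any, Dict, List, TypedDict


class PageResult(TypedDict, total=False):
    """Hypothetical per-page record; total=False makes both keys optional."""
    urls: List[str]
    body: str


# Temporary fix from this PR: values are unchecked, so any access passes.
loose: Dict[str, Any] = {}
loose["https://example.com"] = {"urls": [], "body": "<html></html>"}
loose["https://example.com"]["urls"].append("https://example.com/about")

# Possible stricter hint: each key has its own type, so a checker knows
# "urls" is a list and "body" is a str without any narrowing.
strict: Dict[str, PageResult] = {}
strict["https://example.com"] = {"urls": [], "body": "<html></html>"}
strict["https://example.com"]["urls"].append("https://example.com/about")
```

At runtime both variables hold ordinary dicts; the difference only shows up in static analysis.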