Closed indrajithi closed 1 week ago
Currently we do not return the HTML body from the crawled sites. We only return the links we find.
['urls', 'body']
Eg:
{
    "http://github.com": {
        "urls": [
            "http://github.com/",
            "https://githubuniverse.com/",
            "..."
        ],
        "https://github.com/solutions/ci-cd": {
            "urls": [
                "https://github.com/solutions/ci-cd/",
                "https://githubuniverse.com/",
                "..."
            ]
        }
    }
}
This is a feature request to return the HTML body as well. The result should look like this:
{
    "http://github.com": {
        "urls": [
            "http://github.com/",
            "https://githubuniverse.com/",
            "..."
        ],
        "body": "<html>stuff</html>",
        "https://github.com/solutions/ci-cd": {
            "urls": [
                "https://github.com/solutions/ci-cd/",
                "https://githubuniverse.com/",
                "..."
            ],
            "body": "<html>other stuff</html>"
        }
    }
}
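For illustration, here is a rough sketch of how a per-page entry with both keys could be built. It is not the crawler's actual code: `LinkCollector` and `build_page_entry` are hypothetical names, it uses only the Python standard library, and it works on an HTML string that has already been fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def build_page_entry(page_url, html):
    """Return the links found in `html` together with the raw body."""
    collector = LinkCollector()
    collector.feed(html)
    # Resolve relative links against the URL of the page they came from.
    urls = [urljoin(page_url, href) for href in collector.hrefs]
    return {"urls": urls, "body": html}


# Assumed sample input; a real crawl would fetch this over HTTP.
html = '<html><a href="/solutions/ci-cd">CI/CD</a></html>'
results = {"http://github.com": build_page_entry("http://github.com", html)}
```

Keeping the raw HTML under a separate `body` key means existing consumers that only read `urls` would not need to change.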
I have solved this issue. Please check it out: https://github.com/indrajithi/tiny-web-crawler/pull/14#issue-2354558299
@devavinothm I think you meant this issue.
Can I be assigned this?