indrajithi / tiny-web-crawler

A simple and easy to use web crawler for Python
MIT License
55 stars 11 forks

Feature: Support for crawling dynamic, JavaScript-heavy sites #10

Open indrajithi opened 2 weeks ago

indrajithi commented 2 weeks ago

Description:

Enhance the existing web crawler to support crawling and extracting content from websites that rely heavily on JavaScript for rendering their content. This feature will involve integrating a headless browser to accurately render and interact with such pages.
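As a starting point for discussion, here is a minimal sketch of what the rendering step could look like. It assumes Playwright as the headless browser, which is only one possible choice and not something decided in this issue. The names `fetch_rendered_html`, `extract_links`, and `LinkExtractor` are hypothetical and are not part of the crawler's current API. The idea is to let the browser execute the page's JavaScript, take the resulting HTML, and feed it to ordinary link extraction.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> List[str]:
    """Return all anchor targets found in the given HTML, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def fetch_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Load the URL in headless Chromium and return the DOM after JavaScript has run.

    Assumption: Playwright is installed (`pip install playwright` followed by
    `playwright install chromium`). It is imported lazily so that the static
    helpers above keep working without it.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until network activity settles, which is a
            # rough heuristic for "the page has finished rendering".
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

A crawl step would then be something like `extract_links(fetch_rendered_html(url), url)`. Keeping the fetch and the parsing separate means the existing static fetch path could stay the default, with rendering as an opt-in.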

Objectives:

Design Considerations:

indrajithi commented 1 week ago

Blocked by #17

Mews commented 1 week ago

What does blocked mean? I'd like to work on this but do you think I shouldn't?

indrajithi commented 1 week ago

@Mews You can work on this. I wanted to complete #17 before picking this up. I have merged the MR for that, although there are a few more things to be done for #17. I believe this issue can be unblocked.

Since this is going to be a relatively big story, let us first discuss the approach and spec out the requirements and acceptance criteria.

Mews commented 1 week ago

Alright, makes sense. I'll work on the v1 milestone too, then. I'll pick this up when v1 gets released 👍

indrajithi commented 1 week ago

You can create issues for things in #24 that you find interesting and pick them up. Meanwhile, I will spec out some details in this issue. Also, I think we should have this in v1.