instill-ai / instill-core

🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
https://www.instill.tech

[INS-2214] [Feature] Web Crawler Operator #616

Closed: praharshjain closed this issue 2 months ago

praharshjain commented 1 year ago

Is There an Existing Issue for This?

Project

Instill VDP

Is your Proposal Related to a Problem?

No, it is a new feature request.

Describe Your Proposed Solution

We can implement a "Web Crawler" operator that takes an initial URL and a depth (int) as input, recursively extracts links from the pages it visits up to the given depth, and returns the extracted URLs as a list of strings.
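To make the proposed behavior concrete, here is a minimal sketch in Python using only the standard library. It is not the Instill VDP operator interface: the names `crawl`, `fetch_html`, and `_LinkParser` are hypothetical, and it assumes that a depth of 1 means "return the links found on the initial page". It walks the pages iteratively (breadth-first) rather than with literal recursion, which gives the same result and avoids deep call stacks.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url: str) -> str:
    """Default fetcher: download a page and decode it as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url: str, depth: int,
          fetch: Callable[[str], str] = fetch_html) -> List[str]:
    """Return the URLs discovered within `depth` link hops of start_url.

    Assumption: depth=1 returns only the links on the initial page,
    and depth=0 returns an empty list.
    """
    seen = {start_url}
    found: List[str] = []
    queue = deque([(start_url, 0)])
    while queue:
        url, level = queue.popleft()
        if level >= depth:
            continue  # do not fetch pages beyond the requested depth
        try:
            html = fetch(url)
        except Exception:
            continue  # skip unreachable or unreadable pages
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" suffixes.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                found.append(link)
                queue.append((link, level + 1))
    return found
```

Calling `crawl("https://www.instill.tech", 2)` would fetch the initial page and every page it links to, and return each distinct URL once. The `fetch` parameter is there so the page download can be swapped out, for example with a stub in tests. A real operator would likely also need rate limiting, a page limit, and robots.txt handling, none of which this sketch covers.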

Highlight the Benefits

Such an operator will be useful for crawling and gathering online data. For example, the links it captures can be fed to the text extraction operator to build a knowledge base from the linked documents.

Anything Else?

No response

INS-2214

github-actions[bot] commented 1 year ago

This issue is a great way to kick-start your journey with our project, or to make a positive impact on open-source development. Jump in!

:sparkles: Thank you for your contribution! :sparkles:

AnkitaMalik22 commented 12 months ago

Can you please assign this to me?

itssiddhantjain commented 12 months ago

Hello @praharshjain, please assign this issue to me. I have worked on this kind of problem before and have solid experience with it.

lazyMonk1010 commented 12 months ago

Hey @praharshjain, I want to work on this issue. It would be my first work in AI, so I would really like to take it on. Thank you! Please assign it to me.

harshsoni7 commented 12 months ago

Can you please assign this to me?

Hi @AnkitaMalik22! Absolutely, we’re thrilled about your interest in our project! :rocket: Here’s the Contributing Guideline for Instill VDP to get you started on your journey! Please refer to the Contributing Guidelines for components as well. Don’t forget to link your pull request to the corresponding issue, and after your PR gets merged, please complete this form to claim your well-deserved points! If you ever have any questions or need a hand along the way, don’t hesitate to drop a message in this thread or hop into our Discord. Happy contributing! :blush::star2: