instill-ai / instill-core

🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
https://www.instill.tech

[Web] Setup header for web scrape task #1109

Open ShihChun-H opened 2 months ago

ShihChun-H commented 2 months ago

Issue Description

Current State

Why We Want to Change?

Proposed Change


Rules for the Component Hackathon


Component Contribution Guideline | Documentation | Official Go Tutorial

linear[bot] commented 2 months ago

INS-6353 [Web] Setup header for web scrape task

someshfengde commented 1 month ago

Hi @ShihChun-H, can you please assign this issue to me?

ShihChun-H commented 1 month ago

Hi @someshfengde, sure. The issue has been assigned to you.

someshfengde commented 1 month ago

Thank you, I will start.

someshfengde commented 1 month ago

Hi @ShihChun-H, can you please help me get started on this issue? I've been trying to set up my machine according to contributions.md, but it's not working out (the Docker image pull has been running for 50+ minutes).

Also, can you explain in more detail what I have to do?

From the description, I think I have to add headers to schema/ai-tasks.json. Let me know if I'm on the right path.

I've been thinking of adding this:

    "headers": {
      "title": "Request Headers",
      "description": "HTTP headers to include in the request.",
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    }
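
To make the intent of this fragment concrete, here is a minimal sketch in Python of the shape it describes: an object whose values are all strings. The function name `validate_headers` is hypothetical and used only for illustration; the component itself is written in Go and would rely on its own schema validation.

```python
def validate_headers(headers):
    """Return a list of problems; an empty list means the value fits the proposed shape.

    Hypothetical helper that mirrors the schema fragment:
    "type": "object" with "additionalProperties": {"type": "string"}.
    """
    if not isinstance(headers, dict):
        return ["headers must be an object"]
    problems = []
    for name, value in headers.items():
        if not isinstance(value, str):
            problems.append(
                f"header {name!r} must be a string, got {type(value).__name__}"
            )
    return problems


# A string-to-string map passes; a numeric value is reported.
ok = validate_headers({"Authorization": "Bearer abc"})
bad = validate_headers({"X-Retry": 3})
```

Because the property is not listed as required, a task input that omits `headers` entirely would still be valid under this fragment.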

Thanks :)

chuang8511 commented 1 month ago

Hi @someshfengde, thanks for taking the time on this.

> can you please help me get started for working on this issue. I've been trying to set up my machine according to contributions.md but it's not working out (it's been 50 + mins since pulling images from docker)

There could be several reasons. In my experience, you may need to increase your Docker resources. In Docker Desktop, you can find them under Settings > Resources. Could you please try again?

Sometimes restarting your PC or cleaning up your Docker resources helps as well.

> Also can you explain in more detail what I've to do?

You need to add more params to the web operator's tasks.json. Some websites require extra information before they serve a page, so the scraper has to set request headers to access them. You can add these as optional params on the scrape task, so that users can supply tokens or keys when they scrape specific sites.

I hope that answers all your questions. Please feel free to ask if anything else comes up. Thank you again!
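
As a rough illustration of the idea (not the component's actual Go code), the sketch below uses Python's standard library to build a request with optional, user-supplied headers. The function name `build_scrape_request`, the example URL, and the token value are all assumptions made for the example; nothing is sent over the network.

```python
from urllib.request import Request


def build_scrape_request(url, headers=None):
    """Build a GET request, attaching user-supplied headers when present.

    `headers` is optional, matching the idea of an optional task param.
    """
    request = Request(url, method="GET")
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    return request


# A site that needs a token gets an Authorization header;
# a site that needs nothing extra gets a plain request.
with_token = build_scrape_request(
    "https://example.com/private",
    {"Authorization": "Bearer <token>"},
)
plain = build_scrape_request("https://example.com/public")
```

Keeping the param optional means existing scrape tasks that pass no headers keep working unchanged.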

ShihChun-H commented 1 month ago

Hi @someshfengde, I'm following up to check whether you've made any progress or run into any questions on this issue. Could you please provide an update? Thanks 🙏

someshfengde commented 1 month ago

Sorry, I have been busy for the last couple of days. I will continue working on it in a few hours.

kuroxx commented 1 month ago

Hey @someshfengde how's it going? If you have any PR for this, don't forget to submit it!

someshfengde commented 1 month ago

Sorry, I got pulled into other tasks. I think this will require a lot of effort from me. Can you please assign it to someone else?

kuroxx commented 1 month ago

@someshfengde No worries - thank you for letting me know! Good luck with your other tasks

Sourabh782 commented 1 month ago

Hi @kuroxx, if you don't mind, can I look into this issue?

kuroxx commented 1 month ago

Hey @Sourabh782, sounds great! I have assigned it to you 🤝

kuroxx commented 1 month ago

Hey @Sourabh782 how's it going? Any blockers or progress?

If you have questions or need help, we have a Discord community here: https://discord.gg/sevxWsqpGh

kuroxx commented 3 weeks ago

Hey @Sourabh782, not sure if you're still working on this, but since it's been 2 weeks now, I will unassign this task. Thanks