Add script to scrape information from web pages

kaiiyer / webtech

Identify the technologies used on websites. (Dig-deep into web tech from your terminal)

GNU Lesser General Public License v3.0


Add script to scrape information from web pages #19

Closed kaiiyer closed 4 years ago

kaiiyer commented 4 years ago

Use beautiful soup (bs4) to make a script so that the user can scrape information from web pages.

tinaoberoi commented 4 years ago

I would like to work on this.

kaiiyer commented 4 years ago

Okay! I've assigned the issue to you, @tinaoberoi.

nishit130 commented 4 years ago

Hello @tinaoberoi, are you still working on this issue?

tinaoberoi commented 4 years ago

Yes, @nishit130.

sanjanaagrawal commented 4 years ago

Please assign this issue to me.

kaiiyer commented 4 years ago

Someone is working on it. Please wait for a few days @sanjanaagrawal

aayush1205 commented 4 years ago

Could you please elaborate on what kind of information we are looking to scrape?

kaiiyer commented 4 years ago

I guess this might help https://medium.com/@heavenraiza/web-scraping-with-python-170145fd90d3

aayush1205 commented 4 years ago

Is this still active?

utkarsh-raj commented 4 years ago

I am assigning @aayush1205. Please understand that, in fairness to the ongoing contest, we have to limit the time assigned to medium issues to three days.

puneethkanna commented 4 years ago

Could you please elaborate on what kind of data to scrape?

- Headings
- Images, etc.

kaiiyer commented 4 years ago

Headings will do!
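A minimal sketch of what heading scraping with Beautiful Soup (bs4) could look like. The function name `extract_headings` and the sample HTML are assumptions made for illustration; they are not part of webtech. Fetching the page is left out so the example runs without network access.

```python
from bs4 import BeautifulSoup

# All HTML heading levels; find_all accepts a list of tag names.
HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def extract_headings(html):
    """Return (tag_name, text) pairs for every heading, in document order.

    Hypothetical helper for illustration only, not part of webtech.
    """
    soup = BeautifulSoup(html, "html.parser")
    headings = []
    for tag in soup.find_all(HEADING_TAGS):
        # Join nested text fragments with a space and trim whitespace.
        text = tag.get_text(" ", strip=True)
        if text:  # skip empty headings
            headings.append((tag.name, text))
    return headings


if __name__ == "__main__":
    sample = (
        "<html><body><h1>Title</h1><p>Body text</p>"
        "<h2>Sub <em>part</em></h2></body></html>"
    )
    for level, text in extract_headings(sample):
        print(level, text)
```

For a live page, the HTML string would first have to be downloaded, for instance with `requests.get(url).text`.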

aayush1205 commented 4 years ago

@kaiiyer I'm still working on the issue, sir.

aayush1205 commented 4 years ago

Also, @kaiiyer, the script is really open-ended, in the sense that the user might give any website to scrape, right? For instance, scraping YouTube is very different from, say, scraping Wikipedia. Hence, I'm leaving the script open for feature additions and am currently tackling the scraping of general info, such as links, for any given website. Sound all right?
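As a rough illustration of site-agnostic scraping, the sketch below collects the links on a page with Beautiful Soup. The name `extract_links` and the `base_url` parameter are assumptions for this example, not the code that was eventually submitted.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return the unique absolute URLs of all <a href="..."> links, in page order.

    Hypothetical helper for illustration only, not part of webtech.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    seen = set()
    # href=True skips anchors that have no href attribute at all.
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links such as "/about" against the page URL.
        absolute = urljoin(base_url, anchor["href"].strip())
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

Because this relies only on generic HTML structure, it works the same way for any site; site-specific extractors could be added later as separate functions.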

kaiiyer commented 4 years ago

Yeah cool

aayush1205 commented 4 years ago

@kaiiyer, if I have cloned your repository, what command do I run (something like python somefile.py -u google.com) to simulate the symlinked command webtech -u google.com?

kaiiyer commented 4 years ago

Why do you want to simulate it in the first place? webtech as a whole is a library; you can't use a single Python file to do the same job.

kaiiyer commented 4 years ago

@aayush1205, @utkarsh-raj will help you out with your queries.

aayush1205 commented 4 years ago

@kaiiyer, agreed. But then, if I make changes to some files (for reference, take a look at #46), how do I test them, @utkarsh-raj?

aayush1205 commented 4 years ago

@kaiiyer, now can this be closed?

kaiiyer commented 4 years ago

Yeah

aayush1205 commented 4 years ago

@kaiiyer, Thank you so much. Also, for some reason, my score is not updating.

kaiiyer commented 4 years ago

@utkarsh-raj can help you with that!

aayush1205 commented 4 years ago

@kaiiyer, @utkarsh-raj The score for this issue was not added to my profile. Please see to it.

koderjoker commented 4 years ago

@aayush1205 I've updated your score

aayush1205 commented 4 years ago

@koderjoker, thanks a lot. I also wanted to ask whether more issues will be added for us to solve.

koderjoker commented 4 years ago

You'll have to refer to @kaiiyer regarding the roadmap of the project.