pytorn / hackr

A Python library for hackathons.
Apache License 2.0
94 stars 31 forks

Extending the web scraper. #18

Open ba11b0y opened 6 years ago

ba11b0y commented 6 years ago

There hasn't been much work on the web scraping part, and I am interested in working on it. Since this is going to be a generic scraper, what I have thought of so far includes: 1) A generic web scraper which scrapes all the images, links and text from a page. 2) Maybe use Scrapy for this.

Still a beginner, any tips or corrections?
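For illustration, the "generic scraper" idea above (collect all images, links and text from a page) could be sketched roughly as follows. This is only a minimal sketch and not hackr's actual code: it uses the standard library's `html.parser` instead of Scrapy to stay dependency-free, and the names `PageCollector` and `scrape_html` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageCollector(HTMLParser):
    """Collects image URLs, link URLs and visible text from one HTML page."""

    SKIPPED_TAGS = {"script", "style"}  # their contents are not visible text

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.images, self.links, self.text = [], [], []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "img" and attrs.get("src"):
            # Resolve relative URLs against the page URL.
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        if stripped and not self._skip_depth:
            self.text.append(stripped)


def scrape_html(html, base_url=""):
    """Return a dict with the images, links and text found in `html`."""
    collector = PageCollector(base_url)
    collector.feed(html)
    collector.close()
    return {
        "images": collector.images,
        "links": collector.links,
        "text": collector.text,
    }
```

Returning a plain dictionary keyed by content type keeps the result easy to serialise or save later. A Scrapy-based version would look quite different (spiders, items, pipelines), so this is just a starting point for discussion.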

shubhodeep9 commented 6 years ago

@invinciblycool I like the idea. I would suggest making a detailed list of the components you find missing in the current scraper code; then we will assign you the work.

ashwini0529 commented 6 years ago

@invinciblycool XML format could be added.

ba11b0y commented 6 years ago

@ashwini0529 I have added the XML response to web.py. Let me know if any corrections are needed. @shubhodeep9 I will update the detailed list as soon as my exams are over :smile:
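As a rough idea of what an XML response could look like, here is a small sketch that serialises scraped results with the standard library. The thread does not show what web.py actually does, so this is an assumption throughout: the input is assumed to be a dictionary mapping a content type (for example "links") to a list of strings, and `to_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def to_xml(scraped, root_tag="page"):
    """Serialise {"key": ["value", ...]} into an XML string.

    Each key becomes a child element of the root, and each value
    becomes an <item> inside it. Special characters are escaped
    by ElementTree.
    """
    root = ET.Element(root_tag)
    for key, values in scraped.items():
        group = ET.SubElement(root, key)
        for value in values:
            ET.SubElement(group, "item").text = value
    return ET.tostring(root, encoding="unicode")
```

Building the tree with `ElementTree` rather than concatenating strings means characters such as `&` and `<` in scraped text are escaped automatically.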

ba11b0y commented 6 years ago

@ashwini0529 @shubhodeep9 Couldn't resist the excitement :smile: These are some features I have in mind that could be added:

2) Or create dedicated directories for the above keys of the dictionary and actually save the content to the respective directory (inspired by HTTrack).
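A minimal sketch of the "one directory per key" idea, assuming the scraped result is a dictionary whose keys (for example "images", "links", "text") map to lists of strings or bytes. The exact dictionary layout is not given in this thread, and `save_by_key` and the file naming scheme are hypothetical.

```python
from pathlib import Path


def save_by_key(scraped, dest_dir):
    """Create one sub-directory per key and write each value to its own file.

    Returns the list of paths that were written.
    """
    written = []
    for key, values in scraped.items():
        key_dir = Path(dest_dir) / key
        key_dir.mkdir(parents=True, exist_ok=True)
        for index, value in enumerate(values):
            is_binary = isinstance(value, bytes)
            data = value if is_binary else str(value).encode("utf-8")
            # Files are numbered in order: 0000.txt, 0001.txt, ...
            path = key_dir / f"{index:04d}{'.bin' if is_binary else '.txt'}"
            path.write_bytes(data)
            written.append(path)
    return written
```

For example, `save_by_key({"links": ["https://example.com"]}, "/tmp/site")` would create `/tmp/site/links/0000.txt`. A real mirror tool such as HTTrack also rewrites links and preserves original file names, which this sketch does not attempt.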

ashwini0529 commented 6 years ago

Hey @invinciblycool, sounds good. It's a great idea to start with. Go ahead; we can add more features later. 🎉

shubhodeep9 commented 6 years ago

@invinciblycool Add a TO-DO with your PR, and we will keep this issue alive until we feel satisfied, so that whenever someone gets a new idea for web scraping, they can add it to that TO-DO.

ashwini0529 commented 6 years ago

Also, please add a [WIP] tag in your PR message. 😄

ba11b0y commented 6 years ago

@ashwini0529 Before I start working, could you clarify whether the function should return a response or create folders and save the content locally? Thanks. @shubhodeep9 Just confirming: should the TO-DO go with the PR or with the issue?

ashwini0529 commented 6 years ago

Hey @invinciblycool, you can take a look at the QR code function; I think you can make something like that. Probable usage, like what it was for the QR code: img = hackr.image.qrcode("https://github.com/pytorn/hackr", dest_path="/tmp/hackr_qrcode.png")
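Following the shape of the QR code call above, a scraper entry point could both return the content and, when a `dest_path` is given, save it locally. The sketch below is only an assumption about how that might look: `scrape`, its `fetch` parameter and `_fetch` are hypothetical names, not part of hackr.

```python
from pathlib import Path
from urllib.request import urlopen


def _fetch(url):
    """Download `url` and return its body as text."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape(url, dest_path=None, fetch=_fetch):
    """Return the page HTML; also write it to `dest_path` if one is given.

    `fetch` is injectable so the function can be exercised without
    network access.
    """
    html = fetch(url)
    if dest_path is not None:
        path = Path(dest_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(html, encoding="utf-8")
    return html
```

With this shape, `html = scrape("https://github.com/pytorn/hackr", dest_path="/tmp/hackr_page.html")` would read much like the `qrcode` call, and callers who only want the response can leave `dest_path` out.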

ba11b0y commented 6 years ago

I guess then we agree on saving all the content locally. Will start working on it ASAP.

ashwini0529 commented 6 years ago

Hey @invinciblycool Updates?

ba11b0y commented 6 years ago

Sorry for the delay, I will try to open a PR this week. Happy Diwali BTW. :sparkles:

ashwini0529 commented 6 years ago

Perfect @invinciblycool Happy hacking and Happy Diwali! 😄 🎇