nodeconf / US-CFP

Call for participation for NodeConf 2015

Applications of web scraping with node.js #40

Closed frankcash closed 9 years ago

frankcash commented 9 years ago

I think this is an interesting skill to have. A lot of the time, APIs do things you don't want, behave in poorly designed ways, or don't exist at all. Web scraping may not always be the most efficient way of consuming others' services, but it is better than not having an API to work with!

Web scraping is easily achieved through the request and cheerio libraries.

I've used web scraping to create my own API for services before. I really enjoyed learning how to approach web scraping for this use case, and I feel that many others in the community would enjoy it too.

Web scraping can be used for more than making your own API out of another service. You could use it to run metrics on another person's site, such as finding specific mentions of words, or to extract all the links and download the files they point to (e.g. PDFs).
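As a rough illustration of the two uses just mentioned (counting mentions of a word and pulling out links to PDFs), here is a minimal sketch. It is dependency-free so it can run as-is: the regex-based parsing is a simplified stand-in for a real HTML parser such as cheerio, and `sampleHtml`, `countMentions`, `extractLinks`, and `pdfLinks` are hypothetical names made up for this example. A real scraper would download the page first instead of using an inline string.

```javascript
// Hypothetical sample page; a real scraper would download this HTML first.
const sampleHtml = `
<html><body>
  <p>Node makes scraping easy. I use node daily.</p>
  <a href="/docs/guide.pdf">Guide</a>
  <a href="https://example.com/about">About</a>
  <a href="slides/talk.PDF">Slides</a>
</body></html>`;

// Count case-insensitive whole-word mentions of `word` in the page text.
function countMentions(html, word) {
  const text = html.replace(/<[^>]*>/g, ' '); // crude tag stripping (assumption: no scripts/styles)
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
  const matches = text.match(new RegExp(`\\b${escaped}\\b`, 'gi'));
  return matches ? matches.length : 0;
}

// Collect every <a href>, resolved to an absolute URL against the page's own URL.
function extractLinks(html, baseUrl) {
  const links = [];
  const hrefPattern = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    links.push(new URL(match[1], baseUrl).href);
  }
  return links;
}

// Keep only links whose path ends in .pdf (case-insensitive).
function pdfLinks(links) {
  return links.filter((link) => new URL(link).pathname.toLowerCase().endsWith('.pdf'));
}

const pageUrl = 'https://example.com/blog/post'; // hypothetical page address
console.log(countMentions(sampleHtml, 'node'));
console.log(pdfLinks(extractLinks(sampleHtml, pageUrl)));
```

Resolving each href against the page URL matters because scraped links are often relative, and the download step needs absolute addresses.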

macalinao commented 9 years ago

:-1:

Tbh, this is a really simple thing that even beginners should already be able to do, as Cheerio is just jQuery after all. I think this talk won't have much depth as this is really extremely easy to do and is pretty much already common knowledge.

vishnuravi commented 9 years ago

:thumbsup: I think this would be a valuable talk for many people.

frankcash commented 9 years ago

@simplyianm I agree cheerio is a very friendly library to use, since it mimics a lot of jQuery features. Overall, though, I think writing an efficient web scraper isn't easy. Sure, it's easy to get all the hrefs from a document, but branching out from them makes things more complicated. So if a talk on web scraping were presented, I think it would be safe to assume that it would cover more than just pulling down the Wikipedia page for Node and totaling how many links are on it.
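To give a sense of what "branching out" from the hrefs involves, here is a minimal sketch of a breadth-first crawl. It is only one possible approach, not a definitive implementation: it tracks visited pages, stays on the starting site, and caps the number of pages. Everything named here (`fakeSite`, `fetchPage`, `extractLinks`, `crawl`, `maxPages`) is hypothetical, the regex parsing stands in for a real parser like cheerio, and the in-memory `fakeSite` replaces real HTTP requests so the example runs offline.

```javascript
// Hypothetical in-memory "site" so the sketch runs without network access.
const fakeSite = {
  'https://example.com/': '<a href="/a">A</a> <a href="/b">B</a> <a href="https://other.example/x">ext</a>',
  'https://example.com/a': '<a href="/b">B</a> <a href="/">home</a>',
  'https://example.com/b': '<a href="/a#top">A again</a>',
};

// Stand-in for a real HTTP request; resolves to '' for unknown pages.
async function fetchPage(url) {
  return fakeSite[url] || '';
}

// Collect absolute link URLs, dropping #fragments so /a and /a#top count as one page.
function extractLinks(html, baseUrl) {
  const links = [];
  const hrefPattern = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      const url = new URL(match[1], baseUrl);
      url.hash = '';
      links.push(url.href);
    } catch (err) {
      // Skip hrefs that cannot be parsed as URLs.
    }
  }
  return links;
}

// Breadth-first crawl: visit each same-origin page once, up to maxPages.
async function crawl(startUrl, fetchFn, maxPages = 50) {
  const origin = new URL(startUrl).origin;
  const visited = new Set();
  const queue = [startUrl];
  while (queue.length > 0 && visited.size < maxPages) {
    const url = queue.shift();
    if (visited.has(url)) continue; // already handled via another link
    visited.add(url);
    const html = await fetchFn(url);
    for (const link of extractLinks(html, url)) {
      // Stay on the starting site and avoid re-queueing known pages.
      if (new URL(link).origin === origin && !visited.has(link)) {
        queue.push(link);
      }
    }
  }
  return [...visited];
}

crawl('https://example.com/', fetchPage).then((pages) => console.log(pages));
```

Even this small version has to deal with cycles (`/a` links back to `/`), duplicate links, fragments, and external sites, which is the kind of complication that a single-page link count never runs into.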

I also think that just making people aware of how to create an API out of another's service, or collect data on the web, would be of value, since it is often easy to forget about that option.

chrisjs commented 9 years ago

:+1: while simple web scraping can be accomplished by anybody, i like this talk idea given the statement:

"Web scraping can be used for more than making your own API out of another service. You could use it to run metrics on another person's site, such as finding specific mentions of words, or to extract all the links and download the files they point to (e.g. PDFs)."

provided it shows a real-world example of not just scraping but also what can be done with that data

mikeal commented 9 years ago

As you may have seen we had to cancel the speaking event at the Fox Theatre.

You're welcome to join us at Walker Creek Ranch for NodeConf Adventure, which is an un-conference with attendee-driven workshops and discussion sessions. If you'd like to adapt this topic or any other idea to that format and you're planning on attending, just log an issue in the Adventure repo.