rubyaustralia / melbourne-ruby

Organisers notes and processes

Web Scraping: How not to do it #98

Closed by Lewington-pitsos 6 years ago

Lewington-pitsos commented 6 years ago

Who even likes APIs anyway?

They're almost always outdated and poorly maintained, and the docs are only ever borderline legible, if that. Most importantly though: they never let you do anything cool. It's all rate limits and access restrictions. Phooey, I say! If you really want to get your paws on all that sweet, sweet data you've been eyeing, there's no better tool than the universal API: HTTP.

This presentation covers what I learned over the last few months of trying and perpetually failing to git good at web scraping with Ruby. I'll be honest, I still have no idea what I'm doing, but if you're new to web scraping then hey, I might just save you a few months.

Talking points

  1. Basics: how to scrape with Net::HTTP
  2. Capybara: how not to scrape (and the miracle of logging)
  3. Headless vs Headful browsers: when and why
  4. Storing Data: And why it's incredibly important
  5. The Last Resort (or How I Learned to Stop Worrying and Love the Constant Errors)
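
For anyone who wants a taste of point 1 before the talk, here is a minimal sketch of scraping with nothing but Ruby's standard library. It is only an illustration, not the code from the presentation: the helper names `fetch_html` and `extract_links` are made up for this example, and the regex-based link extraction is an assumption that keeps things dependency-free (a real scraper would more likely use a proper HTML parser).

```ruby
require "net/http"
require "uri"

# Hypothetical helper: fetch the raw HTML body of a page.
# Returns nil for any non-2xx response instead of raising.
def fetch_html(url)
  response = Net::HTTP.get_response(URI(url))
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
end

# Hypothetical helper: pull every href value out of the anchor tags.
# Assumption: a regex is good enough for a demo; it will miss
# unquoted attributes and other messy real-world markup.
def extract_links(html)
  html.scan(/<a\s[^>]*href=["']([^"']+)["']/i).flatten
end
```

Calling `extract_links(fetch_html(url))` on a page that returns a 2xx response gives an array of the link targets found in its markup; `fetch_html` returns nil otherwise, so check for that first.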

Overall I see this going for about 25 intense minutes. If you think this could be fun, let me know. I would ideally like to do this talk at the March meetup (while everything's still fresh) if you have space, but whenever is cool really.

Lewington-pitsos commented 6 years ago

I have a rough draft of the presentation up and running Here, but I was wondering: does anybody know how we find out whether and when we'll be presenting?

Cheers.

tcn33 commented 6 years ago

@Lewington-pitsos how about April 25th? (Though I've just worked out that's Anzac Day so we're working out what that means for scheduling.)

Lewington-pitsos commented 6 years ago

April 25th (or the days around then) sounds great, plenty of time to test everything properly.

tcn33 commented 6 years ago

@Lewington-pitsos we've moved to April 24. Still good?

Lewington-pitsos commented 6 years ago

Yes most certainly is, cheers for the update.

Lewington-pitsos commented 6 years ago

Hey, just letting you know that I'll be running a tad late, but I'll still be there in time to present! Cheers.

Lewington-pitsos commented 6 years ago

Hey ho, just confirming, is it still ok for me to do the presentation at next month's meetup? This time I'll test it on the laptop beforehand haha.

tcn33 commented 6 years ago

Hey! Double-checking that you're good to go for this month?

tcn33 commented 6 years ago

@lewington-pitsos ping!

Lewington-pitsos commented 6 years ago

Shucks, sorry, I swear I checked this like 3 days ago. Yes, we are a go on the presentation. AND it's working on my laptop, so unless there's no internet I'm good to go.

tcn33 commented 6 years ago

Thanks for the talk Louka!