steven-king / mj583

J583 Advanced Interactive Media

Problems with Scraping #6

Open steven-king opened 6 years ago

steven-king commented 6 years ago

Please describe your problems and link to the repo containing your IPython Notebook in a comment below.

cboliek commented 6 years ago

https://github.com/cboliek/go_heels/blob/master/go_heels%20(3)%20(1).ipynb

peytonchance commented 6 years ago

Peter solved my problem: I was targeting p elements to build the JSON, when I could have backed up a couple of steps and targeted th, tr, and td directly with Scrapy.
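For anyone following along, here is a minimal sketch of the row/cell idea: walk the tr rows and collect the text of each th/td cell. It uses Python's standard-library html.parser instead of Scrapy so it runs without extra installs (a swapped-in technique, not Peter's code), and the table markup and player values are hypothetical. In Scrapy the same pattern is roughly `sel.css('tr')`, then `row.css('th::text').getall()` or `row.css('td::text').getall()` for each row.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every th/td cell, grouped by tr row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a th/td cell.
        if self._cell is not None:
            self._cell.append(data)


# Hypothetical stats table markup.
html = """
<table class="sidearm-table">
  <tr><th>Name</th><th>Kills</th></tr>
  <tr><td>Player A</td><td>12</td></tr>
  <tr><td>Player B</td><td>7</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(html)
header, *body = parser.rows
print(header)  # ['Name', 'Kills']
print(body)    # [['Player A', '12'], ['Player B', '7']]
```

Because each row comes back as its own list, the header row and the data rows stay aligned without any extra bookkeeping.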

elisabeth-parker commented 6 years ago

One problem that had me stuck for a LONG time: one of my players (on the women's volleyball team) didn't have a stats page at all, so one of the Scrapy methods was failing because it received a null value instead of a string. I added a validation step to make sure the HTML passed to the Selector was a string. The code that finally worked is here: https://github.com/elisabeth-parker/goHeels-scraping, in the file called "GoHeels scrape EP (2).ipynb".
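A validation step like that can be sketched as a small guard that only lets non-empty strings through to `scrapy.Selector(text=...)`, which expects a string. This is an illustration, not the code in the linked notebook; the function name `usable_html` and the sample players are hypothetical.

```python
def usable_html(value):
    """Return value if it is a non-empty HTML string, otherwise None.

    Meant to run before scrapy.Selector(text=...), which expects a string.
    """
    if isinstance(value, str) and value.strip():
        return value
    return None


# Hypothetical scraped values: Player B has no stats page, so nothing came back.
scraped = {
    "Player A": '<table class="sidearm-table"><tr><td>12</td></tr></table>',
    "Player B": None,
}

ready = {}
for name, raw in scraped.items():
    html = usable_html(raw)
    if html is None:
        print(f"{name}: no stats page, skipping")
        continue
    ready[name] = html  # safe to pass to scrapy.Selector(text=html) here

print(sorted(ready))  # ['Player A']
```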

steven-king commented 6 years ago

@cboliek

Your selector is empty.

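```python
import json
import scrapy

# Assumes stats_sel is the response from requesting the player's stats (it has .content).
# Wrap the "current_stats" HTML string in a Selector right away, so html has .css():
html = scrapy.Selector(text=json.loads(stats_sel.content.decode("utf-8"))["current_stats"])

# Remove this line (it overwrote stats_sel):
# stats_sel = scrapy.Selector(text=html)

# Change this line to reference html, not stats_sel:
# player_stats = stats_sel.css('.sidearm-table').xpath('string()').extract()
player_stats = html.css('.sidearm-table').xpath('string()').extract()
```
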
steven-king commented 6 years ago

@elisabeth-parker That works. You could also use a try statement to check whether a player has stats.
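A try-based version might look like the sketch below. It assumes, as in the snippet for @cboliek, that the stats arrive as JSON with a "current_stats" field; `load_current_stats` and the sample response bodies are hypothetical. Any failure to parse the body or look up that field is treated as "no stats", and the caller skips that player.

```python
import json


def load_current_stats(raw_body):
    """Return the "current_stats" value from a stats response body, or None."""
    try:
        return json.loads(raw_body)["current_stats"]
    except json.JSONDecodeError:
        return None  # body was not JSON at all, e.g. an HTML error page
    except (KeyError, TypeError):
        return None  # JSON had no "current_stats" key, or was not an object


# Hypothetical response bodies for three players.
bodies = {
    "Player A": '{"current_stats": "<table class=\\"sidearm-table\\"></table>"}',
    "Player B": "<html>Page not found</html>",
    "Player C": '{"roster": []}',
}

for name, body in bodies.items():
    html = load_current_stats(body)
    if not html:  # covers None and an empty or null "current_stats"
        print(f"{name}: no stats, skipping")
        continue
    print(f"{name}: {html}")
```

Compared with checking the type up front, the try statement also catches bodies that are not JSON at all, which is one plausible way a missing stats page could show up.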

aryaswanie commented 6 years ago

I found it hard to follow the code from the example, so I tried to write my own "simpler" code. I wrote methods to pull out the data and then called those methods inside a dictionary. However, I get my columns and my data from separate methods, and I can't find a way to zip them together. Is there a way to do that at this point, or should I change my code completely? https://github.com/aryaswanie/data/blob/master/Women's%20Volleyball.ipynb https://github.com/aryaswanie/data/blob/master/scraped_players.json
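One way to combine them without rewriting everything is Python's built-in `zip`: pair each column name with the matching value and build one dictionary per player. The sketch below assumes the data method returns one list of values per player, in column order; if it returns one list per column instead, `zip(*...)` transposes it first. `get_columns`, `get_rows`, and the values are hypothetical stand-ins for the methods in the notebook.

```python
import json


# Hypothetical stand-ins for two separate scraping methods.
def get_columns():
    return ["name", "position", "height"]


def get_rows():
    # One list of values per player, in the same order as the columns.
    return [
        ["Player A", "OH", "6-1"],
        ["Player B", "S", "5-10"],
    ]


columns = get_columns()
players = [dict(zip(columns, row)) for row in get_rows()]
print(players[0])  # {'name': 'Player A', 'position': 'OH', 'height': '6-1'}

# If the values come back one list per column instead, transpose with zip(*...) first.
by_column = [["Player A", "Player B"], ["OH", "S"], ["6-1", "5-10"]]
rows_from_columns = [list(values) for values in zip(*by_column)]

# The list of dicts can be written straight out as JSON.
print(json.dumps(players, indent=2))
```

Note that `zip` silently stops at the shorter list, so if a player is missing a value the fields shift without an error; on Python 3.10+, `zip(columns, row, strict=True)` raises a ValueError instead.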