Open nverwer opened 10 years ago
Hi nverwer,
Since there are a lot of elements with the class 'value', how did you know that using that code would get the right price? Is it because of the [0]?
I found this solution:
scraping = BeautifulSoup(page)
assets = scraping.find_all("div", "asset-inner", limit=1)
ask_asset = BeautifulSoup(str(assets[0]))
price_value = ask_asset.find_all("dd", "value")[0].get_text()
return price_value
But of course, yours is more accurate.
Hi AngelAlvarado,
It is indeed because of the [0]. This is certain to break again in the future, but since there are no 'id' attributes on the webpage any more (not when I looked at it, anyway), it was the best solution I could come up with. I think your solution is also good, but of course web scraping is a dangerous way to get information. The Bad Data Handbook (published by O'Reilly) has an interesting chapter on this.
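For what it's worth, a lookup like this can at least fail gracefully instead of raising IndexError when the page changes again. The following is only a minimal sketch, not the code from WebScraping.py: the function name `extract_price` and the `SAMPLE` markup are made up for illustration, and the class names 'asset-inner' and 'value' are taken from the snippet earlier in this thread, so they may no longer match the live page.

```python
from bs4 import BeautifulSoup


def extract_price(page):
    """Return the text of the first dd.value inside the first div.asset-inner.

    Returns None when either element is missing, rather than raising.
    (Hypothetical helper; class names are assumed from the thread.)
    """
    soup = BeautifulSoup(page, "html.parser")
    asset = soup.find("div", class_="asset-inner")
    if asset is None:
        return None
    value = asset.find("dd", class_="value")
    if value is None:
        return None
    return value.get_text(strip=True)


# Hypothetical markup, only to exercise the function.
SAMPLE = (
    '<div class="asset-inner"><dl><dt>Ask</dt>'
    '<dd class="value">1,234.50</dd></dl></div>'
)

if __name__ == "__main__":
    print(extract_price(SAMPLE))
    print(extract_price("<p>layout changed</p>"))
```

The caller can then check for None and skip or log that sample, which does not make the scraper any less fragile but does make the breakage easier to spot.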
Gotcha,
Thanks for letting me know.
Definitely a dangerous way. After finishing this book, I'll take a look at the Bad Data Handbook. Thanks for the recommendation.
The structure of the HTML on gold.org has changed. This illustrates the danger of webpage scraping, but it also breaks the example given in WebScraping.py. To make things more difficult, there are no 'id' attributes on the HTML elements with the prices now.
The result is an error:
IndexError: list index out of range
Changing the line where price is determined to:
seems to work.
It might be useful to add that the output file is buffered, so it will take some time before something appears in it.
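If the delay is a problem, one option is to flush after every write. This is a minimal sketch under the assumption that the script keeps a single text-mode file handle open for the whole run; `log_price` and the file name are hypothetical and not taken from WebScraping.py.

```python
def log_price(out, price):
    """Write one price per line and flush right away.

    `out` is any open text-mode file object (hypothetical helper).
    """
    out.write(f"{price}\n")
    # Push Python's internal buffer to the OS instead of waiting for it to fill.
    out.flush()


if __name__ == "__main__":
    # Hypothetical output file name, for illustration only.
    with open("prices.txt", "a") as out:
        log_price(out, "1,234.50")
```

Opening the file with `buffering=1` (line buffering, text mode only) should have a similar effect without the explicit flush() call.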