Scraping the full Yahoo Finance table (scrolling webpage)

katcorr commented 4 years ago

I've pushed code to your team repo that scrapes all the data for a single company, "MMM".

It's kind of neat/freaky to watch the web browser be controlled automatically. You may need to update your version of Chrome if you don't have the latest version. Then:

1. After you run the rsDriver(...) command, a browser window pops up.
2. After you pass the URL to the navigate command, that browser goes to the specified URL automatically. Note the "Chrome is being controlled by automated test software" banner in the browser!
3. If you run the for loop and watch the browser, it scrolls down by itself (that's what the for loop is making it do!).
4. MMM_data should end up with 258 observations, the last being Oct. 7, 2019.
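
For reference, the open / navigate / scroll / parse sequence above can be sketched roughly as follows. This is not the code pushed to the repo, just a minimal illustration: the helper names (history_url, scrape_history), the URL pattern, the number of scrolls, and the assumption that the first table on the page is the price table are all mine.

```r
# Hypothetical helper: builds the history-page URL for a ticker.
# The URL pattern is an assumption, not taken from the pushed code.
history_url <- function(ticker) {
  paste0("https://finance.yahoo.com/quote/", ticker, "/history")
}

# Sketch of the open / navigate / scroll / parse sequence.
# Needs the RSelenium and rvest packages plus a local Chrome install.
scrape_history <- function(ticker, n_scrolls = 10, port = 4567L) {
  rD <- RSelenium::rsDriver(browser = "chrome", port = port)  # browser pops up
  # Always close the browser and stop the server, even if something fails.
  on.exit({
    rD$client$close()
    rD$server$stop()
  }, add = TRUE)

  remDr <- rD$client
  remDr$navigate(history_url(ticker))

  # Each pass scrolls to the bottom so the page loads more rows.
  for (i in seq_len(n_scrolls)) {
    remDr$executeScript("window.scrollTo(0, document.body.scrollHeight);")
    Sys.sleep(1)  # give the page time to load the next batch
  }

  # Parse whatever is in the browser now and pull out the tables.
  page <- rvest::read_html(remDr$getPageSource()[[1]])
  tables <- rvest::html_table(page)
  if (length(tables) == 0) stop("No tables found on the page")
  tables[[1]]  # assumption: the first table holds the prices
}
```

Running MMM_data <- scrape_history("MMM") would then open the browser, scroll, and return the table. If fewer rows come back than expected, increasing n_scrolls is the first thing to try.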

Test this out and see whether the single-webpage case works on your computers. Then, if needed, I can help incorporate it into your for loop so it runs over all the web pages.

katcorr commented 4 years ago

If you run into the error "Selenium server signals port = 4837 is already in use.", try changing the port number in the rsDriver command, e.g. rD <- rsDriver(port = 4000L, browser = "chrome")
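
If the port error keeps coming back, one option is to try a few ports in turn. The sketch below is only an illustration: start_on_free_port and its default port list are hypothetical, and it treats any startup error as a reason to move on, not only the port error. The port is often still taken because an earlier session was never stopped, in which case rD$server$stop() on the old object may free it.

```r
# Hypothetical helper: try each port until the driver starts.
# `start` is the function that launches the driver; it is an argument
# so the retry logic can be exercised without opening a browser.
start_on_free_port <- function(ports = c(4567L, 4000L, 4444L),
                               start = function(p) {
                                 RSelenium::rsDriver(port = p, browser = "chrome")
                               }) {
  for (p in ports) {
    # Any error (not just "port already in use") moves on to the next port.
    result <- tryCatch(start(p), error = function(e) NULL)
    if (!is.null(result)) return(result)
    message("Port ", p, " failed; trying the next one")
  }
  stop("Could not start a driver on any of the supplied ports")
}
```

With RSelenium installed, rD <- start_on_free_port() would take the place of the plain rsDriver(...) call.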

katcorr commented 4 years ago

@zostrow2001 @luwilliam20

I know you mentioned you may just use what's initially scraped (without scrolling), but just in case you have time to go back and add the additional data: we were getting errors related to the Chrome browser before. It turns out rsDriver has an argument that lets you specify which Chrome version you're using:

# Make sure the driver version matches the version of Chrome you have installed
rD <- rsDriver(browser = "chrome", chromever = "85.0.4183.83")

Update the chromever value to the version you have installed; that should fix the issue!
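
If your exact Chrome version is not among the driver versions available on your machine, a driver with the same major version may be worth trying. The helper below is a hypothetical sketch of that choice (pick_driver_version is my name, and same-major matching is an assumption rather than a guarantee); binman::list_versions("chromedriver") should show which driver versions you actually have.

```r
# Hypothetical helper: from the available chromedriver versions, pick the
# newest one that shares a major version with the installed Chrome.
pick_driver_version <- function(chrome_version, available) {
  major <- function(v) sub("\\..*$", "", v)  # "85.0.4183.83" -> "85"
  matches <- available[major(available) == major(chrome_version)]
  if (length(matches) == 0) return(NA_character_)
  # numeric_version compares component by component, not as plain text
  as.character(max(numeric_version(matches)))
}

# Possible usage (not run here; needs the binman and RSelenium packages):
# available <- unlist(binman::list_versions("chromedriver"))
# ver <- pick_driver_version("85.0.4183.83", available)
# rD <- RSelenium::rsDriver(browser = "chrome", chromever = ver)
```

An NA result means no driver with a matching major version was found, so updating Chrome or the driver is probably the next step.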

stat231-f20 / Blog-MoneyMovers, issue #2