hannesdatta / course-odcm

This repository hosts the course website of Tilburg University's open education class on "Online Data Collection and Management" (oDCM) - learn how to collect web data for your empirical research projects!
https://odcm.hannesdatta.com

add to web data advanced #50

Closed hannesdatta closed 2 years ago

hannesdatta commented 3 years ago

Web data advanced currently shows how to scroll through the entire page.

Yet, it may be super useful for students to learn how to only scroll "once", or "twice", or a little bit.

Please add a little section to the tutorial where this is done (inspiration can be found in students' project submissions).

RalphGit21 commented 2 years ago

From Stack Overflow; should work for Twitter as well:

import time

# Assumes `driver` is an already-initialized Selenium WebDriver.
SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
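
The loop above keeps going until the page stops growing. For the "scroll only once or twice, or a little bit" case the issue asks about, a minimal sketch could cap the number of scrolls instead. The names `scroll_n_times`, `n_scrolls`, `pause` and `step_px` are made up for illustration, and `driver` is assumed to be an already-initialized Selenium WebDriver; this has not been tried against the tutorial's websites.

```python
import time


def scroll_n_times(driver, n_scrolls=2, pause=0.5, step_px=None):
    """Scroll down exactly `n_scrolls` times and return the number of scrolls done.

    driver  -- assumed to be a Selenium WebDriver (anything with execute_script)
    pause   -- seconds to wait after each scroll so new content can load
    step_px -- pixels to move per scroll; None jumps to the current page bottom
    """
    done = 0
    for _ in range(n_scrolls):
        if step_px is None:
            # Jump to the current bottom of the page (same call as in the loop above)
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        else:
            # Scroll "a little bit": move down by a fixed number of pixels
            driver.execute_script("window.scrollBy(0, arguments[0]);", step_px)
        time.sleep(pause)
        done += 1
    return done
```

With this, `scroll_n_times(driver, n_scrolls=1)` would scroll to the bottom once, and `scroll_n_times(driver, n_scrolls=3, step_px=400)` would move down 400 pixels three times.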

hannesdatta commented 2 years ago

Try out & add to tutorial?

hannesdatta commented 2 years ago

Try out on a different website and then add to tutorial.

RalphGit21 commented 2 years ago

Tested in the tutorial and it works. Not sure how to implement this in Google Colab (may be due to access rights).

hannesdatta commented 2 years ago

Please still add to the tutorial & commit.