shaikhsajid1111 / twitter-scraper-selenium

Python package to scrape Twitter's front end easily
https://pypi.org/project/twitter-scraper-selenium
MIT License
308 stars, 50 forks

Why the very long wait in wait_until_completion()? #70

Open · LambertWM opened this issue 1 year ago

LambertWM commented 1 year ago

I found that we spend 90% of the time in wait_until_completion(), because its delay, time.sleep(randint(3, 5)), is 3 to 5 seconds. That seems very high; why is that?

time.sleep(random.uniform(0.1, 0.2)) seems more than enough for my simple tests, but maybe I'm missing something?
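The trade-off between a fixed sleep and polling can be illustrated with a small, library-agnostic helper. `poll_until` and its `timeout` and `interval` parameters are hypothetical names chosen for this sketch, not part of twitter-scraper-selenium. A fixed time.sleep(randint(3, 5)) always costs 3 to 5 seconds, whereas polling returns as soon as the condition holds (at most about one interval late). The timeout also prevents the loop from running forever, which a bare `while state != "complete"` loop does not.

```python
import time
from typing import Callable


def poll_until(
    condition: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.15,
) -> bool:
    """Call `condition` every `interval` seconds until it returns True.

    Hypothetical helper: returns True as soon as the condition holds,
    or False once `timeout` seconds have passed without it holding.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        # Short pause between checks instead of one long fixed sleep
        time.sleep(interval)
```

With Selenium, the readiness check from this issue could be passed in as `poll_until(lambda: driver.execute_script("return document.readyState") == "complete")`.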

LambertWM commented 1 year ago

I spoke too soon. For scrolling to work and more tweets to be fetched, a small delay also has to be added to the scroll function:

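import logging
import random
import time

from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

logger = logging.getLogger(__name__)


def scroll_down(driver) -> None:
    """Scroll the page down a few times, pausing briefly after each key press."""
    try:
        start = time.time()
        body = driver.find_element(By.CSS_SELECTOR, "body")
        for _ in range(random.randint(2, 4)):
            body.send_keys(Keys.PAGE_DOWN)
            # Short pause so the page can react before the next PAGE_DOWN
            time.sleep(random.uniform(0.2, 0.3))
        print(f"scroll_down took {time.time() - start:.2f}s")
    except Exception as ex:
        logger.exception("Error at scroll_down method: %s", ex)


# In the original snippet this was a @staticmethod of a class;
# shown here as a plain function so it runs on its own.
def wait_until_completion(driver) -> None:
    """Wait until document.readyState reports "complete"."""
    try:
        state = ""
        start = time.time()
        while state != "complete":
            # Poll every 0.1-0.2 s instead of sleeping 3-5 s up front
            time.sleep(random.uniform(0.1, 0.2))
            state = driver.execute_script("return document.readyState")
        print(f"wait_until_completion() took {time.time() - start:.2f}s")
    except Exception as ex:
        logger.exception("Error at wait_until_completion: %s", ex)
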
LambertWM commented 1 year ago

This leads me to believe that wait_until_completion() doesn't really do what its name suggests. document.readyState only tracks the initial page load, so once it reports "complete" it stays that way while new tweets are fetched in the background during scrolling. An alternative strategy, which has worked for me in the past, is to send a PAGE_DOWN, wait a little, and keep repeating this until the document height has grown by more than a certain amount (or a timeout is reached).
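A minimal sketch of that height-based strategy, assuming a Selenium-style `driver` object that exposes `execute_script`. Several details are assumptions, not part of the comment above: the function name `scroll_until_height_grows` and its default thresholds are hypothetical, the height is read from `document.body.scrollHeight`, and the PAGE_DOWN key press is swapped for a JavaScript `window.scrollBy` of one viewport height so the sketch needs no Selenium imports.

```python
import time
from typing import Any


def scroll_until_height_grows(
    driver: Any,
    min_growth: int = 500,
    pause: float = 0.25,
    timeout: float = 10.0,
) -> bool:
    """Scroll one viewport at a time until the document height has grown
    by at least `min_growth` pixels, or until `timeout` seconds elapse.

    Hypothetical helper; the defaults are illustrative, not tuned values.
    Returns True if the height grew enough, False on timeout.
    """
    height_js = "return document.body.scrollHeight"
    start_height = driver.execute_script(height_js)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Stand-in for a PAGE_DOWN key press (assumption): scroll one viewport
        driver.execute_script("window.scrollBy(0, window.innerHeight);")
        # Give the page a moment to fetch and render new tweets
        time.sleep(pause)
        if driver.execute_script(height_js) - start_height >= min_growth:
            return True
    return False
```

Calling `scroll_until_height_grows(driver)` in a loop, and stopping once it returns False, would replace the fixed 3 to 5 second sleep with a wait that ends as soon as new content has actually been rendered.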