xenova / chat-downloader

A simple tool used to retrieve chat messages from livestreams, videos, clips and past broadcasts. No authentication needed!
https://chat-downloader.readthedocs.io/
MIT License

[QUESTION] How frequent are the requests done? #98

BArdelean closed this issue 3 years ago

BArdelean commented 3 years ago

I'm new to this scraping thing and I have a few curiosities. How many or how fast does the app poll the live_chat page for messages?

I'm wondering because I want to use the app for reading messages and creating a poll, but seeing as live_chat is in the robots.txt file how safe would this be for like a 12 hour stream?

KR, Bodo

xenova commented 3 years ago

> How many or how fast does the app poll the live_chat page for messages?

Assuming you are referring to YouTube, the program works as follows:

  1. The webpage is loaded and the initial information is retrieved. This is the same as simply visiting the watch page (e.g., https://www.youtube.com/watch?v=5qap5aO4i9A) and clicking 'View Source'.

  2. To avoid continuously reloading the webpage, the program sends the same kind of HTTP requests a browser would, passing "continuation" information to the https://www.youtube.com/youtubei/v1/live_chat/get_live_chat endpoint. The continuation information means we only retrieve new data (i.e., messages we have not received before).

  3. In the response from this API, YouTube specifies a recommended 'timeout' value, which tells the client how long to wait before asking for the next data. In most cases, this is around 5 seconds. To view the exact time, you can run the command with the -v flag. For example, running

    chat_downloader -v https://www.youtube.com/watch?v=5qap5aO4i9A

    will output something like:

    2021-07-04 21:26:40 | Nikitha_niki: @sanu nadu evda?
    2021-07-04 21:26:41 | Saitama: @ilmaa whats yout Insta ID? m kind of interested in you
    2021-07-04 21:26:41 | Sanu K: @nikitha fake i d ya ?
    [DEBUG] Total number of messages: 72
    [DEBUG] Sleeping for 5121ms.
    [DEBUG] Continuation: 0ofMyAOjARpYQ2lrcUp3b1lWVU5UU2pSbmExWkROazV5ZGtsSk9IVnRlblJtTUU5M0VnczFjV0Z3TldGUE5HazVRUm9UNnFqZHVRRU5DZ3MxY1dGd05XRlBOR2s1UVNBQiiV5NzYksrxAjAAQAFKFggBGAAgAFD-ouTZksrxAlgDeACiAQBQlOqF2ZLK8QJYwdz-yJ3A8QKCAQIIAYgBAKAB1I3m2ZLK8QI%3D
    [DEBUG] https://www.youtube.com:443 "POST /youtubei/v1/live_chat/get_live_chat?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8 HTTP/1.1" 200 None
    2021-07-04 21:26:42 | batman: @sweet fluff hoi fluffy.. i haven't seen ur last message.. so tell me how is life? is it fluffy?
    2021-07-04 21:26:46 | ᴅɪʟsᴏɴ: flash
    2021-07-04 21:26:46 | Meral: yo what’s up
    2021-07-04 21:26:46 | Elif Karakaş: msa okulda görüyoruz almanca tüm anadolu liselerinde var diye biliyorum
    2021-07-04 21:26:47 | Maruf Khan: @Leah hey thanks. I am not looking for gfs here or anything, just talking to people u know
    [DEBUG] Total number of messages: 77
    [DEBUG] Sleeping for 5057ms.
    [ERROR] Keyboard Interrupt
    [DEBUG] Session closed.

    As you can see, the program sleeps for ~5 seconds before making another request. This is the same way your browser polls for new messages, so the number of requests is the same as if you were simply watching the stream.

  4. The sleep time depends on a few factors, but mostly on the load that YouTube is experiencing. I have done a lot of testing, and for streams with ~1 million viewers, the timeout is set to around 25 seconds. However, to ensure no messages are missed, I cap the wait at a maximum of 8 seconds.

  5. Technically, you can reduce this time to an arbitrary amount (I have tested with 1-2 seconds); however, I would recommend against doing this, as YouTube might think you are trying to DDoS them. 😜
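The polling in steps 2 to 4 can be sketched as a loop that follows continuation tokens and sleeps for YouTube's suggested timeout, capped at 8 seconds. This is a minimal sketch under stated assumptions, not chat-downloader's actual code: the `fetch` callable, its `(messages, next_continuation, timeout_ms)` return shape, and the names `poll_chat` and `MAX_SLEEP_S` are all hypothetical. A fake `fetch` stands in for the real `get_live_chat` request so the example runs offline.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical poll result: (messages, next_continuation, timeout_ms).
# This shape is an assumption for illustration, not chat-downloader's internals.
PollResult = Tuple[List[dict], Optional[str], int]

MAX_SLEEP_S = 8.0  # cap on the wait, so that no messages are missed on busy streams


def poll_chat(
    fetch: Callable[[str], PollResult],
    continuation: Optional[str],
    sleep: Callable[[float], None] = time.sleep,
    max_sleep_s: float = MAX_SLEEP_S,
) -> Iterator[dict]:
    """Yield chat messages by repeatedly following continuation tokens."""
    while continuation:
        # One request: only messages newer than this continuation come back.
        messages, continuation, timeout_ms = fetch(continuation)
        yield from messages
        if continuation:
            # Use the server's suggested timeout, but never wait longer than the cap.
            sleep(min(timeout_ms / 1000, max_sleep_s))


# Offline demo: a fake fetch standing in for the get_live_chat request.
fake_pages = {
    "c0": ([{"message": "hi"}], "c1", 5121),
    "c1": ([{"message": "yo"}], "c2", 25000),
    "c2": ([{"message": "bye"}], None, 5057),
}
slept: List[float] = []
received = [m["message"] for m in poll_chat(fake_pages.__getitem__, "c0", sleep=slept.append)]
print(received, slept)
```

In the demo, the first wait uses the suggested 5.121 seconds, while the 25-second timeout (typical of very busy streams) is clamped to 8 seconds. Injecting `sleep` keeps the loop testable without actually waiting.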

TL;DR: one request every 5 to 8 seconds (though this interval can be set arbitrarily).

> I'm wondering because I want to use the app for reading messages and creating a poll, but seeing as live_chat is in the robots.txt file how safe would this be for like a 12 hour stream?

I have tested the program for much longer than 12 hours at a time on various 24/7 livestreams without any issues, so I believe this would work well.
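For a rough sense of scale, here is some back-of-the-envelope arithmetic (not a measurement). At the 5 to 8 second interval from the TL;DR, a 12-hour stream works out to roughly 5,400 to 8,640 requests. The helper name `requests_for` is hypothetical, and the estimate ignores the time each request itself takes, so the real count would be slightly lower.

```python
SECONDS_PER_HOUR = 3600


def requests_for(hours: float, interval_s: float) -> int:
    """Approximate number of polling requests over a stream of the given length."""
    return int(hours * SECONDS_PER_HOUR // interval_s)


for interval in (5, 8):
    print(f"12 h at one request every {interval} s: ~{requests_for(12, interval)} requests")
```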


Feel free to ask any other questions 👍

BArdelean commented 3 years ago

Very happy about your answer; you covered every single one of my curiosities. I had also looked into a few of them myself, but with time being so scarce, I was unable to look thoroughly. Thank you very much, I owe you one. Keep up the amazing work!

KR, Bodo