xenova / chat-downloader

A simple tool used to retrieve chat messages from livestreams, videos, clips and past broadcasts. No authentication needed!
https://chat-downloader.readthedocs.io/
MIT License
948 stars, 132 forks

[FEATURE] time_text and time_in_seconds for livestreams #142

Closed: izzastor closed this issue 2 years ago

izzastor commented 2 years ago

Since this feature is only available for VODs, I have an idea for implementing it for livestreams.

There could be an argument (command-line option) for elapsed time: counting from the moment the command is run to scrape the live chat, time_text and time_in_seconds would record how much time had elapsed when each message was sent, and that value would be added to every chat message.

Alternatively (specifically for Twitch), the time elapsed on the stream could be checked and used as time_text and time_in_seconds.

xenova commented 2 years ago

I suppose that is a valid request; in fact, I've tried it already, but unfortunately there were a lot of inconsistencies when it came to "guessing" the start time. For example, on YouTube, using the start time of the stream as an offset gave inaccuracies of around 4-10 seconds. This means that running the downloader twice (once during the stream and once after) would give different results, so the values would not be reproducible.

So, if you want to include this functionality, I would recommend implementing it yourself using the Python module. I can assist you if you need some help, but unless there is a way to do it consistently, I'm not sure adding this feature would be the best idea.
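To make the inconsistency concrete, here is a small worked sketch: the same message timestamp converted to an offset against two different start-time estimates. All values are hypothetical; the 6-second gap between the estimates is simply a value picked from the 4-10 second range mentioned above, and the timestamp is assumed to be in microseconds since the epoch, like chat-downloader's timestamp field.

```python
# Hypothetical values for illustration only (not real stream data).
# The message timestamp is assumed to be in microseconds since the epoch.
message_timestamp_us = 1_650_000_120_000_000

# Two guesses of the stream start time (seconds since the epoch),
# e.g. one made during the stream and one made after it ended.
start_estimate_live = 1_650_000_000.0
start_estimate_vod = 1_650_000_006.0  # assumed to be 6 s later


def offset_seconds(timestamp_us: int, start_time: float) -> float:
    """Seconds between the estimated stream start and the message."""
    return timestamp_us / 1e6 - start_time


live_offset = offset_seconds(message_timestamp_us, start_estimate_live)
vod_offset = offset_seconds(message_timestamp_us, start_estimate_vod)

print(live_offset, vod_offset)   # 120.0 114.0
print(live_offset - vod_offset)  # 6.0
```

Because the offset is just the timestamp minus the start estimate, any error in the estimate shifts every offset by exactly that amount, which is why two runs can disagree.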

izzastor commented 2 years ago

OK, I did more research and narrowed down what I want. I don't have much experience with Python, but I think you could probably help me with this. time_text can be removed because I don't need it; time_in_seconds is all I need.

Is there a way to record how much time has elapsed between entering the command in the terminal and the message being recorded in the JSON, so that time_in_seconds equals the elapsed seconds at the moment the message is recorded?

This assumes the messages are recorded one at a time rather than in batches; otherwise, this might cause issues.

[Screenshot: chat_downloader terminal output]

Alternatively, the seconds shown here in the chat_downloader terminal output could be recorded instead, assuming the 30 seconds shown corresponds to 0 in time_in_seconds.

As for the YouTube stream start time, that doesn't matter, since I use scripts on Linux to start this downloader the moment someone starts streaming.
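The batching concern above is worth spelling out. If time_in_seconds were computed from the moment a message is received, every message arriving in the same batch would get the same value. Computing it from each message's own timestamp keeps the real spacing. The sketch below compares the two approaches on a made-up batch. The dictionaries only mimic the timestamp field (assumed to be microseconds since the epoch) that chat-downloader messages carry, and all values are hypothetical.

```python
# Hypothetical batch of three messages that arrive together in a single poll.
# Only the 'timestamp' field (microseconds since the epoch) is modelled here.
start_time = 1_000.0    # when the script was started (seconds)
arrival_time = 1_012.0  # when the whole batch was received (seconds)
batch = [
    {'message': 'first', 'timestamp': 1_005_000_000},
    {'message': 'second', 'timestamp': 1_008_500_000},
    {'message': 'third', 'timestamp': 1_011_250_000},
]

# Approach 1: elapsed time at the moment of recording (arrival-based).
arrival_offsets = [arrival_time - start_time for _ in batch]

# Approach 2: elapsed time derived from each message's own timestamp.
timestamp_offsets = [m['timestamp'] / 1e6 - start_time for m in batch]

print(arrival_offsets)    # every message in the batch gets the same value
print(timestamp_offsets)  # original spacing between messages is preserved
```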

xenova commented 2 years ago

> Is there a way to record how much time has elapsed between entering the command in the terminal and the message being recorded in the JSON, so that time_in_seconds equals the elapsed seconds at the moment the message is recorded?

There sure is 👍. You can use the timestamp value (the time the message was sent, in microseconds since the Unix epoch, hence the division by 1e6) and subtract the time the script was started (time.time(), in seconds) to get the "offset" of the message in seconds.

```python
import time
from chat_downloader import ChatDownloader

url = 'https://www.youtube.com/watch?v=5qap5aO4i9A'
chat = ChatDownloader().get_chat(url)                   # create a generator

start_time = time.time()                                # get current time
for message in chat:                                    # iterate over messages
    offset = message['timestamp'] / 1e6 - start_time    # calculate offset in seconds
    print(f'{offset:.5f}', '|', message.get('message')) # print offset and message
```

^ Does this do what you need?

Edit: Note that you might get some negative times initially, since those messages were sent before you ran the script. Of course, you can ignore them by checking that offset > 0.

Running the above code outputs the following:

```
-17.94132 | in my opinion Lean > BTS > the cookie run fandom
-15.51420 | hola
-14.86723 | @CharS Alv Ah ok eres 3 años menor que yo lol
-12.86076 | :loudly_crying_face::loudly_crying_face::loudly_crying_face::loudly_crying_face::loudly_crying_face:مره حلوه
-12.45566 | bts=:cockroach:
-12.28736 | artık lofigirl friendslerim var
-11.20761 | @mahaمالك حزينه؟:pleading_face::broken_heart:
-9.71141 | thanks preethika.. I'll watch
-6.01750 | Victoria Okay My son:smirking_face:
-5.68070 | march me itni garmi hai to may me kya hoga
-3.89551 | ( ◜‿◝ )♡
-3.99381 | @Zayn mtlb kuch decide nhi ha lgta ha
-3.43058 | Hola
-0.66994 | يي شيماء انتي....
-0.50844 | ما مشي حالوو الكود
1.71061 | @Ryuk thank you mate
3.13929 | any kpop fan know nct??
8.34659 | :upside_down_face:
9.57274 | que onda la banda amigueraa ??
17.37625 | there are currently 2 seasons in demon slayer
18.66299 | hey guys
19.18072 | печально что не с кем пообщаться
19.36836 | @Bruno's favorite child boyband to some, sources of hope to others
20.02375 | مها وdark. صوركم:grinning_face_with_sweat::grinning_face_with_sweat:
20.09055 | Vitoria*
21.45397 | cks-bopk-sgj جرب هبدا
21.91760 | Olivia Rodrigo so much better than BTS
24.92366 | NCT problematic lately
26.69785 | والله منيحة زي ماانا
28.24907 | امجد cks-bopk-sgj
33.67794 | امجد cks-bopk-sgj
33.65063 | who wants t be my discord kitten
34.50256 | من كثر الكياته نفست
36.38116 | امجد cks-bopk-sgj
37.65990 | @PREETHIKA E hiii im an NCTzen I love themmm
39.02592 | يلا
39.17949 | امجد cks-bopk-sgj
39.49367 | what did NCT do??
39.55707 | "may" toh pighal jaunga bhai :sun_with_face:
40.49116 | Hello:)
41.31644 | =..=???
43.73970 | lambi hai ghum ki sham.... pr shaam hi to hai
44.59763 | @Jean Víquez jaja exacto.. en enero los cumplí jaja..
```
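The offset calculation and the offset > 0 check from the note above can also be factored into a small pure function, which makes the logic testable without connecting to a stream. This is a sketch. The helper name offsets_after_start is hypothetical and not part of chat-downloader. It assumes only that each message is a dict whose timestamp is in microseconds, as in the script above. As a precaution, it also skips any item without a timestamp.

```python
from typing import Iterable, Iterator, Tuple


def offsets_after_start(messages: Iterable[dict],
                        start_time: float) -> Iterator[Tuple[float, dict]]:
    """Yield (offset_in_seconds, message) for messages sent after start_time.

    start_time is in seconds since the epoch (e.g. from time.time()), while
    message['timestamp'] is assumed to be in microseconds since the epoch.
    """
    for message in messages:
        timestamp = message.get('timestamp')
        if timestamp is None:   # skip items without a timestamp
            continue
        offset = timestamp / 1e6 - start_time
        if offset > 0:          # drop messages sent before the script started
            yield offset, message


# Usage with made-up messages (no network access needed):
sample = [
    {'timestamp': 99_000_000, 'message': 'sent before start'},
    {'timestamp': 101_500_000, 'message': 'hello'},
    {'message': 'no timestamp'},
]
for offset, message in offsets_after_start(sample, start_time=100.0):
    print(f'{offset:.5f}', '|', message.get('message'))  # 1.50000 | hello
```

Against a live stream, you would pass the chat generator instead, for example offsets_after_start(ChatDownloader().get_chat(url), time.time()), under the same assumption about the timestamp field.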
izzastor commented 2 years ago

This is exactly what I need. All I need now is to know how to write those seconds to the JSON file as time_in_seconds by default for ongoing live streams. I currently have chat_downloader installed as a pip package, and if I can add that, I am basically done.

xenova commented 2 years ago

Okay, I see 👍. This should do the trick:

```python
import time
from chat_downloader import ChatDownloader
from chat_downloader.output.continuous_write import ContinuousWriter

output_file = 'file.json'
url = 'https://www.youtube.com/watch?v=5qap5aO4i9A'

chat = ChatDownloader().get_chat(url)                   # create a generator
writer = ContinuousWriter(output_file,
                          # indent=4                    # uncomment this if you want it indented
                          )

start_time = time.time()                                # get current time
try:
    for message in chat:                                # iterate over messages
        # Overwrite the default time_in_seconds with the offset (in seconds)
        # from when the script started; timestamp is in microseconds
        message['time_in_seconds'] = message['timestamp'] / 1e6 - start_time

        writer.write(message, flush=True)               # write item to file
        chat.print_formatted(message)                   # print message to terminal
finally:
    writer.close()                                      # close the output file, even on Ctrl+C
```
xenova commented 2 years ago

I had to use my ContinuousWriter class: since you intend to use this on live streams, it writes each message to your file as it arrives instead of waiting until the end.
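If you would rather not depend on ContinuousWriter, the same "write as you go" idea can be sketched with the standard library alone. This swaps in a different format, JSON Lines (one JSON object per line), which stays valid even if the script is stopped mid-stream because there is no enclosing array to close. The function name append_jsonl and the file name are hypothetical.

```python
import json
import os
import tempfile


def append_jsonl(path: str, item: dict) -> None:
    """Append one JSON object as a single line to the file at path."""
    # Leaving the with block closes the file, which flushes the line to disk.
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(item, ensure_ascii=False) + '\n')


# Usage with made-up messages; in a real script, item would be each chat
# message after its time_in_seconds has been overwritten.
path = os.path.join(tempfile.mkdtemp(), 'chat.jsonl')
append_jsonl(path, {'message': 'hello', 'time_in_seconds': 1.5})
append_jsonl(path, {'message': 'hola', 'time_in_seconds': 3.25})

with open(path, encoding='utf-8') as f:
    rows = [json.loads(line) for line in f]
print(rows)
```

Opening the file for each message keeps the sketch short and means every line is on disk as soon as append_jsonl returns. For busy chats, keeping one file handle open and calling flush() after each write would avoid the repeated open/close cost.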

izzastor commented 2 years ago

This is honestly perfect. The JSON has formatting issues, but I can fix that, no problem. You're the best. Do you have a donation link? I would like to fund any other helpful tools you are working on or will make in the future.

xenova commented 2 years ago

> This is honestly perfect. The JSON has formatting issues, but I can fix that, no problem.

Perfect! :) What formatting do you need? If you want to "pretty-print" it, you can set indent=4 (uncomment that line by removing the #). The number you choose specifies how many spaces to put for each indentation level.
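As an illustration of what the indent number controls, here is the standard library's json.dumps applied to a made-up message. Whether ContinuousWriter passes indent straight through to Python's json module is an assumption here. The snippet itself uses only json.

```python
import json

# A made-up message with only two fields, for illustration
message = {'message': 'hello', 'time_in_seconds': 1.5}

compact = json.dumps(message)           # no indent: everything on one line
pretty = json.dumps(message, indent=4)  # indent=4: one key per line, 4 spaces deep

print(compact)
print(pretty)
```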

> Do you have a donation link? I would like to fund any other helpful tools you are working on or will make in the future.

That's so kind! There are two links under the "Sponsor" button at the top of the repository's home page: https://ko-fi.com/xenova and https://www.buymeacoffee.com/xenova, but it really isn't necessary :)

izzastor commented 2 years ago

Oh, alright. Uncommenting that fixed it. Thanks for all the help.

xenova commented 2 years ago

No worries 👍 Feel free to open another issue if you have any further questions. I'll close this one for now.