xenova / chat-downloader

A simple tool used to retrieve chat messages from livestreams, videos, clips and past broadcasts. No authentication needed!
https://chat-downloader.readthedocs.io/
MIT License

[BUG] v0.2.5 incomplete chat, again #208

Closed: hellishvictor closed this issue 1 year ago

hellishvictor commented 1 year ago

Basic information

Describe the bug

Trying to download a Twitch chat, but only the first few lines are output: https://www.twitch.tv/videos/1821347883

0:25 | (Subscriber) EkaitzIS200: uy el micro se corta
0:29 | (Subscriber, Prime Gaming) jjg09: audio f
0:30 | (Prime Gaming) wskk: Hola wapa
0:31 | (Subscriber, Gifter Leader 2) kithommer: va mal el mic
0:35 | (Subscriber) Tsalico: se corta el audio
0:36 | (Subscriber, Gifter Leader 2) kithommer: se corta
0:37 | (Subscriber) ImChrispy11: te calmas lluna el audio se corta
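Each line above has the shape `time | (badges) author: message`, with the author's badge titles joined by ", ". As a rough sketch of that layout only (the helper `format_chat_line` is hypothetical and merely mimics the shape of the `twitch` output format; it is not chat-downloader's own formatter):

```python
from typing import List


def format_chat_line(time_text: str, badge_titles: List[str], author: str, message: str) -> str:
    """Hypothetical helper mimicking the 'time | (badges) author: message' layout."""
    # Badge titles are joined with ", " and wrapped in parentheses;
    # the parenthesised group is omitted when the author has no badges.
    badge_part = f"({', '.join(badge_titles)}) " if badge_titles else ""
    return f"{time_text} | {badge_part}{author}: {message}"


print(format_chat_line("0:29", ["Subscriber", "Prime Gaming"], "jjg09", "audio f"))
# prints: 0:29 | (Subscriber, Prime Gaming) jjg09: audio f
```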

Command/Code used

If running from the command line, provide the following:

1. The command used (including the verbose flag, -v):

    chat_downloader https://www.twitch.tv/videos/1821347883 -v > chat.txt

2. Output from the above command:

    [DEBUG] Python version: 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)]
    [DEBUG] Program version: 0.2.5
    [DEBUG] Initialisation parameters: {'headers': None, 'cookies': None, 'proxy': None}
    [DEBUG] Created TwitchChatDownloader session.
    [DEBUG] Starting new HTTPS connection (1): badges.twitch.tv:443
    [DEBUG] https://badges.twitch.tv:443 "GET /v1/badges/global/display HTTP/1.1" 200 100493
    [INFO] Site: twitch.tv
    [DEBUG] Program parameters: {'url': 'https://www.twitch.tv/videos/1821347883', 'start_time': None, 'end_time': None, 'max_attempts': 15, 'retry_timeout': None, 'interruptible_retry': True, 'timeout': None, 'inactivity_timeout': None, 'max_messages': None, 'message_groups': ['messages'], 'message_types': None, 'output': None, 'overwrite': True, 'sort_keys': True, 'indent': 4, 'format': 'twitch', 'format_file': None, 'chat_type': 'live', 'ignore': None, 'message_receive_timeout': 0.1, 'buffer_size': 4096}
    [DEBUG] Starting new HTTPS connection (1): gql.twitch.tv:443
    [DEBUG] https://gql.twitch.tv:443 "POST /gql HTTP/1.1" 200 845
    [DEBUG] https://badges.twitch.tv:443 "GET /v1/badges/channels/175017835/display HTTP/1.1" 200 1286
    [DEBUG] Match found: "<re.Match object; span=(0, 39), match='https://www.twitch.tv/videos/1821347883'>". Running "_get_chat_by_vod_id" function in "TwitchChatDownloader".
    [DEBUG] Chat information: {'chat': <generator object TwitchChatDownloader._get_chat_messages_by_vod_id at 0x0000000003677C80>, 'title': '💦⛱️DÍA DE PLAYA, LIMPIAMOS LA ORILLA !SORTEO', 'duration': 20790, 'status': 'past', 'video_type': 'video', 'start_time': None, 'id': '1821347883', '_output_writer': None, '_output_callback': None, 'format': <function ChatDownloader.get_chat.<locals>.<lambda> at 0x0000000003657790>, 'site': <chat_downloader.sites.twitch.TwitchChatDownloader object at 0x000000000362BE80>}
    [INFO] Retrieving chat for "💦⛱️DÍA DE PLAYA, LIMPIAMOS LA ORILLA !SORTEO".
    [DEBUG] https://gql.twitch.tv:443 "POST /gql HTTP/1.1" 200 None
    [DEBUG] Session closed.
    Traceback (most recent call last):
      File "C:\Program Files (x64)\python\lib\runpy.py", line 193, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "C:\Program Files (x64)\python\lib\runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "C:\Program Files (x64)\Python\Scripts\chat_downloader.exe\__main__.py", line 7, in <module>
      File "C:\Program Files (x64)\python\lib\site-packages\chat_downloader\cli.py", line 194, in main
        run(**args.__dict__)
      File "C:\Program Files (x64)\python\lib\site-packages\chat_downloader\chat_downloader.py", line 361, in run
        callback(message)
      File "C:\Program Files (x64)\python\lib\site-packages\chat_downloader\chat_downloader.py", line 358, in callback
        chat.print_formatted(item)
      File "C:\Program Files (x64)\python\lib\site-packages\chat_downloader\sites\common.py", line 304, in print_formatted
        safe_print(self.format(item), flush=flush)
      File "C:\Program Files (x64)\python\lib\site-packages\chat_downloader\chat_downloader.py", line 260, in <lambda>
        chat.format = lambda x: formatter.format(
      File "C:\Program Files (x64)\python\lib\site-packages\chat_downloader\formatting\format.py", line 166, in format
        substitution = re.sub(self._INDEX_REGEX, lambda match: self._replace(
      File "C:\Program Files (x64)\python\lib\re.py", line 208, in sub
        return _compile(pattern, flags).sub(repl, string, count)
      File "C:\Program Files (x64)\python\lib\site-packages\chat_downloader\formatting\format.py", line 166, in <lambda>
        substitution = re.sub(self._INDEX_REGEX, lambda match: self._replace(
      File "C:\Program Files (x64)\python\lib\site-packages\chat_downloader\formatting\format.py", line 96, in _replace
        value = separator.join(
    TypeError: sequence item 1: expected str instance, NoneType found
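The last line of the traceback is the error CPython's `str.join` raises when one of the items is `None` instead of a string. A minimal, self-contained reproduction follows. It assumes, based only on the `separator.join(` frame in `_replace`, that a list of badge titles is being joined and one badge has no resolvable title; both helper names are hypothetical and are not chat-downloader code:

```python
def join_badge_titles(titles, separator=", "):
    # Hypothetical: str.join requires every item to be a str,
    # so a None item raises TypeError.
    return separator.join(titles)


def join_badge_titles_safe(titles, separator=", "):
    # Hypothetical defensive variant: skip badges whose title is None.
    return separator.join(t for t in titles if t is not None)


try:
    join_badge_titles(["Subscriber", None])
except TypeError as exc:
    print(exc)  # sequence item 1: expected str instance, NoneType found

print(join_badge_titles_safe(["Subscriber", None]))  # Subscriber
```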



TheTechRobo commented 1 year ago

This could be related, though I haven't looked closely at the codebase:

https://github.com/yt-dlp/yt-dlp/issues/7058#issuecomment-1552272940

Looks like a lot of Twitch's GQL (GraphQL) API is behind 'integrity checks' that are generated by JavaScript. :/

xenova commented 1 year ago

Related to https://github.com/xenova/chat-downloader/issues/209. Will fix.

Looks like it's actually a bug on Twitch's side.

xenova commented 1 year ago

Same issue as this: https://github.com/xenova/chat-downloader/issues/209#issuecomment-1554397820

Turns out you found a bug with Twitch:

"userBadges":[
   {
      "id":"Ozs=",
      "setID":"",
      "version":"",
      "__typename":"Badge"
   },
   {
      "id":"Yml0czsxMDAwOw==",
      "setID":"bits",
      "version":"1000",
      "__typename":"Badge"
   }
],

is what is returned by their API. As you can see, the first badge is missing its set ID and version: both setID and version are empty strings.
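Decoding the id values shows why that first badge cannot be matched to anything: each id appears to be the base64 encoding of "setID;version;", so "Yml0czsxMDAwOw==" is "bits;1000;" while "Ozs=" is just ";;" (both parts empty). Below is a minimal sketch of how a client could drop such malformed entries before looking up badge titles. `decode_badge_id` and `is_valid_badge` are hypothetical helpers for illustration; this is not the fix that shipped in 0.2.6:

```python
import base64
import json

# The userBadges array quoted above, as returned by Twitch's API.
USER_BADGES = json.loads("""
[
  {"id": "Ozs=", "setID": "", "version": "", "__typename": "Badge"},
  {"id": "Yml0czsxMDAwOw==", "setID": "bits", "version": "1000", "__typename": "Badge"}
]
""")


def decode_badge_id(badge_id: str) -> str:
    # Hypothetical: the ids look like base64-encoded "setID;version;" strings.
    return base64.b64decode(badge_id).decode("utf-8")


def is_valid_badge(badge: dict) -> bool:
    # Hypothetical check: a badge without a set ID or version cannot be
    # resolved to a title, so it is dropped instead of yielding None.
    return bool(badge.get("setID")) and bool(badge.get("version"))


for badge in USER_BADGES:
    print(repr(decode_badge_id(badge["id"])), is_valid_badge(badge))
# ';;' False
# 'bits;1000;' True

valid = [b for b in USER_BADGES if is_valid_badge(b)]
```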


Anyway, the fix is live in version 0.2.6 :)

pip install --upgrade chat-downloader
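To confirm the upgrade took effect, the installed version can be compared against 0.2.6. A small sketch using the standard library's importlib.metadata (Python 3.8+); it assumes a plain "X.Y.Z" version string, and `has_badge_fix` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def has_badge_fix(installed: str, fixed_in: str = "0.2.6") -> bool:
    # Compare numeric components so "0.2.10" sorts after "0.2.6"
    # (a plain string comparison would get this wrong).
    # Assumes plain "X.Y.Z" version strings.
    def as_tuple(v: str):
        return tuple(int(part) for part in v.split(".")[:3])

    return as_tuple(installed) >= as_tuple(fixed_in)


if __name__ == "__main__":
    try:
        installed = version("chat-downloader")
    except PackageNotFoundError:
        print("chat-downloader is not installed")
    else:
        print(installed, "includes fix:", has_badge_fix(installed))
```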