xenova / chat-downloader

A simple tool used to retrieve chat messages from livestreams, videos, clips and past broadcasts. No authentication needed!
https://chat-downloader.readthedocs.io/
MIT License

[BUG] `TypeError: expected string or bytes-like object, got 'NoneType'` when outputting as JSON #250

Open. TheTechRobo opened this issue 1 month ago.

TheTechRobo commented 1 month ago

Basic information

Describe the bug

When running chat_downloader on https://www.twitch.tv/videos/117643919 with the output written to a JSON file, it crashes with a TypeError. It works fine if I remove -o hello.json.

Command/Code used

If running from the command line, provide the following:

  1. The command used (including the verbose flag, -v):
    chat_downloader -v -o hello.json https://www.twitch.tv/videos/117643919
  2. Output from the above command:
    [DEBUG] Python version: 3.11.2 (main, May  2 2024, 11:59:08) [GCC 12.2.0]
    [DEBUG] Program version: 0.2.8
    [DEBUG] Initialisation parameters: {'headers': None, 'cookies': None, 'proxy': None}
    [DEBUG] Created TwitchChatDownloader session.
    [INFO] Site: twitch.tv
    [DEBUG] Program parameters: {'url': 'https://www.twitch.tv/videos/117643919', 'start_time': None, 'end_time': None, 'max_attempts': 15, 'retry_timeout': None, 'interruptible_retry': True, 'timeout': None, 'inactivity_timeout': None, 'max_messages': None, 'message_groups': ['messages'], 'message_types': None, 'output': 'hello.json', 'overwrite': True, 'sort_keys': True, 'indent': 4, 'format': 'twitch', 'format_file': None, 'chat_type': 'live', 'ignore': None, 'message_receive_timeout': 0.1, 'buffer_size': 4096}
    [DEBUG] Starting new HTTPS connection (1): gql.twitch.tv:443
    [DEBUG] https://gql.twitch.tv:443 "POST /gql HTTP/1.1" 200 786
    [DEBUG] https://gql.twitch.tv:443 "POST /gql HTTP/1.1" 200 None
    [DEBUG] Match found: "<re.Match object; span=(0, 38), match='https://www.twitch.tv/videos/117643919'>". Running "_get_chat_by_vod_id" function in "TwitchChatDownloader".
    [DEBUG] Chat information: {'chat': <generator object TwitchChatDownloader._get_chat_messages_by_vod_id at 0x7fa4e34ab2e0>, 'title': None, 'duration': 5000, 'status': 'past', 'video_type': 'video', 'start_time': None, 'id': '117643919', '_output_writer': <chat_downloader.output.continuous_write.ContinuousWriter object at 0x7fa4e32e24d0>, '_output_callback': None, 'format': <function ChatDownloader.get_chat.<locals>.<lambda> at 0x7fa4e32e4680>, 'site': <chat_downloader.sites.twitch.TwitchChatDownloader object at 0x7fa4e45e7ed0>}
    [INFO] Retrieving chat for "None".
    [DEBUG] https://gql.twitch.tv:443 "POST /gql HTTP/1.1" 200 None
    [DEBUG] Session closed.
    Traceback (most recent call last):
      File "/home/thetechrobo/.local/bin/chat_downloader", line 8, in <module>
        sys.exit(main())
                 ^^^^^^
      File "/home/thetechrobo/.local/lib/python3.11/site-packages/chat_downloader/cli.py", line 194, in main
        run(**args.__dict__)
      File "/home/thetechrobo/.local/lib/python3.11/site-packages/chat_downloader/chat_downloader.py", line 360, in run
        for message in chat:
      File "/home/thetechrobo/.local/lib/python3.11/site-packages/chat_downloader/sites/common.py", line 286, in __next__
        self._init_writer()
      File "/home/thetechrobo/.local/lib/python3.11/site-packages/chat_downloader/sites/common.py", line 257, in _init_writer
        title=safe_path(self.title),
              ^^^^^^^^^^^^^^^^^^^^^
      File "/home/thetechrobo/.local/lib/python3.11/site-packages/chat_downloader/utils/core.py", line 404, in safe_path
        return re.sub(r'[\/:*?"<>|]', replace_char, text)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.11/re/__init__.py", line 185, in sub
        return _compile(pattern, flags).sub(repl, string, count)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    TypeError: expected string or bytes-like object, got 'NoneType'

    If the output is too long, you can attach a text file or remove output that does not contribute to the problem.
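The last frames of the traceback come down to one call: safe_path() hands self.title straight to re.sub(), and re.sub() rejects None. The following is a minimal standalone reproduction using the same pattern as utils/core.py line 404. The reproduce() helper and the '_' replacement character are assumptions for illustration; the library's actual replace_char default is not shown in the log.

```python
import re

# Same character class that safe_path() strips in utils/core.py (per the traceback).
UNSAFE_CHARS = r'[\/:*?"<>|]'


def reproduce(title):
    # Hypothetical helper mirroring the failing line: re.sub(pattern, replace_char, text).
    # '_' as the replacement character is an assumption, not the library's confirmed default.
    return re.sub(UNSAFE_CHARS, '_', title)


try:
    reproduce(None)  # the Twitch VOD reports title=None
except TypeError as err:
    # On Python 3.11, as in the log above, the message reads:
    # expected string or bytes-like object, got 'NoneType'
    print(err)
```

Any string title passes through fine; only None (or another non-string) triggers the TypeError.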

Expected behavior

The chat should be retrieved and written to hello.json. It works fine if I remove -o hello.json.

nevmerzhitsky commented 1 month ago

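It's because the video has no title on the Twitch side: the chat information contains 'title': None, and safe_path() passes that None straight to re.sub(), which only accepts strings or bytes. The library doesn't handle this case, so a patch is required somewhere in sites/common.py:_init_writer().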
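One possible shape for such a patch, sketched under assumptions: the helper name safe_title() is hypothetical, and falling back to the video ID (which the debug log shows is still available as 'id': '117643919' even when 'title' is None) is a design choice, not the library's confirmed behavior. It guards the missing-title case before anything reaches re.sub().

```python
import re
from typing import Optional

# Characters removed by safe_path() in utils/core.py, per the traceback.
UNSAFE_CHARS = r'[\/:*?"<>|]'


def safe_title(title: Optional[str], fallback: str, replace_char: str = '_') -> str:
    """Hypothetical None-safe variant of safe_path() for use in _init_writer().

    If the site reports no title (None or an empty string, as with this
    Twitch VOD), use `fallback` (e.g. the video ID) instead of crashing
    inside re.sub(). The '_' default for replace_char is an assumption.
    """
    text = title if title else fallback
    return re.sub(UNSAFE_CHARS, replace_char, text)


print(safe_title(None, fallback='117643919'))                 # 117643919
print(safe_title('Stream: part 1/2', fallback='117643919'))   # Stream_ part 1_2
```

Inside _init_writer(), title=safe_path(self.title) could then become something like title=safe_title(self.title, fallback=self.id), assuming the downloader object exposes the video ID as self.id; the log only confirms an 'id' key in the chat information dict.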