xenova / chat-downloader

A simple tool used to retrieve chat messages from livestreams, videos, clips and past broadcasts. No authentication needed!
https://chat-downloader.readthedocs.io/
MIT License

Error scraping twitch data #161

Closed rll307 closed 2 years ago

rll307 commented 2 years ago

Hi,

Basic information

Describe the bug

I am trying to scrape chat data from a Twitch VOD, but the run stops with an error. As a result, not all the comments are downloaded.

Command/Code used

Running from the command line, provide the following:

  1. The command used (including the verbose tag, -v):
    chat_replay_downloader 'https://www.twitch.tv/videos/1520566958' -v --format_file csv -o Chat_1520566958.csv
  2. Output from the above command (non-relevant lines were deleted):
    [DEBUG] Python version: 3.10.5 (main, Jun  6 2022, 18:49:26) [GCC 12.1.0]
    [DEBUG] Program version: 0.0.9
    [DEBUG] Starting new HTTPS connection (1): badges.twitch.tv:443
    [DEBUG] https://badges.twitch.tv:443 "GET /v1/badges/global/display HTTP/1.1" 200 96461
    [INFO] Site: twitch.tv
    [DEBUG] Parameters: {'url': 'https://www.twitch.tv/videos/1520566958', 'start_time': None, 'end_time': None, 'max_attempts': 15, 'retry_timeout': None, 'timeout': None, 'max_messages': None, 'logging': 'debug', 'pause_on_debug': False, 'inactivity_timeout': None, 'message_groups': ['messages'], 'message_types': None, 'format': 'twitch', 'format_file': 'csv', 'chat_type': 'live', 'message_receive_timeout': 0.1, 'buffer_size': 4096}
    [DEBUG] Starting new HTTPS connection (1): gql.twitch.tv:443
    [DEBUG] https://gql.twitch.tv:443 "POST /gql HTTP/1.1" 200 859
    [DEBUG] https://badges.twitch.tv:443 "GET /v1/badges/channels/526112731/display HTTP/1.1" 200 7731
    [DEBUG] Chat information: {'chat': <generator object TwitchChatDownloader._get_chat_messages_by_vod_id at 0x7f585a3544a0>, 'title': 'Jogos, conversas e cantoria! 🖤 !sorteio', 'duration': 36620, 'is_live': False, 'start_time': None, 'site': <chat_replay_downloader.sites.twitch.TwitchChatDownloader object at 0x7f585a2d0d30>, 'format': <function ChatDownloader.get_chat.<locals>.<lambda> at 0x7f585a2cec20>}
    [INFO] Retrieving chat for "Jogos, conversas e cantoria! 🖤 !sorteio".
    [DEBUG] Starting new HTTPS connection (1): api.twitch.tv:443
    [DEBUG] https://api.twitch.tv:443 "GET /v5/videos/1520566958/comments?client_id=kimne78kx3ncx6brgo4mv6wki5h1ko&cursor=&content_offset_seconds=0 HTTP/1.1" 200 None
    [DEBUG] Total number of messages: 58
    [DEBUG] https://api.twitch.tv:443 "GET /v5/videos/1520566958/comments?client_id=kimne78kx3ncx6brgo4mv6wki5h1ko&cursor=eyJpZCI6ImRmNWExNjRhLWIzMjUtNDdhZS1hM2JjLTgwZjg0YmI4ODg5MCIsImhrIjoiYnJvYWRjYXN0OjM5ODU3NzYyNzkzIiwic2siOiJBQUFBZExISjBjQVdfaGtrMU5KeXdBIn0f&content_offset_seconds=0 HTTP/1.1" 200 None
    [DEBUG] Total number of messages: 117
    [DEBUG] https://api.twitch.tv:443 "GET /v5/videos/1520566958/comments?client_id=kimne78kx3ncx6brgo4mv6wki5h1ko&cursor=eyJpZCI6IjE2YTNhYmY1LTEzYzEtNDMxZS04N2Y3LTA1ZmZjNTM2OTlhZSIsImhrIjoiYnJvYWRjYXN0OjM5ODU3NzYyNzkzIiwic2siOiJBQUFBM2ZYaC1rQVdfaG1PR09xYlFBIn0f&content_offset_seconds=0 HTTP/1.1" 200 None
    [DEBUG] Total number of messages: 176
    [DEBUG] https://api.twitch.tv:443 "GET /v5/videos/1520566958/comments?client_id=kimne78kx3ncx6brgo4mv6wki5h1ko&cursor=eyJpZCI6IjQwYjc3ZWUyLTRiMjktNGY5OC1hOTg4LTM4NjIwY2Y4OTVhMiIsImhrIjoiYnJvYWRjYXN0OjM5ODU3NzYyNzkzIiwic2siOiJBQUFCbGoyV2tZQVdfaHBHWUo4eWdBIn0f&content_offset_seconds=0 HTTP/1.1" 200 None
    [DEBUG] Total number of messages: 235
    [DEBUG] https://api.twitch.tv:443 "GET /v5/videos/1520566958/comments?client_id=kimne78kx3ncx6brgo4mv6wki5h1ko&cursor=eyJpZCI6IjY2YjYwZDE3LWQ5MzktNDk1Zi04MmM3LTQ2ODc5ZDUzMzlmOSIsImhrIjoiYnJvYWRjYXN0OjM5ODU3NzYyNzkzIiwic2siOiJBQUFCNUkxQXQ0QVdfaHFVc0VsWWdBIn0f&content_offset_seconds=0 HTTP/1.1" 200 None
    [DEBUG] Total number of messages: 294
    [DEBUG] https://api.twitch.tv:443 "GET /v5/videos/1520566958/comments?client_id=kimne78kx3ncx6brgo4mv6wki5h1ko&cursor=eyJpZCI6IjRjYTRlZmI4LTEyMWQtNDFiZC1hOTU0LTEzZmE1NWUwZTE4ZSIsImhrIjoiYnJvYWRjYXN0OjM5ODU3NzYyNzkzIiwic2siOiJBQUFDeWFQOXg4QVdfaHQ1eHdab3dBIn0f&content_offset_seconds=0 HTTP/1.1" 200 None
    [DEBUG] Total number of messages: 353
    [DEBUG] https://api.twitch.tv:443 "GET /v5/videos/1520566958/comments?client_id=kimne78kx3ncx6brgo4mv6wki5h1ko&cursor=eyJpZCI6ImQzODY3ZTMyLTI0NDEtNDQ3MS04OWViLWUyYzI3ZWU5ZDlhNiIsImhrIjoiYnJvYWRjYXN0OjM5ODU3NzYyNzkzIiwic2siOiJBQUFEc19pYjlrQVdfaHhrRzZTWFFBIn0f&content_offset_seconds=0 HTTP/1.1" 200 None
    [DEBUG] Session closed.
    Traceback (most recent call last):
      File "/home/rll307/.local/bin/chat_replay_downloader", line 8, in <module>
        sys.exit(main())
      File "/home/rll307/.local/lib/python3.10/site-packages/chat_replay_downloader/cli.py", line 229, in main
        for message in chat:
      File "/home/rll307/.local/lib/python3.10/site-packages/chat_replay_downloader/sites/common.py", line 87, in __iter__
        for item in self.chat:
      File "/home/rll307/.local/lib/python3.10/site-packages/chat_replay_downloader/sites/twitch.py", line 965, in _get_chat_messages_by_vod_id
        data = self._parse_item(comment, offset)
      File "/home/rll307/.local/lib/python3.10/site-packages/chat_replay_downloader/sites/twitch.py", line 756, in _parse_item
        TwitchChatDownloader._set_message_type(info, original_message_type)
      File "/home/rll307/.local/lib/python3.10/site-packages/chat_replay_downloader/sites/twitch.py", line 1197, in _set_message_type
        log(
    TypeError: log() takes from 2 to 3 positional arguments but 4 were given

Expected behavior

The command should download all the chat messages without the error above.

Additional context/information

chat_downloader was installed on Manjaro Linux using the command:

pip install chat-downloader --upgrade                                                                       
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: chat-downloader in /home/rll307/.local/lib/python3.10/site-packages (0.2.0)
Collecting chat-downloader
  Using cached chat_downloader-0.2.0-py2.py3-none-any.whl (80 kB)
  Downloading chat_downloader-0.1.10-py2.py3-none-any.whl (80 kB)
     |████████████████████████████████| 80 kB 1.1 MB/s             
Requirement already satisfied: requests in /usr/lib/python3.10/site-packages (from chat-downloader) (2.27.1)
Requirement already satisfied: isodate in /home/rll307/.local/lib/python3.10/site-packages (from chat-downloader) (0.6.1)
Collecting argparse
  Using cached argparse-1.4.0-py2.py3-none-any.whl (23 kB)
Requirement already satisfied: docstring-parser in /home/rll307/.local/lib/python3.10/site-packages (from chat-downloader) (0.14.1)
Requirement already satisfied: colorlog in /home/rll307/.local/lib/python3.10/site-packages (from chat-downloader) (6.6.0)
Requirement already satisfied: websocket-client in /home/rll307/.local/lib/python3.10/site-packages (from chat-downloader) (1.3.3)
Requirement already satisfied: six in /usr/lib/python3.10/site-packages (from isodate->chat-downloader) (1.16.0)
Requirement already satisfied: chardet>=3.0.2 in /usr/lib/python3.10/site-packages (from requests->chat-downloader) (4.0.0)
Requirement already satisfied: idna>=2.5 in /usr/lib/python3.10/site-packages (from requests->chat-downloader) (3.3)
Requirement already satisfied: urllib3>=1.21.1 in /usr/lib/python3.10/site-packages (from requests->chat-downloader) (1.26.9)
Installing collected packages: argparse
Successfully installed argparse-1.4.0

Thank you very much for your help.

xenova commented 2 years ago

Please note you are running chat_replay_downloader and not chat_downloader.

Run chat_downloader 'https://www.twitch.tv/videos/1520566958' -v --format_file csv -o Chat_1520566958.csv instead.
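
To confirm which of the two tools is actually installed and reachable, a quick check along the following lines may help. This is only a minimal sketch: the command names `chat_replay_downloader` and `chat_downloader` come from this thread, while the distribution name `chat-replay-downloader` is an assumption inferred from the package directory in the traceback, and the helper names `installed_version` and `report` are hypothetical.

```python
import shutil
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(commands, dists):
    """Look up each command on PATH and each distribution's installed version."""
    return {
        "commands": {cmd: shutil.which(cmd) for cmd in commands},
        "distributions": {dist: installed_version(dist) for dist in dists},
    }


if __name__ == "__main__":
    info = report(
        # Command names as they appear in this thread.
        commands=["chat_replay_downloader", "chat_downloader"],
        # "chat-replay-downloader" is an assumed distribution name;
        # "chat-downloader" is the name used in the pip command above.
        dists=["chat-replay-downloader", "chat-downloader"],
    )
    for kind, entries in info.items():
        for name, value in entries.items():
            # None means the command or distribution was not found.
            print(f"{kind}: {name} -> {value}")
```

If both commands resolve to a path, the older `chat_replay_downloader` entry point is still on the PATH alongside the newer one, which would be consistent with the "Program version: 0.0.9" line in the debug output while pip reports chat-downloader 0.2.0.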