xenova / chat-downloader

A simple tool used to retrieve chat messages from livestreams, videos, clips and past broadcasts. No authentication needed!
https://chat-downloader.readthedocs.io/
MIT License

[FEATURE] CLI should have option for explicit output format #78

Closed: krichbanana closed this issue 3 years ago

krichbanana commented 3 years ago

Is your feature request related to a problem? Please describe.

I tried out the program for the first time today, and noticed that the output data wasn't proper JSON. I realized that since I didn't specify an extension, the program chose to write Python-syntax data structures to the output file. The help information does not suggest a way to choose an output format, nor that the output format is decided by file extension.

Describe the solution you'd like

I wish for a command-line option to specify the output format (JSON/CSV/PLAIN), and I also wish for it to either default to something meaningful (like JSON) or refuse to output ambiguously. Python syntax shouldn't even be an option.

Describe alternatives you've considered

I could specify the json file extension, now knowing that the extension works hidden magic.

Additional context

Version 0.0.8, installed locally via pip3 on Arch Linux (rolling).

xenova commented 3 years ago

Hi there, specifying --output filename.json should write output in JSON format.

I am still updating the documentation (in the docs branch right now), so I agree I should make this clearer. The idea is that the output format is decided by the extension provided. There are 3 allowed output formats: JSON, CSV and other. The first 2 should act as expected, while the last currently outputs a Python dictionary representation (which I agree should be changed, perhaps to the formatted version, i.e. what is printed to standard output).
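For illustration, the extension-based choice described above could look roughly like the sketch below. This is not the tool's actual code: `pick_output_format` and the returned labels are hypothetical names, and the only behaviour taken from this thread is "`.json` means JSON, `.csv` means CSV, anything else is other".

```python
import os


def pick_output_format(file_name):
    """Return 'json', 'csv' or 'other' based only on the file extension.

    Hypothetical helper for illustration; not part of chat-downloader.
    """
    extension = os.path.splitext(file_name)[1].lower()
    if extension == '.json':
        return 'json'
    if extension == '.csv':
        return 'csv'
    # No extension, or any unrecognised one, falls through to "other".
    return 'other'
```

Under this scheme a file named `FIFO.json` is treated as JSON regardless of what kind of file it is, which is why an explicit format option was requested.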

ghost commented 3 years ago

@xenova I think I should raise that when trying --output FIFO.json (where FIFO.json is a FIFO, of course), I get this error.

[INFO] Site: youtube.com
[DEBUG] Program parameters: {'url': 'https://www.youtube.com/watch?v=2kYdHk8bt-0', 'start_time': None, 'end_time': None, 'max_attempts': 15, 'retry_timeout': None, 'timeout': None, 'max_messages': None, 'logging': 'debug', 'pause_on_debug': False, 'exit_on_debug': False, 'testing': False, 'verbose': True, 'quiet': False, 'message_groups': ['messages'], 'message_types': None, 'output': 'test.json', 'overwrite': False, 'sort_keys': True, 'indent': 4, 'format_file': None, 'chat_type': 'live', 'ignore': None, 'message_receive_timeout': 0.1, 'buffer_size': 4096, 'format': 'youtube', 'inactivity_timeout': None}
[DEBUG] Starting new HTTPS connection (1): www.youtube.com:443
[DEBUG] https://www.youtube.com:443 "GET /watch?v=2kYdHk8bt-0 HTTP/1.1" 200 None
[DEBUG] Match found: "<re.Match object; span=(0, 43), match='https://www.youtube.com/watch?v=2kYdHk8bt-0'>". Running "_get_chat_by_video_id" function in "YouTubeChatDownloader".
[DEBUG] Session closed.
Traceback (most recent call last):
  File "/usr/local/bin/chat_downloader", line 11, in <module>
    load_entry_point('chat-downloader==0.0.7', 'console_scripts', 'chat_downloader')()
  File "/usr/local/lib/python3.8/dist-packages/chat_downloader-0.0.7-py3.8.egg/chat_downloader/cli.py", line 172, in main
    run(**args.__dict__)
  File "/usr/local/lib/python3.8/dist-packages/chat_downloader-0.0.7-py3.8.egg/chat_downloader/chat_downloader.py", line 378, in run
    chat = downloader.get_chat(**chat_params)
  File "/usr/local/lib/python3.8/dist-packages/chat_downloader-0.0.7-py3.8.egg/chat_downloader/chat_downloader.py", line 292, in get_chat
    output_file = ContinuousWriter(
  File "/usr/local/lib/python3.8/dist-packages/chat_downloader-0.0.7-py3.8.egg/chat_downloader/output/continuous_write.py", line 188, in __init__
    self.writer = writer_class(file_name, **new_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/chat_downloader-0.0.7-py3.8.egg/chat_downloader/output/continuous_write.py", line 51, in __init__
    self.file = open(self.file_name, 'rb+')
io.UnsupportedOperation: File or stream is not seekable.

Everything works fine when the FIFO doesn't have .json extension.

Now, I'm not knowledgeable enough to know if the problem comes from the FIFO having .json extension, or if it comes from the way chat_downloader formats data when outputting as proper .json somehow not working with FIFOs. (If it is the latter then this is unrelated to this issue, so a new ticket would probably be preferable.)

NOTE: My use case was needing to process more data from livestreams than the default output provided. I ended up not needing a FIFO, as I figured out how to use a custom format file to output whatever I want from YouTube's chat JSON.

xenova commented 3 years ago

Oh, that is quite strange. Yes, outputting as .json requires the file to be "seekable". This is because the writer continuously writes JSON items to the file and, because of the formatting, needs to seek to the last character and overwrite it when writing the next item.
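A minimal sketch of the seek-and-overwrite idea described above, to show why a seekable file is needed. It is not the tool's actual writer: `append_json_item` is a hypothetical helper, and it assumes the file, if non-empty, ends with the closing `]` of a JSON array that this same function wrote.

```python
import json
import os


def append_json_item(file_name, item):
    """Append one item to a JSON array on disk, keeping the file valid JSON.

    Hypothetical helper for illustration; not part of chat-downloader.
    """
    encoded = json.dumps(item).encode('utf-8')

    # First item: create the file containing a one-element array.
    if not os.path.exists(file_name) or os.path.getsize(file_name) == 0:
        with open(file_name, 'wb') as f:
            f.write(b'[' + encoded + b']')
        return

    # Later items: jump to the final ']' and overwrite it with ',item]'.
    # This seek is the step that a FIFO cannot support.
    with open(file_name, 'rb+') as f:
        f.seek(-1, os.SEEK_END)
        f.write(b',' + encoded + b']')
```

The benefit of this design is that the file is a complete, parseable JSON array after every write, even if the download is interrupted.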

I've never tested the tool on Arch Linux, so this might be an OS issue? I'll test it out.

As an alternative, you could use the Python module to output to a JSON file: just add file-writing functionality to the demo program.
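One way the file-writing part might be sketched is shown below. It takes any iterable of message dicts and streams them into a JSON array using only sequential writes, so no seeking is involved. `write_chat_to_json` is a hypothetical helper, not something the module provides, and the approach differs from the CLI's own writer.

```python
import json


def write_chat_to_json(messages, file_name):
    """Stream an iterable of message dicts into file_name as one JSON array.

    Hypothetical helper for illustration; returns the number of items written.
    """
    count = 0
    with open(file_name, 'w', encoding='utf-8') as f:
        f.write('[')
        try:
            for message in messages:
                if count:
                    f.write(',')
                f.write('\n' + json.dumps(message, ensure_ascii=False))
                f.flush()  # make each message visible to readers promptly
                count += 1
        finally:
            # Close the array even if iteration is interrupted.
            f.write('\n]')
    return count
```

Assuming the module's chat object can be iterated to yield message dicts, as the project's demo suggests, it could be passed in as `messages`. The trade-off is that the file only becomes valid JSON once the closing bracket has been written.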

That being said, I am still working on improving the documentation.

krichbanana commented 3 years ago

(I don't think the above commenter is using Arch Linux, since Arch normally uses /usr/lib/python3.9/site-packages/ for system packages. Maybe another Linux, who knows.)

I want to add, if you add formatted output as one of the file formats, you should have the output file and stdout formatted separately, in case one wants to implement e.g. terminal escape codes to color superchats in the output.
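To make the suggestion concrete, here is a rough sketch of formatting the same message differently for a file and for the terminal. The message keys (`time`, `author`, `text`, `is_superchat`) and both function names are made up for illustration and do not reflect the tool's real message fields; only the ANSI escape codes are standard.

```python
YELLOW = '\033[33m'  # ANSI escape: yellow foreground
RESET = '\033[0m'    # ANSI escape: reset all attributes


def format_for_file(message):
    """Plain text line, safe to write to a file (no escape codes)."""
    return '{} | {}: {}'.format(
        message['time'], message['author'], message['text'])


def format_for_terminal(message):
    """Same line, but superchats are wrapped in a colour for the terminal."""
    line = format_for_file(message)
    if message.get('is_superchat'):
        return YELLOW + line + RESET
    return line
```

Keeping the two formatters separate means escape codes never leak into the output file.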

xenova commented 3 years ago

As mentioned here, outputting to a text file no longer writes out Python representations (which, I agree, was completely useless).

Let me know if you have any other questions.