machinewrapped / gpt-subtrans

Open Source project using LLMs to translate SRT subtitles

Errors when trying to translate to Simplified Chinese #121

Closed yuxi-liu-wired closed 4 months ago

yuxi-liu-wired commented 4 months ago

First attempt:

Error: Can't write project file, no scenes
Traceback (most recent call last):
  File "C:\Users\DeadScholar\Desktop\gpt-subtrans\gpt-subtrans.py", line 75, in <module>
    project.TranslateSubtitles()
  File "C:\Users\DeadScholar\Desktop\gpt-subtrans\PySubtitle\SubtitleProject.py", line 107, in TranslateSubtitles
    self.WriteProjectFile()
  File "C:\Users\DeadScholar\Desktop\gpt-subtrans\PySubtitle\SubtitleProject.py", line 215, in WriteProjectFile
    raise Exception("Can't write project file, no scenes")
Exception: Can't write project file, no scenes

Second attempt:

INFO: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO: Scene 1 batch 33: 0 lines and 0 untranslated.
INFO: Summary: 这一批字幕讨论了GPT(生成预训练变换器)的概念,以及它如何被用于翻译语言。同时,也提到了所有这些工具使用的训练数据都是被抓取的。
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\DeadScholar\Miniconda3\lib\logging\__init__.py", line 1103, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\DeadScholar\Miniconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 15-22: character maps to <undefined>
Call stack:
  File "C:\Users\DeadScholar\Desktop\gpt-subtrans\GUI\Command.py", line 57, in run
    success = self.execute()
  File "C:\Users\DeadScholar\Desktop\gpt-subtrans\GUI\ProjectCommands.py", line 293, in execute
    scene = project.TranslateScene(self.scene_number, batch_numbers=self.batch_numbers, line_numbers=self.line_numbers, translator = self.translator)
  File "C:\Users\DeadScholar\Desktop\gpt-subtrans\PySubtitle\SubtitleProject.py", line 158, in TranslateScene
    translator.TranslateScene(scene, batch_numbers=batch_numbers, line_numbers=line_numbers)
  File "C:\Users\DeadScholar\Desktop\gpt-subtrans\PySubtitle\SubtitleTranslator.py", line 139, in TranslateScene
    self.TranslateBatches(self.client, batches, line_numbers, context, remaining_lines)
  File "C:\Users\DeadScholar\Desktop\gpt-subtrans\PySubtitle\SubtitleTranslator.py", line 239, in TranslateBatches
    self.ProcessTranslation(batch, line_numbers, context, client)
  File "C:\Users\DeadScholar\Desktop\gpt-subtrans\PySubtitle\SubtitleTranslator.py", line 349, in ProcessTranslation
    logging.info(f"Summary: {batch.summary}")
Message: 'Summary: 这一批字幕讨论了GPT(生成预训练变换器)的概念,以及它如何被用于翻译语言。同时,也提到了所有这些工具使用的训练数据都是被抓取的。'
Arguments: ()
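The traceback above shows the console stream encoding the message with cp1252, which has no mapping for Chinese characters. A minimal sketch that reproduces the codec failure outside of gpt-subtrans (the `can_encode` helper is hypothetical, introduced only for illustration):

```python
# Illustrative only: reproduces the codec failure seen in the traceback,
# independent of gpt-subtrans. `can_encode` is a hypothetical helper.
text = "Summary: 这一批字幕"

def can_encode(message: str, encoding: str) -> bool:
    """Return True if every character in message is representable in encoding."""
    try:
        message.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(can_encode(text, "cp1252"))  # False
print(can_encode(text, "utf-8"))   # True
```

Because the failure happens inside the logging handler, Python reports it as "--- Logging error ---" and carries on rather than aborting the translation.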

Third attempt:

INFO: Reading project data from test.subtrans
ERROR: Unable to load test.srt ('NoneType' object cannot be interpreted as an integer)

Loading test.srt directly gives the same error. Deleting test.subtrans allows test.srt to load again, but the translation then starts from scratch.

yuxi-liu-wired commented 4 months ago

I think I found the solution: I just need to add the encoding='utf-8' option to the logging.basicConfig call wherever logging is set up.

# The encoding argument requires Python 3.9 or later.
logging.basicConfig(
    format='%(levelname)s: %(message)s',
    encoding='utf-8',
    level=logging_level
)

machinewrapped commented 4 months ago

Thanks for the report. I'll take a look. Adding the encoding to the logger config makes sense, but it should only be necessary in the main scripts (gpt-subtrans.py, gui-subtrans.py), since they configure the global logger instance.

machinewrapped commented 4 months ago

Actually, now that I think about it, I have vague memories of Windows causing problems when I tried to specify the encoding for the console logger, so I ended up leaving it at the default encoding. It might be necessary to add a command-line argument to set the encoding, or to write a test message in a try/except block and fall back to the default encoding if it fails.

machinewrapped commented 4 months ago

Can you try this? It attempts to log as utf-8 and falls back to default encoding if it fails: https://github.com/machinewrapped/gpt-subtrans/commit/c60d8fb5ebdfae8251df00d7adf52f12f318b163

I haven't seen it fail yet, so the fallback is untested. I'll see if I can remember under what conditions utf-8 wasn't supported.
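For readers who want the general shape of a "try utf-8, fall back to the default" setup, here is a minimal sketch. It is not the contents of the linked commit, and `make_console_handler` is a hypothetical name. It also swaps in a different mechanism: as far as the standard library documentation describes, the `encoding` argument of `logging.basicConfig` is applied when a `filename` is given, so this sketch reconfigures the console stream itself.

```python
import io
import logging
import sys

def make_console_handler(stream=None, preferred="utf-8"):
    """Hypothetical helper: prefer UTF-8 on the console stream, else keep its default."""
    stream = stream if stream is not None else sys.stderr
    try:
        # TextIOWrapper.reconfigure exists on Python 3.7+; other stream types lack it.
        stream.reconfigure(encoding=preferred)
    except (AttributeError, io.UnsupportedOperation, LookupError, ValueError):
        # Fallback: leave the stream's existing encoding untouched.
        pass
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
    return handler

# Usage: a cp1252 text stream over a byte buffer stands in for a Windows console.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp1252")

logger = logging.getLogger("fallback_demo")
logger.propagate = False
logger.setLevel(logging.INFO)
logger.addHandler(make_console_handler(console))
logger.info("Summary: 这一批字幕")
console.flush()
```

After this runs, the bytes in `raw` are UTF-8 rather than cp1252, so the Chinese summary is written without a logging error.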

yuxi-liu-wired commented 4 months ago

I have tried this and it works as expected: there are no error messages on the command line or in the GUI, the translation works, and the save file works.