mathewthe2 / Game2Text

Complete toolbox for gamifying language learning
https://www.Game2Text.com
Apache License 2.0

Multi-line OCR result crashes logs window #14

Closed: stevejackson closed this issue 3 years ago

stevejackson commented 3 years ago

Platform/version

main branch as of Apr 23, 2021. macOS 11.1, Chrome.

Using the OCR Space EU engine.

Bug reproduction

OCR'd this:

[Screenshot: Screen Shot 2021-04-23 at 4 18 39 AM]

The OCR result contains multiple lines of text (it includes line breaks):

[Screenshot: Screen Shot 2021-04-23 at 4 19 28 AM]

The entry looks like this in the log text file:

20210423-041552, 「城下町はあの門を越えた先た。
ジャマな門番には金を握らせるな・・
だガ旅立ら前に計な出費は避けをいな。

This causes the following error when the logs panel is opened:

Exception in thread Thread-15:
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/steven/dev/Game2Text/logger.py", line 65, in log_video_image
    insert_newest_log_with_image(base64_image, os.path.splitext(image_path)[1])
  File "/Users/steven/dev/Game2Text/logger.py", line 143, in insert_newest_log_with_image
    saved_logs = get_logs()
  File "/Users/steven/dev/Game2Text/logger.py", line 125, in get_logs
    date = parse_time_string(log_id)
  File "/Users/steven/dev/Game2Text/logger.py", line 24, in parse_time_string
    return datetime.strptime(time_string, '%Y%m%d-%H%M%S')
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data 'ジャマな門番には金き握らせるな' does not match format '%Y%m%d-%H%M%S'

Likely cause

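I think the log reader expects every entry to fit on a single line. With a multi-line OCR result, each continuation line is treated as a new entry, and its text is passed to parse_time_string as if it were a timestamp, which raises the ValueError above. We could either collapse the OCR result to a single line before logging it, or change the log format so that it supports multi-line entries.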
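To illustrate the second option, here is a minimal sketch of a reader that tolerates multi-line entries. It is not the code in logger.py: `parse_log_lines` is a hypothetical helper, and the `timestamp, text` line layout is assumed from the log excerpt above. Only the format string `'%Y%m%d-%H%M%S'` is taken from the traceback. Any line that does not begin with a valid timestamp is folded into the previous entry instead of raising `ValueError`.

```python
from datetime import datetime

TIME_FORMAT = '%Y%m%d-%H%M%S'  # format string shown in the traceback


def parse_log_lines(lines):
    """Group raw log lines into (timestamp, text) entries.

    Hypothetical helper: a line that does not start with a valid
    timestamp is treated as a continuation of the previous entry.
    """
    entries = []
    for line in lines:
        # Assumed layout: "<timestamp>, <text>"
        log_id, sep, text = line.partition(', ')
        try:
            date = datetime.strptime(log_id, TIME_FORMAT)
        except ValueError:
            date = None
        if sep and date is not None:
            entries.append((date, text))
        elif entries:
            # Continuation line: fold it into the previous entry.
            prev_date, prev_text = entries[-1]
            entries[-1] = (prev_date, prev_text + '\n' + line)
        # A continuation line with no preceding entry is skipped.
    return entries
```

With the three lines from the log excerpt as input, this should yield a single entry whose text spans all three lines, rather than three entries with two unparseable timestamps.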

mathewthe2 commented 3 years ago

This seems to be a problem specific to the results returned by OCR Space. We could post-process the results in ocr_space.py so that they end up on one line.

Edit: removed newlines from the ocr_space result in commit https://github.com/mathewthe2/Game2Text/commit/50e075d3f192121d9e375a744ce650f73e222599. Closing now.
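For anyone hitting the same problem with another OCR backend, the single-line approach can be sketched roughly as follows. This is an assumption about the general idea, not the contents of the commit above: `to_single_line` and its `joiner` parameter are hypothetical names.

```python
def to_single_line(ocr_text, joiner=''):
    """Collapse a multi-line OCR result into a single line.

    Hypothetical helper. `joiner` is the string placed between the
    original lines: '' suits Japanese text, ' ' suits
    space-delimited languages.
    """
    # splitlines() handles \n, \r\n and \r; blank lines are dropped.
    parts = (part.strip() for part in ocr_text.splitlines())
    return joiner.join(part for part in parts if part)
```

An empty joiner is the default here because Japanese text has no spaces between words. For English or similar text, passing `joiner=' '` avoids gluing the last word of one line to the first word of the next.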