Google Drive OCR error - Githubissues

xulihang / ImageTrans-docs

Documentation of ImageTrans, a computer-aided image translation tool. The documentation project for ImageTrans. ImageTrans is a computer-aided image/comic translation tool.

https://imagetrans.readthedocs.io/

Google Drive OCR error #746

Closed CapMan47 closed 2 months ago

CapMan47 commented 2 months ago

Greetings, I would be glad to have a little help. My 90-day free trial of Google Cloud Vision OCR recently ran out, so I decided to switch to Google Drive OCR, since this guide says it's free: https://www.basiccat.org/how-to-use-google-cloud-in-imagetrans/. I configured everything according to the guide, but instead of an OCR result I get this error: java.net.SocketException: A program on your host computer has broken the established connection. Or this one: Server Error. I'm doing this from Russia, so could that be the problem? I would be grateful for help.

xulihang commented 2 months ago

Have you done the following?

https://www.basiccat.org/how-to-use-google-cloud-in-imagetrans/#install-the-plugin-for-imagetrans

CapMan47 commented 2 months ago

Yes. Just in case, I repeated all the steps, and it still didn't work.

CapMan47 commented 2 months ago

I remembered that I tried Google Drive OCR using this guide before and everything worked. But then I switched to Google Cloud, and now that the three months are over I need Drive again.

xulihang commented 2 months ago

Well, it works on my side. Maybe you can try using a VPN. By the way, I've refreshed the default Google Drive OCR token for another 7 days.

CapMan47 commented 2 months ago

Well you must have done something, because it started working for me. Apparently updating the token solved the problem. But is there any way I can stop being dependent on a token update on your part?

xulihang commented 2 months ago

This one: https://www.basiccat.org/how-to-use-google-cloud-in-imagetrans/#install-the-plugin-for-imagetrans

You are still using the packed plugin. You need to replace the plugin files and restart ImageTrans.

CapMan47 commented 2 months ago

After restarting and starting OCR I was redirected to the page with access confirmation. This has never happened before. After confirmation everything worked. Apparently during the tests, I never restarted the application, my fault. Thanks for your help and quick response. I'm always amazed at how quickly you solve user issues :)

xulihang commented 2 months ago

Well, the guide didn't mention that ImageTrans needs to be restarted. I will add this.

CapMan47 commented 2 months ago

A question about Google Drive OCR: for some reason it recognises text with a space before it, regardless of the complexity of the text. That is, in the source text there is always a space before the text. Other OCR engines do not have this problem. Have you encountered this, and how can it be solved? Maybe it can't be solved at the OCR level, but is it possible to remove these spaces when exporting text to .docx or .txt? In fact, I've long wanted to know about text editing on export: is there any way to customise it, so that the source text can be changed at the export stage?

xulihang commented 2 months ago

You can use find and replace to remove the spaces.

You can enable regular expressions and remove the spaces at the head by searching for ^ followed by a space.
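
As an illustration of the same idea outside ImageTrans, here is a minimal Python sketch. It assumes the leading character really is an ordinary ASCII space; the function name strip_leading_spaces is hypothetical, and the regex dialect used by ImageTrans's find and replace may differ from Python's.

```python
import re


def strip_leading_spaces(text: str) -> str:
    """Remove runs of ASCII spaces at the start of every line."""
    # re.MULTILINE makes ^ match at the start of each line,
    # not only at the start of the whole string.
    return re.sub(r"^ +", "", text, flags=re.MULTILINE)


sample = " Hello\n  world"
print(repr(strip_leading_spaces(sample)))
```

Note that this pattern only matches the space character U+0020. Any other invisible character at the start of a line is left untouched.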

CapMan47 commented 2 months ago

For some reason the space that appears after Google Drive OCR is not matched by find and replace. When I type a space myself, it is visible and can be replaced, as in your example. But the space that appears as a result of OCR is not found. The first screenshot shows the space that OCR inserted: it is not highlighted. The second screenshot shows a space that I typed myself: it is matched by the regular expression. Screenshot 2024-08-28 200626 Screenshot 2024-08-28 200555

xulihang commented 2 months ago

This is strange.

You can uncheck auto remove line breaks and then replace "\n" instead.

CapMan47 commented 2 months ago

At first I thought this helped and solved the problem, but after working for a bit I noticed that in some bubbles the words are glued together at the points where the text wraps to a new line. This happens often enough to break the workflow. The first screenshot shows an example of the problem. The second one shows how the text is read initially, before the \n replacement.

Is there any other way? Sorry if I'm inundating you with questions, but I'd appreciate the help).

CapMan47 commented 2 months ago

Apparently Google Drive OCR doesn't add a space at the beginning, but something else. Word does not treat this character as a space. I haven't found any information on what this character is. Do you have any ideas?
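
One general way to identify an invisible character like this is to print the Unicode code point and name of the first few characters of the exported text. The sketch below is a minimal Python example; the function name describe_leading_chars and the sample string are hypothetical, and it assumes the OCR text has already been read into a Python string.

```python
import unicodedata


def describe_leading_chars(text: str, count: int = 3) -> list:
    """Return 'U+XXXX NAME' for the first `count` characters of text."""
    described = []
    for ch in text[:count]:
        # Some characters (e.g. control characters) have no Unicode name,
        # so a fallback label is supplied.
        name = unicodedata.name(ch, "<unnamed>")
        described.append(f"U+{ord(ch):04X} {name}")
    return described


# Hypothetical sample: an ordinary space followed by a word.
for line in describe_leading_chars(" Hi"):
    print(line)
```

Pasting the problematic OCR output in place of the sample string shows whether the first character is a real space (U+0020) or something else.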

xulihang commented 2 months ago

This is a BOM (byte order mark); see https://www.b4x.com/android/forum/threads/b4x-reading-a-utf-8-file-that-might-have-bom.90943/#content.

I've updated the files to fix this. Please redo this step: https://www.basiccat.org/how-to-use-google-cloud-in-imagetrans/#install-the-plugin-for-imagetrans
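
For readers who hit the same symptom elsewhere, here is a minimal Python sketch of stripping a UTF-8 BOM. It is not the actual plugin fix, which is not shown in this thread; the function names strip_bom and decode_ocr_bytes are hypothetical.

```python
BOM = "\ufeff"  # U+FEFF, encoded in UTF-8 as the bytes EF BB BF


def strip_bom(text: str) -> str:
    """Drop a single leading BOM from an already decoded string."""
    return text[1:] if text.startswith(BOM) else text


def decode_ocr_bytes(raw: bytes) -> str:
    """Decode UTF-8 bytes, discarding a leading BOM if one is present."""
    # The "utf-8-sig" codec skips the BOM on decoding and behaves
    # like plain "utf-8" when no BOM is there.
    return raw.decode("utf-8-sig")


print(repr(strip_bom("\ufeffHello")))
print(repr(decode_ocr_bytes(b"\xef\xbb\xbfHello")))
```

Decoding with "utf-8-sig" handles the BOM at the point where the bytes are read, while strip_bom is useful when the text has already been decoded elsewhere.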

CapMan47 commented 2 months ago

Yeah, everything's working as it should. Thanks a lot for your help and patience)