hiroi-sora / RapidOCR-json

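Offline OCR command-line program for Windows that recognizes text in images and outputs the result as a JSON string, so other programs can call it easily. Based on RapidOcrOnnx.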
MIT License
146 stars 25 forks

Does not work when the image path contains Chinese characters #14

Open yida-lxw opened 5 months ago

yida-lxw commented 5 months ago
String imageFilePath = "C:/Users/Administrator/Desktop/单层/OCR测试.png";
imageFilePath = StringUtils.convertToUnicode(imageFilePath);
System.out.println("convert to ASCII:" + imageFilePath);
String command = "F:/RapidOCR-json_v0.2.0/RapidOCR-json.exe --models=F:/RapidOCR-json_v0.2.0/models --image_path=\"" + imageFilePath + "\"";

String text = CommandExecuteUtils.executeCommand(command);
System.out.println("text-->" + text);

The IDEA console prints the following:

convert to ASCII:C:/Users/Administrator/Desktop/\u5355\u5c42/OCR\u6d4b\u8bd5.png
text-->RapidOCR-json v1.1.0
OCR init completed.
{"code":200,"data":"Image path dose not exist. Path: \"C:/Users/Administrator/Desktop/\u5355\u5c42/OCR\u6d4b\u8bd5.png\""}

So, how can I handle an image path that contains Chinese characters? Any help would be appreciated.

hiroi-sora commented 5 months ago

The startup parameter --image_path does not support non-ASCII characters. However, you can start the program first and then pass the path containing Chinese characters as JSON.

Example:

  1. Start the program without specifying --image_path.
  2. Wait for the output OCR init completed.
  3. Put the image path into a dictionary, for example: {"image_path":"D:/测试图片.png"}
  4. Encode the dictionary as a JSON string with non-ASCII characters escaped, for example: {"image_path":"D:/\u6d4b\u8bd5\u56fe\u7247.png"}
  5. Send the JSON string to the program through the pipe and read the returned result.

You can refer to PaddleOCR-json-java-api, which can be slightly modified for use with RapidOCR-json.
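
The five steps above can be sketched in Java roughly as follows. This is a minimal sketch, not the project's official client. The class name RapidOcrPipeSketch and both method names are hypothetical. It assumes the program reads one JSON object per line from standard input and answers with one JSON line on standard output, encoded as UTF-8. Only the --models flag and the "OCR init completed." message are taken from this thread.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for illustration only.
class RapidOcrPipeSketch {

    // Steps 3 and 4: build {"image_path":"..."} and write every
    // non-ASCII character as a backslash-u escape with four hex digits.
    static String toAsciiJson(String imagePath) {
        StringBuilder sb = new StringBuilder("{\"image_path\":\"");
        for (int i = 0; i < imagePath.length(); i++) {
            char c = imagePath.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);          // JSON-escape quote and backslash
            } else if (c < 0x20 || c > 0x7e) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);                       // printable ASCII passes through
            }
        }
        return sb.append("\"}").toString();
    }

    // Steps 1, 2 and 5: start the program without --image_path, wait for the
    // init message, send one JSON line, read one reply line.
    // Assumption: one request line in, one response line out.
    static String recognize(String exePath, String modelsDir, String imagePath)
            throws IOException {
        ProcessBuilder pb = new ProcessBuilder(exePath, "--models=" + modelsDir);
        pb.redirectErrorStream(true);
        Process process = pb.start();
        try (BufferedWriter stdin = new BufferedWriter(new OutputStreamWriter(
                     process.getOutputStream(), StandardCharsets.US_ASCII));
             BufferedReader stdout = new BufferedReader(new InputStreamReader(
                     process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = stdout.readLine()) != null) {
                if (line.contains("OCR init completed.")) {
                    break;                          // step 2: engine is ready
                }
            }
            if (line == null) {
                throw new IOException("Process exited before initialization finished");
            }
            stdin.write(toAsciiJson(imagePath));
            stdin.write("\n");
            stdin.flush();                          // step 5: push the request through the pipe
            return stdout.readLine();               // assumed: the reply is a single line
        } finally {
            process.destroy();
        }
    }
}
```

With the paths from the question, a call might look like RapidOcrPipeSketch.recognize("F:/RapidOCR-json_v0.2.0/RapidOCR-json.exe", "F:/RapidOCR-json_v0.2.0/models", "C:/Users/Administrator/Desktop/单层/OCR测试.png"). Because the escaped request is pure ASCII, the console code page no longer affects how the path is delivered. For repeated calls it would make more sense to keep the process alive instead of destroying it after each image.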