wie-bj commented 7 years ago

Design and implement OCR recognition services to extract air quality metrics from uploaded metering images.

SophiaNiu9 commented 7 years ago

https://ocr.space/ has a free API for character recognition. It only recognizes characters that are upright; text that is upside down or rotated is not recognized. The recognition result for thumb_IMG_4336_1024.jpg is: ** Result for Image/Page 1 ** t 26t RH21 % a 63 a 0.040 4.05 UB/M3 MG/M3 UG/M3
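For reference, a minimal sketch of how a request to the OCR.space API might be built and its response read. It assumes the public endpoint https://api.ocr.space/parse/image, the form fields apikey, url, language and detectOrientation, and the ParsedResults/ParsedText response fields as I understand them from the OCR.space documentation; these should be checked against the current docs. The helper names, the API key and the image URL are placeholders, and whether detectOrientation helps with the rotated meter photos has not been tested here.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint of the OCR.space API.
OCR_SPACE_ENDPOINT = "https://api.ocr.space/parse/image"


def build_ocr_request(image_url, api_key, detect_orientation=True):
    """Build (but do not send) a form-encoded POST request for one image URL."""
    fields = {
        "apikey": api_key,
        "url": image_url,
        "language": "eng",
        # Assumption: asks the service to auto-rotate the image before OCR.
        "detectOrientation": "true" if detect_orientation else "false",
    }
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(OCR_SPACE_ENDPOINT, data=data, method="POST")


def parsed_text(response_body):
    """Join the recognized text of every page in a JSON response body."""
    payload = json.loads(response_body)
    results = payload.get("ParsedResults") or []
    return "\n".join(page.get("ParsedText", "") for page in results)


# Sending the request needs network access and a real API key:
# with urllib.request.urlopen(build_ocr_request(url, key)) as resp:
#     print(parsed_text(resp.read().decode("utf-8")))
```

Building the request separately from sending it keeps the sketch checkable offline.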

wie-bj commented 7 years ago

@SophiaNiu9 Looks like it works pretty well!

SophiaNiu9 commented 7 years ago

I implemented an OCR recognition service for the Hanvon Blue Sky detector.
URL: http://9.186.90.72:5000/image (POST method)
Input example: {"image":"thumb_IMG_4358_1024.jpg"}
Output example: { "fomol": "0.028", "humidity": "2", "image_text": "ﬁg\n\n‘1‘) Mm-.-an Maw\n\nt25°c RH2I]%\n\n3. 48\n\n \n\n& 0.028 MG/M3\n3.86 ms;", "pm10": "3.86", "pm2.5": "3. 48", "temperature": "25" }
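To illustrate how raw OCR text like the image_text above could be turned into named fields, here is a rough sketch using regular expressions. It is not the service's actual implementation: the function name parse_readings and the patterns are assumptions, and only the key names are borrowed from the output example. The sample string is the tail of the image_text shown above.

```python
import re


def parse_readings(text):
    """Pull a few readings out of raw OCR text; fields not found map to None."""
    patterns = {
        # e.g. "t25°c": the digits between "t" and the Celsius marker
        "temperature": r"t\s*(\d+)\s*°?\s*c",
        # e.g. "RH2I]%": only the leading digits survive OCR noise
        "humidity": r"RH\s*(\d+)",
        # e.g. "0.028 MG/M3": a decimal number followed by the unit
        "fomol": r"(\d+\.\d+)\s*MG/M3",
    }
    readings = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        readings[name] = match.group(1) if match else None
    return readings


sample = "t25°c RH2I]%\n\n3. 48\n\n \n\n& 0.028 MG/M3\n3.86 ms;"
print(parse_readings(sample))
# {'temperature': '25', 'humidity': '2', 'fomol': '0.028'}
```

Note that the humidity comes out as "2" because the OCR misread the second digit, which matches the output example above; a real parser would probably need to correct common misreads before matching.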

SophiaNiu9 commented 7 years ago

A brief guide to installing and using pyocr: http://www.linuxdiyf.com/linux/17484.html

$ sudo apt-get install python-imaging

Install Tesseract: https://github.com/tesseract-ocr/tesseract/wiki/Compiling

If they are not already installed, you need the following libraries (Ubuntu 16.04/14.04):
//sudo apt-get install g++ # or clang++ (presumably)
sudo apt-get install autoconf automake libtool
sudo apt-get install autoconf-archive
sudo apt-get install pkg-config
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg8-dev
sudo apt-get install libtiff5-dev
//sudo apt-get install zlib1g-dev

If you plan to install the training tools, you also need the following libraries:
sudo apt-get install libicu-dev
sudo apt-get install libpango1.0-dev
//sudo apt-get install libcairo2-dev

You also need to install Leptonica. One option is to install the distro's Leptonica package:
sudo apt-get install libleptonica-dev

If the packaged version is too old, download the latest package from http://www.leptonica.org/download.html and build it from source:
tar xzf leptonica-1.74.1.tar.gz
cd leptonica-1.74.1
./configure
make
make install (needs to be run as root)

Tesseract uses a standard autotools based build system, so the compilation process should be familiar.
git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include"
make
sudo make install
sudo ldconfig

If you want the training tools (3.03+), you will also need to run the following commands:
make training
sudo make training-install

Install the language packs. These can be installed directly online; I only installed the English and Chinese packs:
$ sudo apt-get install tesseract-ocr-eng tesseract-ocr-chi-sim

Language Data (did not do, may need to do?)
• Download the data file(s) for the language(s) you are interested in.
• Move it to the tessdata directory (e.g. 'mv tessdata $TESSDATA_PREFIX' if TESSDATA_PREFIX is defined)

Then configure the system environment:
export TESSDATA_PREFIX="<path to the tessdata directory>"
export TESSDATA_PREFIX="/home/crluser/tesseract/tessdata"
/etc/bash.bashrc

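export TESSDATA_PREFIX="/usr/share/tesseract-ocr/tessdata"

Now you can run a test:
$ tesseract t2.png out -l chi_sim
tesseract /home/crluser/smog/pictures/thumb_IMG_4409_1024.jpg out -l chi_sim

At this point the environment is ready, and pyocr can be installed by following its instructions. Download the pyocr source package, unpack it, and run the install command. First install setuptools:
apt-get install python-setuptools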
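Before installing pyocr, it may help to confirm that the Tesseract setup described above is visible from Python. The following is only a sketch: the helper names find_traineddata and check_setup are made up for illustration, and it assumes language data is stored as <lang>.traineddata either directly under TESSDATA_PREFIX or in a tessdata subdirectory of it, since the expected layout differs between Tesseract versions.

```python
import os
import shutil


def find_traineddata(lang, prefix):
    """Return the path of <lang>.traineddata under prefix, or None if absent."""
    candidates = [
        os.path.join(prefix, lang + ".traineddata"),
        os.path.join(prefix, "tessdata", lang + ".traineddata"),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def check_setup(langs=("eng", "chi_sim")):
    """Report where the tesseract binary and each language file were found."""
    prefix = os.environ.get("TESSDATA_PREFIX", "")
    report = {"tesseract": shutil.which("tesseract")}
    for lang in langs:
        report[lang] = find_traineddata(lang, prefix) if prefix else None
    return report


if __name__ == "__main__":
    for name, value in check_setup().items():
        print(name, "->", value or "NOT FOUND")
```

A "NOT FOUND" for a language only means the file is not under TESSDATA_PREFIX; a packaged install may still find it in its default location.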

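git clone https://github.com/kexplo/pyocr.git
cd pyocr
$ sudo python ./setup.py install

If nothing unexpected happens, the installation has succeeded. You can try a demo to verify that the installation and configuration work:
vi test_pyocr.py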

from PIL import Image
import sys
import pyocr
import pyocr.builders

image_path = "/home/crluser/smog/pictures/thumb_IMG_4409_1024.jpg"
tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)

tool = tools[0]
print("Will use tool '%s'" % (tool.get_name()))
# Ex: Will use tool 'tesseract'

langs = tool.get_available_languages()
print("Available languages: %s" % ", ".join(langs))
lang = langs[1]
print("Will use lang '%s'" % (lang))
# Ex: Will use lang 'fra'

txt = tool.image_to_string(Image.open(image_path), builder=pyocr.builders.TextBuilder())
print(txt)

Run the demo:
python /home/crluser/smog/test_pyocr.py
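One fragile spot in the demo is lang = langs[1], which raises IndexError when only one language is installed and otherwise depends on list order. A small helper along these lines could choose the language more predictably; pick_language and its default preference order are hypothetical, not part of pyocr.

```python
def pick_language(available, preferred=("chi_sim", "eng")):
    """Return the first preferred language that is installed, else the first available."""
    for lang in preferred:
        if lang in available:
            return lang
    if not available:
        raise ValueError("no OCR languages installed")
    return available[0]


print(pick_language(["eng", "osd"]))  # eng
```

The demo computes lang but does not pass it on; the pyocr README passes it as lang=lang to tool.image_to_string, so the result of pick_language(tool.get_available_languages()) could be used the same way.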

wiebj / deep-breath

Design and implement OCR recognition service #7
