openalpr / train-ocr

Input files and scripts necessary to train the license plate OCR
GNU Affero General Public License v3.0

Train for two different kind of license plates #33

Open marcoschicote opened 7 years ago

marcoschicote commented 7 years ago

Hi, currently Argentina has two license plate models (6 chars vs. 7 chars), and we are transitioning from one to the other. Is it possible to train OpenALPR to recognize both models? Or would I have to try recognizing one model first and, if no results are found, try the other?

Thanks!

maechler commented 7 years ago

You can have multiple different fonts recognised and also match against several patterns (e.g. one for 6 chars and one for 7 chars), all in one run. But if the plates have completely different sizes, or one is single-line and the other is two-line, then you might have to let OpenALPR run twice.

What is the difference between the two plate models apart from the number of characters?
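To illustrate what matching one OCR result against several patterns in a single pass can look like, here is a minimal Python sketch. It is not OpenALPR's implementation. The pattern syntax (`@` for a letter, `#` for a digit) is modelled on OpenALPR's pattern files as far as I understand them, and the two layouts (three letters plus three digits, and two letters plus three digits plus two letters) as well as the names `PATTERNS`, `pattern_to_regex` and `match_plate` are assumptions made up for this example.

```python
import re

# Hypothetical pattern table: '@' = any letter, '#' = any digit.
PATTERNS = {
    "ar_old": "@@@###",   # assumed 6-character layout
    "ar_new": "@@###@@",  # assumed 7-character layout
}


def pattern_to_regex(pattern):
    """Translate a '@'/'#' pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "@":
            parts.append("[A-Z]")
        elif ch == "#":
            parts.append("[0-9]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def match_plate(text, patterns=PATTERNS):
    """Return the names of all patterns that the candidate text fully matches."""
    candidate = text.replace(" ", "").upper()
    return [
        name
        for name, pattern in patterns.items()
        if pattern_to_regex(pattern).fullmatch(candidate)
    ]
```

Because every candidate is checked against all patterns, a 6-character and a 7-character layout can be handled in the same run, with no retry under a second configuration.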

marcoschicote commented 7 years ago

Thanks for your answer @maechler! Here's an example: https://i0.wp.com/autoblog.com.ar/wp-content/uploads/2015/07/PATENTE.jpg The one above is the old model and the one below is the new model. Would this require two runs?

Thanks again!

maechler commented 7 years ago

Not necessarily, because the plates have quite similar width / height ratios, similar font size and OpenALPR can also auto detect inverted plates (white text on black background) or deal with different fonts. But there are a lot of other factors that come into play, especially the quality of the images is very important. You would definitely have to test many images to make sure it is accurate enough for your use case.

marcoschicote commented 7 years ago

Great, thanks @maechler! Fortunately there's a Twitter account in Argentina that collects a lot of plates; we can use that to train the model and check its accuracy.

mahamoodalam commented 6 years ago

@marcoschicote @maechler were you successful in training the system for white numbers on a black background? I am training on similar license plates but have not succeeded yet. Can someone please help?

Thanks.

maechler commented 6 years ago

@mahamoodalam I have only trained black numbers on white background. I am not entirely sure whether training white numbers on black background works the same.

I only know that OpenALPR is able to invert plates during the recognition process, e.g. converting white characters on a black background to black characters on a white background, and thus enabling recognition with training data that only contains black characters on a white background.
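As a rough picture of what inverting a plate means, here is a small pure-Python sketch on an 8-bit grayscale image stored as a list of rows. It is only an illustration: the mean-brightness check in `looks_dark` is an assumed heuristic and not how OpenALPR decides whether to invert, and all function names are made up for this example.

```python
def invert(image):
    """Invert an 8-bit grayscale image given as a list of pixel rows."""
    return [[255 - px for px in row] for row in image]


def looks_dark(image, threshold=128):
    """Crude guess: the background is dark if the mean pixel value is low."""
    pixels = [px for row in image for px in row]
    return sum(pixels) / len(pixels) < threshold


def normalise(image):
    """Return dark-on-light pixels, inverting only when the image looks dark."""
    return invert(image) if looks_dark(image) else image
```

After `normalise`, a white-on-black plate has the same polarity as the black-on-white samples used for training.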

mahamoodalam commented 6 years ago

@maechler actually we have two types of plates in Qatar: one with a white background and black characters, and the other with a black background and white characters. Plate dimensions and character size are the same for both. I have trained the detector for license plates with a white background and black characters. Now, if I use this trained system on plates with a black background and white characters, it does not detect them. In the config file I have

invert = auto

Attaching the images for your reference. [images: 49-1, 49]

Can you please help me narrow down the issue? Thanks.

maechler commented 6 years ago

@mahamoodalam That is exactly how I would have done it.

A few things I would check:

  1. Does the detection work fine for black characters on white background? If not, I would suggest increasing the amount or quality of the training data.
  2. What happens if you set invert=always? That should give you about the same results as for the plates with black characters on white background.
  3. Set some of the debug settings to 1 to get more information about the recognition process:

openalpr.conf

debug_general         = 0
debug_timing          = 0
debug_detector        = 0
debug_prewarp         = 0
debug_state_id        = 0
debug_plate_lines     = 0
debug_plate_corners   = 0
debug_char_segment    = 0
debug_char_analysis   = 0
debug_color_filter    = 0
debug_ocr             = 0
debug_postprocess     = 0
debug_show_images     = 0
debug_pause_on_frame  = 0
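If you toggle these flags often, a small helper can do it for you. The following is only a sketch that rewrites the text of a config in the `key = value` layout shown above; the function name `set_debug_flags` is made up, and which config file you apply it to is up to you.

```python
import re


def set_debug_flags(conf_text, enabled):
    """Return conf_text with the debug_* keys named in `enabled` set to 1."""
    out = []
    for line in conf_text.splitlines():
        m = re.match(r"^(\s*)(debug_\w+)(\s*=\s*)\d+\s*$", line)
        if m and m.group(2) in enabled:
            # Keep the original indentation and spacing, change only the value.
            line = f"{m.group(1)}{m.group(2)}{m.group(3)}1"
        out.append(line)
    return "\n".join(out)
```

For example, `set_debug_flags(text, {"debug_detector", "debug_ocr"})` switches on only those two flags and leaves every other line untouched.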

mahamoodalam commented 6 years ago

@maechler 1) The detection works fine for black characters on white background.

2) It shows nothing when I set invert=always for the trained system. However when I use the command

alpr -c us webcam

it recognizes the plate. Do I also need to include samples of plates with a black background and white characters when training the OCR or the detector?

3) What is the path to openalpr.conf? Is it the one at /etc/openalpr/openalpr.conf?

Thanks.

maechler commented 6 years ago

@mahamoodalam

Did you also train the detector or just OCR? I think if you use one of the existing detectors it should already be able to detect black background plates. You could find that out by setting debug_detector=1, because then you get to see what the detector recognised as plate regions. I do not think you have to train OCR for black background plates; that is what invert=auto is for, although I have to admit that I am not entirely sure.

I am talking about these files: https://github.com/openalpr/openalpr/tree/master/config Changes can be made in the file openalpr.conf.user.

mahamoodalam commented 6 years ago

@maechler I have changed the debug parameters in /usr/share/openalpr/config/openalpr.defaults.conf. I am getting the response below for my trained system:

[image: test]

for the image below:

[image: test]

thanks,

mahamoodalam commented 6 years ago

@maechler when I disabled some of the debug options and ran the command below,

alpr -c us webcam

it gave me the result below. Note that Plate inverted: 1. [screenshot: 2018-05-03-124820_1680x1050_scrot]

But when I ran the command below

alpr -c qa2 webcam

it gave me the result below. Note that Plate inverted: 0, even though I have set invert = auto. [screenshot: 2018-05-03-125225_1680x1050_scrot]

Thanks.

maechler commented 6 years ago

In your first example, the detector does not even find the plate. Have you trained your own detector? Otherwise you could use a pretrained detector, e.g. from the US:

qa2.conf

detector_file = us.xml

I am afraid I do not know why your plate does not get inverted. Maybe you could have a look at the source code that deals with inverting the image: https://github.com/openalpr/openalpr/

mahamoodalam commented 6 years ago

@maechler when I use us.xml it inverts the image and detects the plates. Did I do something wrong while training the detector for qa2.xml? I followed the steps at http://doc.openalpr.com/accuracy_improvements.html#training-the-detector

When I run the script prep.py on the pos images, it reports Invert: FALSE, as shown in the image below. [image: test]

Where can I configure Invert to be AUTO in prep.py?

Thank you.

maechler commented 6 years ago

@mahamoodalam I am afraid I have no experience training the detector; I have only used pretrained detectors. But if it does work with the US detector, there really might be something wrong with qa2.xml. However, if it works well enough with us.xml, you might not even need to train the detector.

alucard079 commented 5 years ago

@maechler I have read your comments above. You said that there is no problem training for several plate models, as long as their dimensions do not differ much from each other, right?

But if I have different models with different dimensions, do I have to run OpenALPR several times with different settings?

Am I getting your point?

In our case, I'm about to start the training, but I'm having a problem with the config file for adding a new country, because the plates in our country have different dimensions and fonts.

See the images below. They go from the oldest to the newest plate. Thank you in advance, sir. I really appreciate your help. I'm struggling a lot here.

[images: 1, 12, selection_007, isapa]

maechler commented 5 years ago

@alucard079 From your images I would guess that a predefined detector should work to recognise the different license plate sizes. It is possible to test for multiple fonts in one run (e.g. different EU fonts https://github.com/openalpr/train-ocr/tree/master/eu/input), thus it should be enough to let OpenALPR run once.

I would also guess that training your custom fonts could improve the results. It is however very difficult to give you recommendations from the outside. As training a custom font can be time consuming, you should start by testing the accuracy on real world examples. If the results are not satisfying you should start debugging with the settings I wrote above to see what exactly is failing and start working on the found issues.

If OCR really is your biggest issue, then you should train your custom fonts. Getting train-ocr to work is probably easiest in a Linux environment (e.g. Ubuntu). I would also recommend implementing a small script with which you can test 100 - 200 different plate images in an automated way, to see if your accuracy is improving.
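Such a test script could look roughly like the Python sketch below. It assumes the `alpr` command line tool is installed and called with `-c` for the country and `-j` for JSON output, that the JSON contains a `results` list whose entries have a `plate` field, and that each test image is named after the plate it shows (e.g. `ABC123.jpg`). The function names and the file naming convention are assumptions for this example, so adapt them to your setup.

```python
import json
import subprocess
from pathlib import Path


def recognise_with_alpr(image_path, country="eu"):
    """Run the alpr CLI with JSON output and return the top plate, or None."""
    proc = subprocess.run(
        ["alpr", "-c", country, "-j", str(image_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    results = json.loads(proc.stdout).get("results", [])
    return results[0]["plate"] if results else None


def expected_plate(image_path):
    """Assumed convention: the file name without extension is the plate text."""
    return Path(image_path).stem.upper()


def measure_accuracy(image_paths, recognise=recognise_with_alpr):
    """Return (accuracy, failures); failures lists (path, expected, got)."""
    paths = list(image_paths)
    failures = []
    for path in paths:
        want, got = expected_plate(path), recognise(path)
        if got != want:
            failures.append((str(path), want, got))
    accuracy = (len(paths) - len(failures)) / len(paths) if paths else 0.0
    return accuracy, failures
```

Running `measure_accuracy(Path("test_images").glob("*.jpg"))` after each training round gives one number to compare, and the list of failures shows which plates to debug first. The `recognise` parameter lets you swap in another recogniser or a stub.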

alucard079 commented 5 years ago

Yes, but what about the config file in train-ocr? I saw that when adding that config you have to define the dimensions of one plate. But what if I have 3 different plate dimensions? What should I do? As the pictures show, these three are different from each other.

Also, looking at eu/input, I saw that there are box and tif files for every country. When it's compiled into traineddata, do all the files inside eu/input get compiled into it?

Does each tif file contain a different font, or are they all the same?

alucard079 commented 5 years ago

The documentation also says that I have to train different fonts separately, because otherwise the accuracy would decrease.

Also, in leu.belgium.exp0.tif I saw that the characters from 0 to Z look almost the same but have different sizes. I just want to clear this up: do all of them use the same font? For example, are some of the '0's in Arial and others in Times New Roman, or are they all Arial?

[image]

maechler commented 5 years ago

@alucard079

As for the dimensions of the license plates, I do not exactly know how it is used, but in my experience it is not a problem to recognise plates that are of slightly different shape. EU plates and Swiss plates do not have the same dimensions either, but recognising both of them worked for us. Because we have mostly Swiss plates to recognise, we decided to go for the Swiss dimensions. But I think you have to test what works best for you, maybe it works better if you calculate an average of the dimensions.
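Calculating such an average is simple arithmetic; a minimal Python sketch, assuming you have measured each plate type as a (width, height) pair in millimetres. The function name is made up and the numbers in the usage note are placeholders, not real plate sizes.

```python
def average_dimensions(plates):
    """plates: list of (width_mm, height_mm); returns mean width, mean height, ratio."""
    widths = [w for w, _ in plates]
    heights = [h for _, h in plates]
    mean_w = sum(widths) / len(widths)
    mean_h = sum(heights) / len(heights)
    return mean_w, mean_h, mean_w / mean_h
```

For example, `average_dimensions([(400.0, 100.0), (200.0, 100.0)])` gives a mean width of 300.0, a mean height of 100.0 and a width / height ratio of 3.0. Whether the average works better than the dimensions of your most common plate is something only testing can tell.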

The naming scheme for Tesseract font files looks like this: [lang].[fontname].exp[num].tif

See https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.00%E2%80%933.02#make-box-files

Usually you would put one font into one file, but I guess you could also spread it across multiple files using exp0, exp1, exp2. All of the files in eu/input get compiled into one leu.traineddata.
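To check which fonts a folder of training files covers, the naming scheme can be parsed with a few lines of Python. This is only a sketch: the regular expression and the function name `group_by_font` are made up for this example, and accepting `.png` next to `.tif` and `.box` is an assumption.

```python
import re
from collections import defaultdict

# [lang].[fontname].exp[num] followed by a tif, png or box extension.
NAME_RE = re.compile(
    r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.(?:tif|png|box)$"
)


def group_by_font(filenames):
    """Map (lang, fontname) to the sorted list of exp numbers found."""
    groups = defaultdict(set)
    for name in filenames:
        m = NAME_RE.match(name)
        if m:
            groups[(m["lang"], m["font"])].add(int(m["num"]))
    return {key: sorted(nums) for key, nums in groups.items()}
```

Each key in the result is one font, so three fonts should show up as three keys, each backed by at least one tif (or png) and box pair.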

alucard079 commented 5 years ago

Ah okay, so it's not a problem even if the plate sizes, and also the character sizes, are slightly different from each other? I think I'm getting your point now. That means each [lang].[fontname].exp[num].tif and its box file are created separately for one specific font, right?

Now, if I have 3 different plates,

I will create a tif and a box file for each of them.

So in total I will have 3 tif files and 3 box files. Then they will all be compiled into one traineddata?

Am I right? Am I getting your point?

maechler commented 5 years ago

@alucard079 Yes, it is also not a problem if the character size is slightly different. And yes, exactly: if you have 3 different fonts, then you get 3 tif and 3 box files that get compiled into one single traineddata.

alucard079 commented 5 years ago

@maechler Whoa, thank you, everything is clear to me now. To use the train-ocr repo, do I need to compile OpenALPR the harder way with its dependencies, or will the easy-way compilation do?

Also, when training each font, how many images do I need? Are 500 - 1000 per font enough?

Plus, what about very slight differences, for example if I don't have the exact measurements of a font/plate to declare in the config file? Will that affect the accuracy or the recognition itself?

maechler commented 5 years ago

@alucard079 I am afraid, I do not remember whether it worked the easy way or only the hard way for me. It has been a while since I used these tools, I just remember that I struggled to get it running and in the end it only worked on Ubuntu. Also I had to use PNG instead of TIF images in order to make it work (change .tif to .png in train.py).

The doc says you need around 200 clear images of your country’s license plates. You should have 200 images for each font you want to train. In the end you should end up with TIF or PNG files that have about as many characters as the ones that are already in this repository.

It should not be a problem if you do not know the exact measurement of a plate / character.

alucard079 commented 5 years ago

@maechler Can I have a conversation with you, on Facebook or some other social media?

Will this repo work without compiling OpenALPR first? Is this repo independent?

I'm having a problem installing imageclipper (a separate repo), which is part of the instructions in this repo. It also looks like this util isn't working; my CMD returns an error that the command does not exist: openalpr-utils-binarizefontsheet

Also, can you give me a detailed explanation and instructions for doing this training? Especially for the instructions in this repo: some of them I understand correctly, but others I'm having a hard time working out.

maechler commented 5 years ago

@alucard079 I have to admit I have not used openalpr and train-ocr in a while. In addition I never used the imageclipper application.

I remember also having a hard time getting this to work. I only managed getting train-ocr to run on Ubuntu and I needed to compile Tesseract myself (version 3.04.01, probably newer 3.* versions also work). The version from apt-get did not work for me. I had to convert the TIFF to PNG files and adjust train.py to load the PNG instead of the TIFF file. In addition I had to change the variable TESSERACT_DIR to the folder where I compiled Tesseract.

neumartin commented 5 years ago

@marcoschicote can you do it? I need Argentina alpr too! Thanks!

marcoschicote commented 5 years ago

@neumartin I did not go any further with this.

jlarghi commented 5 years ago

Hi @marcoschicote, why did you drop this? We're thinking of using it in our project. Thanks.