For Vietnamese students, the full thesis is available in LVTN_Mai Chí Bảo_1710586.pdf and LVTN_Mai Chí Bảo_1710586.pptx, both written in Vietnamese. For other readers, I'll add an English version as soon as possible.
This project is part of my thesis. In short, a secondary school teacher spends too much time updating scores manually (around 4000 test papers per year, according to this news article).
The goal of the thesis is to assist teachers in automatically updating results in Excel after marking their students' tests. However, in this project, we just collect data such as names and student IDs and recognize them in order to prepare for the next stage.
Figure 1. Test paper of Ho Chi Minh University of Technology
As you can see, I use my university's test paper. My name is Mai Chi Bao and my student ID (MSSV) is 1710586. These are the handwritten fields I want to crop out. The score is on the paper too, but I'll cover it in another repository later.
The workflow of this system is like this
data/Class_list.xlsx
The GUI:
Figure 2. GUI
data/inkml_2_img.py converts the InkML files into images. These are raw data, and without data augmentation they do not help much.
I applied them all in source/prepare_MSSV_dataset.py and source/imgtocsv.py for both name and student ID training. I found that these methods were not enough, so the solution was to collect more real data. I added about 220 photos for each task and, with data augmentation, expanded the set to 20000 images, with good results.
You can find code in source/Preprocessing.py
The flow of this stage is:
The input image includes background, and the paper is usually not in the proper orientation, which makes extraction and recognition difficult. Image Alignment makes the process significantly easier.
Figure 3. Image Alignment
Reference: https://www.pyimagesearch.com/2020/08/31/image-alignment-and-registration-with-opencv/
Then I crop the regions I need, always at the same fixed pixel coordinates.
Figure 4. MSSV_crop.jpg
I used contrast maximization with the Top-hat and Black-hat methods. I found that this preserves a lot of necessary information after the Otsu threshold, especially with blurry images.
I compared Adaptive Threshold and Otsu Threshold. Adaptive Threshold is known to work well under variations in lighting conditions, shadowing, and so on (you can visit this site to learn more). However, it retains noise, even after a Gaussian Blur step, and that much noise makes it hard to run the line removal and recognition steps successfully. Otsu turned out to perform well. I guess this is because the small size of the cropped image reduces the effect of light variance.
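To make the Otsu step concrete, here is a minimal pure-Python sketch of what the method computes: it picks the gray level that maximizes the between-class variance of the histogram. This is not the code in source/Preprocessing.py; the function names `otsu_threshold` and `binarize` are made up for illustration, and in practice OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag does the same job on a real image.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    `pixels` is a flat iterable of integer gray levels in [0, 255].
    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with gray level <= t
    sum0 = 0  # sum of the gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, t):
    """Map every pixel above t to 255 and the rest to 0."""
    return [255 if p > t else 0 for p in pixels]


# Toy example (assumed data): dark ink around 10, bright paper around 200.
crop = [10] * 50 + [200] * 50
t = otsu_threshold(crop)
binary = binarize(crop, t)
```

Because the threshold is global, a single value is applied to the whole crop, which fits the observation that a small cropped region has little lighting variation.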
Figure 5. Image after removing line
I compared the EAST and Scale Space techniques. Here is the result of EAST:
Figure 6. EAST result
Pretty good, huh! But the following drawbacks made me choose the Scale Space technique:
Scale Space technique:
Figure 7. Word segmentation
My model here is CRNN + Attention + CTC Loss
Figure 8. Model Structure
I will briefly describe the model. You can easily find papers about it because it is well known.
I also used Batch Normalization, Dropout, and so on. The model structure is in source/word_model.py. Both name and student ID recognition share the same model.
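Since the model is trained with CTC loss, its raw output is one score vector per time step, and a decoding step turns that into text. The sketch below shows greedy (best-path) CTC decoding: take the best class per frame, collapse repeats, then drop blanks. It is an illustration only, not the decoding code of this repository; the function name, the digit character set, and the position of the blank class are assumptions.

```python
def ctc_greedy_decode(frame_scores, charset, blank_index):
    """Greedy CTC decoding.

    frame_scores: list of per-time-step score lists (one score per class).
    charset:      characters for the non-blank classes, indexed by class id.
    blank_index:  class id of the CTC blank (assumed here to be the last class).
    """
    # 1. Best class for every time step.
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]

    # 2. Collapse consecutive repeats, 3. drop blanks.
    decoded = []
    previous = None
    for idx in best_path:
        if idx != previous and idx != blank_index:
            decoded.append(charset[idx])
        previous = idx
    return "".join(decoded)


# Toy example (assumed data): 10 digit classes plus a blank at index 10.
digits = "0123456789"
path = [1, 1, 10, 7, 7, 10, 1, 10, 10, 0]
frames = [[1.0 if c == k else 0.0 for c in range(11)] for k in path]
text = ctc_greedy_decode(frames, digits, blank_index=10)
```

The blank between two identical classes is what lets the model emit a doubled character, which matters for student IDs with repeated digits.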
Reference:
I created 2 Kaggle Notebooks for training name and student ID (MSSV) recognition. I explained the stages carefully there, so I won't repeat them all here.
I also used Early Stopping and learning rate changes to improve performance.
You can find the .h5 model in my Google Drive, because GitHub doesn't allow me to upload files bigger than 25 MB.
Because of the variations in the word and number datasets, I had to change the way I trained the model, set up parameters, and assessed it. For name recognition, I'll focus on the strategies I employed throughout the training phase. Due to the lack of real data, I'd evaluate number recognition based on how I built the dataset.
For evaluating the impact of changes on Image processing and model structure
The first two tables show the results on 122 test papers with only my name and MSSV on them (with a wide range of lighting, camera angle and distance, picture resolution, and so on). The predictions are then used to identify my index in my 245-student class list.
Name Recognition | CRNN + CTC | + Data Augmentation | + Learning Rate changing | + Attention | + Lexicon search
---|---|---|---|---|---
CER | 35.25% | 23.40% | 16.77% | 16.24% | 0.45%
WER | 74.59% | 69.40% | 45.63% | 47.27% | 0.55%
Table 1. Result of Name recognition on 122 images
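The README does not spell out how CER and WER are computed, so the sketch below uses the common definitions: total edit distance divided by the total reference length, counted in characters for CER and in whitespace-separated words for WER. The function names and the sample predictions are assumptions for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(pairs):
    """Character error rate over (reference, prediction) pairs."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return errors / chars


def wer(pairs):
    """Word error rate over (reference, prediction) pairs."""
    errors = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words


# Toy example (assumed predictions): one wrong character in the name.
pairs = [("Mai Chi Bao", "Mai Chi Bap"), ("1710586", "1710586")]
print(f"CER = {cer(pairs):.2%}, WER = {wer(pairs):.2%}")
```

Under this definition a student ID counts as a single word, so one wrong digit already makes the whole ID a word error, which is consistent with WER staying much higher than CER in the tables.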
MSSV Recognition | 7-digit number and blob noise | + Combination of 1,3,4,5,7-digit number | + Rotate, scale, Random Cutout, Line Noise on digit images | + Changing the distance between digits. Scale, Rotate on multi-digit images | + Adding real data | + Lexicon search
---|---|---|---|---|---|---
CER | 63.82% | 48.24% | 45.55% | 13.58% | 3.63% | 2.58%
WER | 100.00% | 100.00% | 100.00% | 63.11% | 22.95% | 13.11%
Table 2. Result of Student ID recognition on 122 images
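The last column of both tables relies on a lexicon search against the class list. The README does not show how it is implemented, so here is one plausible sketch: score every row of the class list by string similarity to the recognized name and student ID, and return the best row index. It uses `difflib.SequenceMatcher` from the standard library; the function names and every class-list row except mine are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def lexicon_search(pred_name, pred_id, class_list):
    """Return the index of the class-list row closest to the predictions.

    class_list is a list of (name, student_id) tuples, e.g. loaded from
    data/Class_list.xlsx. Name and ID similarities are simply added, which
    is an assumption and not necessarily how the thesis weights them.
    """
    def score(row):
        name, student_id = row
        return (similarity(pred_name.lower(), name.lower())
                + similarity(pred_id, student_id))

    return max(range(len(class_list)), key=lambda i: score(class_list[i]))


# Toy class list (rows 0 and 2 are invented).
class_list = [
    ("Nguyen Van An", "1710001"),
    ("Mai Chi Bao", "1710586"),
    ("Tran Thi Binh", "1712345"),
]
# Noisy recognizer output: one wrong letter and one wrong digit.
index = lexicon_search("Mai Chi Bap", "1710588", class_list)
```

Combining both fields means a badly recognized name can still be rescued by a mostly correct ID, and the other way round.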
For evaluating the performance of the system in bad condition
Unconstrained:
Figure 9. Student Index Recognition on 100 Unconstrained set
Figure 10. Student Score Recognition on 100 Unconstrained set
For evaluating the performance of the system without the error from the user
Constrained:
Figure 11. Student Index Recognition on 103 Constrained set
Figure 12. Student Score Recognition on 103 Constrained set
Unlike per-image index recognition (name & student ID), where the Lexicon Search makes it quite easy to predict the right index, it's really hard to raise the accuracy of score recognition to 90%. In the real environment where the system operates, images are processed continuously like video frames. Based on that idea, the score is not updated unless the system recognizes the same information (name, student ID, score) 3 times in a row.
The final result of the system is 95.55% (43/45 images)
Figure 13. Flowchart diagram for video recognition
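The "3 times in a row" rule for video frames can be sketched as a small state machine. This is a minimal illustration of the idea, not the code of this repository; the class name `ConsecutiveConfirmer` and the sample readings are assumptions.

```python
class ConsecutiveConfirmer:
    """Confirm a reading only after it repeats on consecutive frames."""

    def __init__(self, required=3):
        self.required = required
        self.last = None
        self.count = 0

    def update(self, reading):
        """Feed one frame's (name, student_id, score) tuple.

        Pass None when nothing was recognized on the frame. Returns the
        reading exactly once, on the frame where it has been seen
        `required` times in a row; otherwise returns None.
        """
        if reading is not None and reading == self.last:
            self.count += 1
        else:
            # A different reading (or no reading) restarts the streak.
            self.last = reading
            self.count = 1 if reading is not None else 0
        if self.count == self.required:
            return reading
        return None


# Toy frame stream (the second reading simulates a misread score).
good = ("Mai Chi Bao", "1710586", "8.5")
bad = ("Mai Chi Bao", "1710586", "3.5")
confirmer = ConsecutiveConfirmer(required=3)
for frame_reading in [good, good, bad, good, good, good]:
    confirmed = confirmer.update(frame_reading)
    if confirmed is not None:
        print("update Excel with", confirmed)
```

Returning the reading only on the exact frame where the streak reaches the required length keeps the same score from being written repeatedly while the paper stays in front of the camera.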