
Extract handwritten information such as names and student IDs, then recognize them with a CRNN-CTC-Attention model. A lexicon search over the class list helps teachers update scores faster.
MIT License

Automated-scoring-of-handwritten-test-papers

Check out my 2 YouTube channels for more:

  1. Mrzaizai2k - AI (NEW)
  2. Mrzaizai2k (old)

Vietnamese students can read LVTN_Mai Chí Bảo_1710586.pdf and LVTN_Mai Chí Bảo_1710586.pptx, which are written in Vietnamese. For international readers, I'll update the English version as soon as possible.

Table of contents

  1. Introduction
  2. Usage
  3. Dataset
  4. Image Preprocessing
  5. Word segmentation
  6. Model
  7. Training
  8. Result
  9. Conclusion

1. Introduction

This project is part of my thesis. In short, a secondary school teacher spends too much time updating scores manually (around 4,000 test papers per year, according to this news).

The goal of the thesis is to help teachers automatically update results in Excel after marking their students' tests. In this project, however, we only collect data such as names and student IDs and recognize them, in preparation for the next stage.

Figure 1. Test paper of Ho Chi Minh University of Technology

As you can see, I use my university's test paper. My name is Mai Chi Bao and my student ID (MSSV) is 1710586. That is the handwritten information I want to extract. There is also the score, but we'll cover that in another repository later.

2. Usage

The workflow of the system is as follows:

  1. Images are continuously captured by the phone's camera through an app called IP Camera
  2. The computer pulls images from the IP Camera app's web server to process (see the sketch after this list)
  3. The score is automatically updated into an Excel file like data/Class_list.xlsx
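
A minimal sketch of step 2, assuming the app exposes an MJPEG stream over HTTP; the exact URL depends on the app and its settings, and the address below is hypothetical:

```python
import cv2

STREAM_URL = "http://192.168.1.10:8080/video"  # hypothetical phone address

cap = cv2.VideoCapture(STREAM_URL)
if not cap.isOpened():
    raise RuntimeError("Cannot open the IP camera stream")

while True:
    ok, frame = cap.read()           # grab one frame from the web server
    if not ok:
        break
    cv2.imshow("test paper", frame)  # hand `frame` to the processing pipeline
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```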

The GUI:

Figure 2. GUI

3. Dataset

Those are the raw data; without Data Augmentation, they would not help at all.

I applied all of these augmentations in source/prepare_MSSV_dataset.py and source/imgtocsv.py for both name and student ID training. I found that those methods were still not enough, so the solution was to collect more real data. I added about 220 photos for each task, and with data augmentation I was able to increase that to 20,000 images, with good results.
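
For illustration, here is a hedged sketch of the kinds of transforms listed in Table 2 (rotate, scale, random cutout, line noise); the actual pipeline lives in source/prepare_MSSV_dataset.py and may differ:

```python
import cv2
import numpy as np

rng = np.random.default_rng(42)

def augment(img: np.ndarray) -> np.ndarray:
    """Randomly distort one grayscale word/digit crop (white background)."""
    h, w = img.shape[:2]
    # small random rotation and scale
    angle = rng.uniform(-5, 5)
    scale = rng.uniform(0.9, 1.1)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    out = cv2.warpAffine(img, M, (w, h), borderValue=255)
    # random cutout: erase a small rectangle
    cx, cy = rng.integers(0, w - 10), rng.integers(0, h - 10)
    out[cy:cy + 8, cx:cx + 8] = 255
    # line noise: a random dark line across the image
    x1, x2 = rng.integers(0, w, size=2)
    cv2.line(out, (int(x1), 0), (int(x2), h - 1), color=0, thickness=1)
    return out
```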

4. Image Preprocessing

You can find the code in source/Preprocessing.py. The flow of this stage is:

  1. Image Alignment
  2. Maximize Contrast
  3. Otsu Threshold
  4. Remove line/circle

The input image comes with background information and is not properly oriented, which makes extraction and recognition difficult. With the aid of Image Alignment, the process becomes significantly easier.

Figure 3. Image Alignment

Reference: https://www.pyimagesearch.com/2020/08/31/image-alignment-and-registration-with-opencv/
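
As a rough sketch of that approach (ORB keypoint matching plus a RANSAC homography, as in the tutorial above; the file names are placeholders):

```python
import cv2
import numpy as np

image = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)        # phone capture
template = cv2.imread("template.jpg", cv2.IMREAD_GRAYSCALE)  # blank test paper

# detect and describe keypoints in both images
orb = cv2.ORB_create(1000)
kps1, desc1 = orb.detectAndCompute(image, None)
kps2, desc2 = orb.detectAndCompute(template, None)

# match descriptors and keep only the strongest matches
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(desc1, desc2), key=lambda m: m.distance)
matches = matches[: int(len(matches) * 0.2)]

pts1 = np.float32([kps1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kps2[m.trainIdx].pt for m in matches])
H, _ = cv2.findHomography(pts1, pts2, cv2.RANSAC)

# warp the photo onto the template so every field sits at known coordinates
h, w = template.shape
aligned = cv2.warpPerspective(image, H, (w, h))
```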

Then I crop the regions I need at fixed pixel coordinates, which works because alignment puts every form field in the same place.

Figure 4. MSSV_crop.jpg

I maximized contrast with the top-hat and black-hat morphology method. I found that this preserves a lot of necessary information after the Otsu Threshold, especially on blurry images.

I compared the Adaptive Threshold and the Otsu Threshold. Adaptive Threshold is known to work really well with variations in lighting conditions, shadowing, and so on (you can visit this site to learn more). However, it retains noise even with a Gaussian Blur step, and that noise makes the line-removal and recognition steps hard to apply successfully. Otsu turns out to perform very well; I suspect the small size of the cropped image reduces the effect of lighting variance.
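
A sketch of this contrast-maximization and thresholding step, assuming a grayscale crop; the real code is in source/Preprocessing.py:

```python
import cv2

gray = cv2.imread("MSSV_crop.jpg", cv2.IMREAD_GRAYSCALE)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

# top hat keeps bright details, black hat captures dark halos
tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, kernel)
blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
# boost contrast: gray + tophat - blackhat (saturating arithmetic)
contrast = cv2.subtract(cv2.add(gray, tophat), blackhat)

# smooth, then let Otsu pick the global threshold automatically
blurred = cv2.GaussianBlur(contrast, (5, 5), 0)
_, binary = cv2.threshold(blurred, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
```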

Figure 5. Image after removing line

5. Word segmentation

I compared the EAST detector and the Scale Space technique. You can see the result of EAST:

Figure 6. EAST result

Pretty good, huh? But its drawbacks made me decide to choose the Scale Space technique instead:

Scale Space technique:

Figure 7. Word segmentation

Reference: https://www.researchgate.net/publication/2375892_Scale_Space_Technique_for_Word_Segmentation_in_Handwritten_Manuscripts
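
The core idea, sketched under the assumption of a binarized line image with white text on a black background: blur anisotropically so the letters of one word merge into a single blob, then read word boxes off the connected components:

```python
import cv2

def segment_words(binary):
    """Return word bounding boxes (x, y, w, h), left to right."""
    # sigmaX >> sigmaY merges characters horizontally into word-level blobs
    blurred = cv2.GaussianBlur(binary, (0, 0), sigmaX=11, sigmaY=3)
    _, blobs = cv2.threshold(blurred, 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n, _, stats, _ = cv2.connectedComponentsWithStats(blobs)
    boxes = [tuple(stats[i, :4]) for i in range(1, n)]  # skip background
    return sorted(boxes, key=lambda b: b[0])
```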

6. Model

My model here is CRNN + Attention + CTC Loss

Figure 8. Model Structure

I will describe the model only briefly; you can easily find papers about it, since the architecture is well known.

I also used Batch Normalization, Dropout, etc. The model structure is in source/word_model.py. Name and student ID recognition share the same model.
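
For orientation, here is a condensed CRNN sketch in Keras; the real architecture, including the attention layer, is in source/word_model.py, and all sizes below are assumptions:

```python
from tensorflow.keras import layers, Model

def build_crnn(img_w=128, img_h=32, n_classes=80):
    inp = layers.Input(shape=(img_h, img_w, 1))
    x = inp
    # CNN feature extractor with Batch Normalization
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D((2, 2))(x)
    # make width the time axis, then flatten height x channels per step
    x = layers.Permute((2, 1, 3))(x)
    x = layers.Reshape((img_w // 8, (img_h // 8) * 256))(x)
    # recurrent part reads the feature sequence in both directions
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    x = layers.Dropout(0.25)(x)
    # n_classes characters + 1 extra class for the CTC blank
    out = layers.Dense(n_classes + 1, activation="softmax")(x)
    return Model(inp, out)
```

Training pairs this softmax output with CTC loss (for example via tf.keras.backend.ctc_batch_cost), so the labels never need character-level alignment with the image.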


7. Training

I created 2 Kaggle Notebooks for training on names and student IDs (MSSV). I explained each stage carefully there, so I won't repeat all of it here.

I also used Early Stopping and learning-rate scheduling to improve performance.
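
In Keras these correspond to the EarlyStopping and ReduceLROnPlateau callbacks; the patience and factor values below are assumptions, not the notebooks' exact settings:

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

callbacks = [
    # stop when validation loss stalls, keeping the best weights
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    # halve the learning rate when progress plateaus
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, min_lr=1e-6),
]
# model.fit(..., callbacks=callbacks)
```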

You can find the .h5 model on my Google Drive, because GitHub doesn't allow uploading files bigger than 25 MB.

8. Result

Because of the differences between the word and number datasets, I had to change how I trained the model, set up the parameters, and evaluated it. For name recognition, I'll focus on the strategies I employed during the training phase. Due to the lack of real data, I evaluate number recognition based on how I built the dataset.

8.1 Result on the 122-image set

For evaluating the impact of changes to image processing and model structure

The first two tables show the results on 122 test papers containing only my name and MSSV (with a wide range of lighting, camera angles and distances, picture resolutions, and so on), which are then used to identify my index in my 245-student class list.
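
The lexicon search that appears in both tables snaps a noisy prediction to the closest class-list entry by edit distance; a minimal sketch, assuming the list has already been read from data/Class_list.xlsx into Python strings:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def lexicon_search(prediction: str, lexicon: list[str]) -> str:
    """Return the class-list entry closest to the raw model output."""
    return min(lexicon, key=lambda entry: edit_distance(prediction, entry))
```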

| Name Recognition | CRNN + CTC | + Data Augmentation | + Learning Rate changing | + Attention | + Lexicon search |
| --- | --- | --- | --- | --- | --- |
| CER | 35.25% | 23.40% | 16.77% | 16.24% | 0.45% |
| WER | 74.59% | 69.40% | 45.63% | 47.27% | 0.55% |

Table 1. Result of Name recognition on 122 images

| MSSV Recognition | 7-digit number and blob noise | + Combination of 1,3,4,5,7-digit numbers | + Rotate, scale, Random Cutout, Line Noise on digit images | + Changing distance between digits; Scale, Rotate on multi-digit images | + Adding real data | + Lexicon search |
| --- | --- | --- | --- | --- | --- | --- |
| CER | 63.82% | 48.24% | 45.55% | 13.58% | 3.63% | 2.58% |
| WER | 100.00% | 100.00% | 100.00% | 63.11% | 22.95% | 13.11% |

Table 2. Result of Student ID recognition on 122 images

8.2 Result on 100 Unconstrained Images

For evaluating the performance of the system under poor conditions

Unconstrained:

Figure 9. Student Index Recognition on 100 Unconstrained set

Figure 10. Student Score Recognition on 100 Unconstrained set

8.3 Result on 103 Constrained Images

For evaluating the performance of the system without user error

Constrained:

Figure 11. Student Index Recognition on 103 Constrained set

Figure 12. Student Score Recognition on 103 Constrained set

8.4 Result on video

Unlike recognition on a single image, where Lexicon Search makes it quite easy to predict the right index (name & student ID), it is really hard to push score-recognition accuracy above 90%. In the real environment where the system operates, images are processed continuously, like video frames. Based on that idea, the score is not updated unless the system recognizes the same information (name, student ID, score) 3 times in a row, as sketched below.
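
A minimal sketch of that confirmation rule; `recognize` and `update_excel` are placeholders for the full pipeline, not functions from this repository:

```python
from collections import deque

history = deque(maxlen=3)  # the last 3 recognition results

def on_frame(frame, recognize, update_excel):
    result = recognize(frame)          # -> (name, student_id, score)
    history.append(result)
    # only commit when 3 consecutive frames agree exactly
    if len(history) == 3 and len(set(history)) == 1:
        update_excel(*result)          # confirmed: write the score
        history.clear()                # start over for the next paper
```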

The final result of the system is 95.55% (43/45 images)

Figure 13. Flowchart diagram for video recognition

9. Conclusion