Closed mit456 closed 6 years ago
Hi,
Increasing the training text data to around 3000 images is a good idea. After that, you'll have to plot the model loss on the training and validation data to judge whether adding more data will help. I used approx. 2000-3000 images for the Devanagari dataset.
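A minimal sketch of the loss plot mentioned above, assuming the per-epoch training and validation losses are already available as two lists of equal length. The function name `plot_loss_curves` is hypothetical and not part of the project:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt


def plot_loss_curves(train_loss, val_loss):
    """Plot training and validation loss per epoch on one set of axes."""
    epochs = range(1, len(train_loss) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, train_loss, label="training loss")
    ax.plot(epochs, val_loss, label="validation loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.legend()
    return fig
```

Calling `plot_loss_curves(train_loss, val_loss).savefig("loss_curves.png")` writes the figure to disk. A validation curve that stays well above the training curve is the usual sign that more data may help.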
In this particular case, the problem is illumination. Please see the attached image: it shows the original image, the image after preprocessing, and the connected components of the image. You can see that after preprocessing the "Then one day" part is destroyed (a side effect of thresholding), so it's impossible to extract it. Increasing the dataset size will solve the problem of extracting "everything changed", but you won't be able to extract the top line completely (you might want to play around with otsu_thresholding). This is one of the drawbacks of this approach, as I didn't preprocess the image much to compensate for bad lighting. Can you check the performance on some other images and let me know how it's going? You can uncomment the subplots in cell 141 to plot the attached figure.
I hope this helps. Let me know if you have other questions.
PS: If you're interested, there's another algorithm, "Stroke Width Transform", that uses connected components to find the width of strokes to determine the alphabets. This approach is also language agnostic. You might want to read about it.
Thanks for your prompt reply. I plotted the loss and accuracy graphs for training on your basic dataset; please find the graph attached.
What's your take on the loss and accuracy graph?
I ran it on a couple of different images and found that some characters in particular are getting missed. In the attached image you can see it is missing L and l, I and i, U and u, r, sometimes t, c, w and W, and p and P.
Regarding SWT, I have read about it. By SWT, do you mean replacing the current preprocessing with SWT and then running the model on the selected contours?
It seems to be working much better on this image.
The loss vs. epoch plot looks fine (although you might be able to run it for a bit longer to get slightly better accuracy). Since we can overfit the model on this dataset, the model is not biased, but its variance is high. This means increasing the training data size will definitely improve the performance of the model. The way to generate a new dataset is to download a few images (20-50 or more), find the connected components of these images, and manually put the alphabets in the text folder and the rest in the non-text one. It took me about 2 hours to create this dataset, so it's not very time-consuming.
Yes, by SWT I mean applying the CNN to the contours selected by SWT.
Try increasing the data size first. Maybe try to include more of the difficult alphabets (L, l, I, i, U, u, etc.).
Hope this helps.
Also, you can use precision and/or recall at the alphabet level as a metric to quantify the results. Then use around 30-50 images to get a sense of how well the model is working.
I did pass the connected components from SWT through the model, and here are the results.
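One way to compute such a metric, sketched under the assumption that both the detections and the hand-labelled characters are available as `(x, y, width, height)` boxes. The greedy matching and the 0.5 IoU cut-off are illustrative choices, not something the project prescribes:

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    overlap_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = overlap_w * overlap_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predicted: List[Box], truth: List[Box], min_iou: float = 0.5):
    """Greedily match each predicted box to at most one ground-truth character."""
    unmatched = list(truth)
    true_pos = 0
    for p in predicted:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= min_iou:
            true_pos += 1
            unmatched.remove(best)  # a character can only be found once
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall
```

Precision drops when non-text regions are classified as text, and recall drops when characters such as l or i are missed, so tracking both separates the two failure modes.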
Gaussian Adaptive thresholding
SWT
My training set was generated using cv2.connectedComponents and adaptive thresholding. Should I also add the SWT connected components to the training set so that SWT performs better? One thing about the SWT case: are the connected components completely including the character?
Since Gaussian adaptive thresholding seems to work so well, I don't think you even need SWT. It's up to you now how you want to explore different approaches. I'd suggest creating a test set and calculating precision/recall to quantify the results. What's your training size now?
I'm closing this issue now, as the problem seems to be solved by changing the preprocessing step and increasing the training size.
Hello @sjoshi4may,
Thanks for sharing the project. I was trying to run it on the English dataset, using data from
https://github.com/sjoshi4may/Text-detection-in-natural-scene-images-dataset
to train the CNN text/non-text classifier, and evaluating the result on a custom dataset, but it does not seem to perform well. I am thinking of increasing the training data by adding the mjsynth or char74k dataset. Would that be a good idea? Or could it be a problem in preprocessing, where we are missing components? Please find the attached image.