ropensci / opencv

R bindings for OpenCV
https://docs.ropensci.org/opencv

Blur detection using R opencv #19

Closed ghareesh closed 4 years ago

ghareesh commented 4 years ago

Hi Team, I have a Tesseract-based OCR implementation in R for scanned PDF documents.
OCR fails on some documents because they are blurred. I need to detect them beforehand so that I can try some additional deblur processing before passing them to OCR.

The link below has a Python implementation that detects blur using the Laplacian function in OpenCV.

https://www.pyimagesearch.com/2015/09/07/blur-detection-with-opencv/

The function of interest is below:

    import cv2

    def variance_of_laplacian(image):
        # compute the Laplacian of the image and then return the focus
        # measure, which is simply the variance of the Laplacian
        return cv2.Laplacian(image, cv2.CV_64F).var()

Can we get a similar implementation of the Laplacian in ropencv as well? If not, is there any alternative?

jwijffels commented 4 years ago

You can use magick::image_convolve to apply the Laplacian kernel and then compute the variance of the pixels.
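A minimal sketch of that idea in plain Python, to show what "convolve with the Laplacian kernel, then take the variance" computes. It is not the magick or OpenCV implementation: the helper names `laplacian` and `variance_of_laplacian`, the toy 6x6 images, and the choice to skip border pixels are all assumptions made for illustration.

```python
from statistics import pvariance

# Common 4-neighbour Laplacian kernel (same values as the kernel in this thread).
LAPLACIAN = [
    [0, 1, 0],
    [1, -4, 1],
    [0, 1, 0],
]


def laplacian(image):
    """Apply LAPLACIAN to a 2-D list of grey values (interior pixels only)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += LAPLACIAN[dr + 1][dc + 1] * image[r + dr][c + dc]
            out_row.append(acc)
        out.append(out_row)
    return out


def variance_of_laplacian(image):
    """Focus measure: population variance of the Laplacian response."""
    values = [v for row in laplacian(image) for v in row]
    return pvariance(values)


# Toy 6x6 "pages" (hypothetical data): a hard edge versus a gradual ramp.
sharp = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
blurred = [[0, 51, 102, 153, 204, 255] for _ in range(6)]

sharp_score = variance_of_laplacian(sharp)
blurred_score = variance_of_laplacian(blurred)
print(sharp_score, blurred_score)
```

The hard edge produces a large score, while the linear ramp has a zero second derivative everywhere and scores 0. Note that the response here is signed and not clipped to 0-255, which may differ from what an image library stores after a convolution.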

ghareesh commented 4 years ago

Hi @jwijffels, can you confirm whether the implementation below is right?

    library(magick)

    # Define Laplacian kernel
    kern <- matrix(c(0,1,0, 1,-4,1, 0,1,0), ncol = 3, nrow = 3)

    # txt_img1 - list of 2 documents, one blurred (3 pages) and the other non-blurred (4 pages)

    # Convert both documents to greyscale
    t1 <- lapply(txt_img1, image_convert, type = 'grayscale')

    # Convolve blurred document pages with kernel
    img_conv1 <- image_convolve(t1[[1]], kern)

    # Convolve non-blurred document pages with kernel
    img_conv2 <- image_convolve(t1[[2]], kern)

    # Loop through each page in 1st document and compute mean standard deviation per page
    a <- NULL
    for (i in 1:length(t1[[1]])) {
      a <- rbind(a, mean(sd(drop(as.integer(img_conv1[[i]])))))
    }

I get output something like this for each page of the blurred document:

    a: 22.0108301000 20.2005934287 22.3654526666
    sum(a)/length(a)
    21.5256253984

The same was done for the non-blurred document, which has 4 pages. The output is as below:

    a1: 26.9204131392 40.3789356727 18.2833013033 21.1756138231
    sum(a1)/length(a1)
    [1] 26.6895659846

I don't notice much difference between the values, though. Can you confirm, please?

jwijffels commented 4 years ago

Looks like you are computing the mean of the standard deviation of the Laplacian of a grey-scaled image.

ghareesh commented 4 years ago

Yes, I am not sure though. I just need a final metric that can say whether a page is blurred. The above metric also shows low variance when a page is only half filled. Is there a more standard way of doing this? You mentioned variance of pixels. What does that mean?

jwijffels commented 4 years ago

It's what you are already doing: you are getting the standard deviation of the pixel values, which are in the 0-255 range.
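To make the relation between the two measures concrete, here is a small sketch in plain Python, assuming population statistics and a made-up list of four pixel values. The variance is just the square of the standard deviation, so either can serve as the score, but a cut-off chosen for one would need converting before being used with the other.

```python
from statistics import pstdev, pvariance

# Hypothetical pixel values in the 0-255 range: half black, half white.
pixels = [0, 0, 255, 255]

sd = pstdev(pixels)      # population standard deviation
var = pvariance(pixels)  # population variance

# The variance is the standard deviation squared.
print(sd, var, sd ** 2)
```

For values confined to 0-255, this half-black, half-white case is the largest standard deviation possible (127.5), so scores computed this way live on a much smaller scale than a variance would.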

ghareesh commented 4 years ago

OK, in that case I will try to fine-tune this a little more. Thanks a lot for the prompt help!