I need to write R code that reads text from images. I am using the tesseract and magick packages for this, and I am facing an issue where the OCR output converts an "&" to "8:". I have attached the image that I am using as input.
Below is the code that I am running:

    library(magick)     # image_read(), image_ocr(), and the %>% pipe
    library(tesseract)  # OCR engine used by image_ocr()

    test2 <- image_read("C:/Users/admin/Desktop/testimage.jpg") %>%
      image_resize("2000") %>%
      image_convert(colorspace = 'gray') %>%
      image_trim() %>%
      image_ocr()
    cat(test2)
    write.table(test2, "C:/Users/admin/Desktop/output2.txt", sep="\t")
I have also tried the modified version below, but the result is still the same:

    wl = paste(paste(letters, LETTERS, collapse="", sep=""), "0123456789&;")
    engine <- tesseract(options = list(tessedit_char_whitelist = wl), cache=FALSE)
    test3 <- image_read("C:/Users/admin/Desktop/testimage.jpg") %>%
      image_resize("500") %>%
      image_convert(colorspace = 'gray') %>%
      image_trim() %>%
      image_ocr()
    engine <- tesseract(options = list(tessedit_char_whitelist = ";&"))
    cat(test3)
Below is the output that I am getting:
No relation between boycotting
panchayat polls 8: Article 35A:
Subramanian Swamy
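Looking back at my second attempt, I notice that `engine` is created but never handed to `image_ocr()`, so the whitelist may never have been applied at all. Here is a minimal sketch of what I think the call should look like. It assumes `tesseract::ocr()` accepts a file path together with an `engine` argument; `build_whitelist` and `ocr_with_whitelist` are names I made up for illustration, and I have not confirmed that this fixes the "&" problem.

```r
# Hypothetical helper: build the whitelist as one string with no separators.
build_whitelist <- function() {
  paste0(c(letters, LETTERS, 0:9, "&", ";"), collapse = "")
}

# Hypothetical helper: same preprocessing as my first snippet, but the
# engine carrying the whitelist is passed to the OCR call explicitly.
# Assumes the magick and tesseract packages are installed.
ocr_with_whitelist <- function(path, whitelist = build_whitelist()) {
  engine <- tesseract::tesseract(
    language = "eng",
    options = list(tessedit_char_whitelist = whitelist)
  )

  img <- magick::image_read(path)
  img <- magick::image_resize(img, "2000")
  img <- magick::image_convert(img, colorspace = "gray")
  img <- magick::image_trim(img)

  # Write the processed image to a temporary PNG so ocr() can read it from disk.
  tmp <- tempfile(fileext = ".png")
  on.exit(unlink(tmp))
  magick::image_write(img, path = tmp, format = "png")

  tesseract::ocr(tmp, engine = engine)
}

# Usage (not run here):
# cat(ocr_with_whitelist("C:/Users/admin/Desktop/testimage.jpg"))
```

I used `paste0()` with `collapse = ""` so that the whitelist does not pick up the space that the outer `paste()` in my original code inserts between the letters and the digits.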
I have searched this website and have also posted the same question on Stack Overflow, but after several hours I have not received a solution.
If someone can help, I would greatly appreciate it.