jscancella / NYTribuneOCRExperiments

experiments trying to generate better OCR for the NY tribune newspapers from Chronam
GNU General Public License v3.0

article spanning multiple columns #2

Closed Temitope-Emmanuel closed 9 months ago

Temitope-Emmanuel commented 9 months ago

I'm trying to use the find_Text_usingSums workflow, but my articles span multiple columns. How do I go about this? The current iteration is not working. (Attached image: PM News October_28_1994_ Pg 2 resize)

jscancella commented 9 months ago

This was just an experiment to prove it could work, not a production-ready solution. This example image would not be a good candidate because it is skewed, and the find-columns-by-sums approach assumes that there is no skew.
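
To illustrate why skew matters, here is a minimal sketch of the general column-sum (projection profile) idea. It is not the code from this repository: the function name `find_column_gaps`, its parameters, and the synthetic page are all hypothetical, and it assumes a binarized page where 1 means ink and 0 means background.

```python
import numpy as np

def find_column_gaps(binary_page, max_ink_fraction=0.0, min_gap_width=3):
    """Return (start, end) x-ranges of vertical gutters (end is exclusive).

    Hypothetical helper: sums ink down each pixel column and treats runs of
    (nearly) empty columns as gutters between text columns.
    """
    page = np.asarray(binary_page)
    ink_per_column = page.sum(axis=0)            # vertical projection profile
    threshold = max_ink_fraction * page.shape[0]
    is_blank = ink_per_column <= threshold

    gaps, start = [], None
    for x, blank in enumerate(is_blank):
        if blank and start is None:
            start = x                            # a blank run begins
        elif not blank and start is not None:
            if x - start >= min_gap_width:       # keep only wide enough runs
                gaps.append((start, x))
            start = None
    if start is not None and len(is_blank) - start >= min_gap_width:
        gaps.append((start, len(is_blank)))      # blank run reaching the right edge
    return gaps

# Synthetic page: two text columns (x 10..29 and 40..59) with a gutter at 30..39.
straight = np.zeros((100, 80), dtype=np.uint8)
straight[:, 10:30] = 1
straight[:, 40:60] = 1

# Crude skew simulation: shift every block of 10 rows one more pixel to the right.
skewed = np.stack([np.roll(row, i // 10) for i, row in enumerate(straight)])

straight_gaps = find_column_gaps(straight)
skewed_gaps = find_column_gaps(skewed)
```

On the straight page the gutter between the two columns shows up as a run of empty pixel columns. On the skewed page the shifted rows smear ink across the gutter, so the column sums no longer drop to zero there and the gutter is lost. Deskewing the scan first would be needed before an approach like this could work.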

You would be better off training a YOLO model. I did experiment with it (https://github.com/jscancella/chronam_with_yolo/blob/master/Chronam%20column%20detection%20with%20YOLOv5.ipynb), but it is not a step-by-step walkthrough.

Temitope-Emmanuel commented 9 months ago

Okay @jscancella, thank you, I will check it out! P.S. Sorry for the late reply.