tberg12 / ocular

Ocular is a state-of-the-art historical OCR system.
GNU General Public License v3.0

Out of bounds error #9

Open jwang281 opened 6 years ago

jwang281 commented 6 years ago

Hi, we are getting this exception after running for a couple of days -- any idea why?

Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
        at edu.berkeley.cs.nlp.ocular.preprocessing.VerticalModel.minSize(VerticalModel.java:202)
        at edu.berkeley.cs.nlp.ocular.preprocessing.VerticalProfile.decode(VerticalProfile.java:171)
        at edu.berkeley.cs.nlp.ocular.preprocessing.LineExtractor.extractLines(LineExtractor.java:23)
        at edu.berkeley.cs.nlp.ocular.data.LazyRawImageDocument.doLoadObservationsFromFile(LazyRawImageDocument.java:86)
        at edu.berkeley.cs.nlp.ocular.data.LazyRawImageDocument.loadLineImages(LazyRawImageDocument.java:64)
        at edu.berkeley.cs.nlp.ocular.model.DecoderEM.computeEStep(DecoderEM.java:64)
        at edu.berkeley.cs.nlp.ocular.train.FontTrainer.doFontTrainPass(FontTrainer.java:185)
        at edu.berkeley.cs.nlp.ocular.train.FontTrainer.trainFont(FontTrainer.java:95)
        at edu.berkeley.cs.nlp.ocular.main.TrainFont.run(TrainFont.java:76)
        at edu.berkeley.cs.nlp.ocular.main.OcularRunnable.doMain(OcularRunnable.java:25)
        at edu.berkeley.cs.nlp.ocular.main.TrainFont.main(TrainFont.java:41)

halperta commented 6 years ago

Was it trying to decode a blank page? Sometimes that causes the error.
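
One way to rule out blank pages before a long training run is to measure how much ink each image contains. The following is a minimal sketch, not part of Ocular: the class name `BlankPageCheck`, the method `darkFraction`, and the gray-level threshold are all hypothetical, and a suitable cutoff for "blank" would have to be chosen per collection.

```java
import java.awt.image.BufferedImage;

final class BlankPageCheck {
    /**
     * Returns the fraction of pixels whose average RGB value is below
     * darkThreshold (0-255). A value near 0 suggests a blank page.
     * Hypothetical helper, not an Ocular API.
     */
    static double darkFraction(BufferedImage img, int darkThreshold) {
        long dark = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if ((r + g + b) / 3 < darkThreshold) {
                    dark++;
                }
            }
        }
        long total = (long) img.getWidth() * img.getHeight();
        return (double) dark / total;
    }
}
```

Pages read with `javax.imageio.ImageIO.read` could be passed through this check and any with a very small dark fraction set aside for manual inspection.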

jwang281 commented 6 years ago

The page is not blank, but I did get the output that says "no evaluation diplomatic text found at the image".

halperta commented 6 years ago

The other thing that has caused that error for me is when the image is very low quality. But if it's not blank pages or image quality, I don't know what it would be! I'm just a user, not a developer.

jwang281 commented 6 years ago

Thanks anyway!

davidweichiang commented 6 years ago

Hi @tberg12, it would be great if you know how to fix this quickly. We can look at it, but it will probably take us a lot longer. Thanks!

davidweichiang commented 6 years ago

I attempted a fix: ndnlp/ocular@00f8db0 (https://github.com/ndnlp/ocular/commit/00f8db0). @jwang281, can you test it on the file that was causing the problem?

tberg12 commented 6 years ago

Sorry for the lag, I'm currently traveling. I'll take a look soon. Hopefully David's fix worked. Also, let me know if any of you want to be added as contributors. The more the merrier!

davidweichiang commented 6 years ago

It fixed the symptom but perhaps not the problem. I was expecting it to skip those pages and then on later iterations to not skip them. But it sounds like it just keeps skipping them on every iteration. Do you have any idea why the vertical segmentation would fail?
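
The skip-on-failure behavior described here can be sketched roughly as follows. This is an illustration only and does not reproduce the contents of the commit: `SkipFailingPages` and `extractAll` are hypothetical names, and the `extract` function stands in for whatever line-extraction call throws the `ArrayIndexOutOfBoundsException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class SkipFailingPages {
    /**
     * Applies extract to every page. A page whose extraction throws
     * ArrayIndexOutOfBoundsException is added to skipped and produces
     * no result, so one bad page no longer aborts the whole pass.
     * Hypothetical helper, not Ocular code.
     */
    static <P, R> List<R> extractAll(List<P> pages,
                                     Function<P, R> extract,
                                     List<P> skipped) {
        List<R> results = new ArrayList<>();
        for (P page : pages) {
            try {
                results.add(extract.apply(page));
            } catch (ArrayIndexOutOfBoundsException e) {
                skipped.add(page);
            }
        }
        return results;
    }
}
```

Under this scheme, a page that fails the same way on every pass is skipped on every pass. It would only be picked up later if the extraction step itself produced a different result on a later attempt.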

tberg12 commented 6 years ago

The vertical segmentation system is an unsupervised semi-Markov model. We run a bunch of random restarts, and the top-scoring one is a good segmentation most of the time, but sometimes, for no apparent reason, a page will cause trouble. We’ve got a better segmentation method in the pipeline, but it’s not in Ocular yet.
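
The "random restarts, keep the top-scoring one" strategy can be illustrated with a generic sketch. It is not Ocular's implementation: the class `RandomRestarts`, the hill-climbing search, and the two-peak toy objective are assumptions standing in for the semi-Markov segmentation model and its score.

```java
import java.util.Random;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

final class RandomRestarts {
    /** Runs `restarts` independent attempts and returns the highest-scoring one. */
    static <T> T best(int restarts, long seed,
                      Function<Random, T> attempt,
                      ToDoubleFunction<T> score) {
        if (restarts < 1) {
            throw new IllegalArgumentException("need at least one restart");
        }
        Random rng = new Random(seed);
        T bestResult = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < restarts; i++) {
            T candidate = attempt.apply(rng);
            double s = score.applyAsDouble(candidate);
            if (bestResult == null || s > bestScore) {
                bestResult = candidate;
                bestScore = s;
            }
        }
        return bestResult;
    }

    /** Toy objective with a local peak at x = 20 and a higher peak at x = 70. */
    static double toyScore(int x) {
        double low = 5 - 0.5 * Math.abs(x - 20);
        double high = 10 - 0.5 * Math.abs(x - 70);
        return Math.max(low, high);
    }

    /** Toy local search: hill-climb over 0..99 from a random starting point. */
    static int hillClimb(Random rng) {
        int x = rng.nextInt(100);
        while (true) {
            int next = x;
            if (x > 0 && toyScore(x - 1) > toyScore(next)) next = x - 1;
            if (x < 99 && toyScore(x + 1) > toyScore(next)) next = x + 1;
            if (next == x) return x; // no better neighbor: a local optimum
            x = next;
        }
    }

    public static void main(String[] args) {
        int result = best(20, 42L, RandomRestarts::hillClimb, RandomRestarts::toyScore);
        System.out.println("best local optimum found: " + result);
    }
}
```

A single hill-climb can get stuck on the lower peak depending on where it starts. Taking the best of many restarts makes that unlikely but never impossible, which matches the observation that an occasional page still ends up with a poor segmentation.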
