supermemo / SuperMemoAssistant.Plugins.LateX

https://www.supermemo.wiki/sma/
MIT License
Issue with latex equations start and end symbols registering in OCR #12

Open nikedinikedi opened 4 years ago

nikedinikedi commented 4 years ago

New description

When OCRing both text and formulas (math, chemistry, ...), add a feature that automatically converts the LaTeX math delimiters \[ and \] into the SMA tags [$] and [/$].
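
The requested conversion can be sketched in a few lines of Python. This is only an illustration of the idea, not the plugin's actual implementation: the function name `convert_display_math` and the sample string are hypothetical, and the only things taken from this issue are the \[ and \] delimiters and the [$] and [/$] tags.

```python
import re

# Matches a display-math embed \[ ... \], including embeds that span several lines.
DISPLAY_MATH = re.compile(r"\\\[(.*?)\\\]", re.DOTALL)


def convert_display_math(ocr_text: str) -> str:
    """Wrap the body of every \\[ ... \\] embed in [$] ... [/$] tags.

    Hypothetical helper: text outside the embeds is left untouched.
    """
    # A function is used as the replacement so backslashes in the LaTeX body
    # are never interpreted as regex escape sequences.
    return DISPLAY_MATH.sub(
        lambda m: "[$]" + m.group(1).strip() + "[/$]", ocr_text
    )


# Made-up OCR result mixing text with one equation between two paragraphs.
sample = (
    "The sample mean is\n"
    "\\[\n\\bar{x} = \\frac{1}{n} \\sum x_i\n\\]\n"
    "where n is the sample size."
)
print(convert_display_math(sample))
```

The pattern is non-greedy so that several equations in one OCR result are converted separately rather than merged into a single tag pair.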

See below for illustration.

Old description

When OCRing a chunk with a certain layout that contains several LaTeX symbols or equations, an equation that sits isolated between two paragraphs will be missing its [$] and [/$] tags.

image


This is what I OCR'd (the whole thing):

image


This has happened with multiple similar layouts when I OCR such a chunk. Note: if I OCR this equation individually, it works fine.

Here's the page of the PDF (the bug occurs with the equation at the bottom):

apstats2.pdf

nikedinikedi commented 4 years ago

A similar case with slightly different settings, but the main idea is the same: the bug happens when there's a line break.

Image 1. What I OCR'd

image.png

Image 2. Tex Editor

image.png

alexis- commented 4 years ago

Doesn't seem to happen anymore

image.png

nikedinikedi commented 4 years ago

Try on longer, similar pieces as in the image

nikedinikedi commented 4 years ago

This is the main reason I don't use LaTeX that often -Naess

alexis- commented 4 years ago

I'll try the second PDF, but the test I shared was executed on the formula of the first PDF that you shared.

alexis- commented 4 years ago

Also we're running different versions of the PDF. I don't recall exactly all the changes, but I think I changed something in the OCR a while back.

alexis- commented 4 years ago

Ah wait, I see what you're doing here. You're OCRing everything, not just the formula. It's not meant to work like that. MathPix returns the LaTeX code for the whole image, meaning it thinks you also want the text to be part of the LaTeX document.

nikedinikedi commented 4 years ago

You need to include the text, not only the equation.

nikedinikedi commented 4 years ago

Yes

nikedinikedi commented 4 years ago

It's inefficient to go one equation at a time, especially in math-heavy papers that jump between words and equations. It usually works fine, but not when there is a break like in the images.

alexis- commented 4 years ago

It's inefficient to go one equation at a time, especially in math-heavy papers that jump between words and equations

I agree. It's not a bug, though; it's a feature request, since the current functionality is working as expected.

alexis- commented 4 years ago

Done, it will be in the next version. Look out for any bugs.

nikedinikedi commented 4 years ago

BEAUTIFUL