morganrivers / problem_set_ocr

OCR using gpt4 to go from handwritten to latex psets
GNU Affero General Public License v3.0

Hello, we have the mathpix API interface, but we lack sufficient handwriting data. Perhaps we can collaborate #1

Open YRQ66 opened 6 months ago

YRQ66 commented 6 months ago

Hello, we have the mathpix API interface, but we lack sufficient handwriting data. Perhaps we can collaborate

morganrivers commented 6 months ago

Hi, well, it seems you found my little problem set uploader. I worry that if you are trying to beat GPT-4 at handwriting recognition, you will not succeed, as GPT-4 can actually understand the math being used and intuit illegible characters.

So, I'm not interested in an API interface, but I would love it if you were to make your code more open source, or at least offer a free version, to provide more benefit to the scientific community.

As for handwritten LaTeX, at the end of the semester I can provide you with about 100 handwritten pages of my problem sets and the associated high-quality LaTeX conversions. I doubt this is of much interest to you, as it's a pretty small dataset and it's all in my handwriting. I don't think anyone else is using this GitHub repo.

signal-2024-04-19-001309_005

That's a handwritten example for which I have a properly LaTeX'd solution.

Latex:

\section*{Exercise 4}
\begin{enumerate}

    \item[(i)] I had to look at Wikihow for converting to polar and the square root idea, but did the rest myself. 

    \[
    \int_{-\infty}^\infty e^{-ax^2} \, dx = \sqrt{\left(\int_{-\infty}^\infty e^{-ay^2} \, dy\right) \left(\int_{-\infty}^\infty e^{-az^2} \, dz\right)}
    \]

    Note $y$ and $z$ are independent.

    \[
    = \sqrt{\int \int_{-\infty}^\infty e^{-a(y^2+z^2)} \, dy \, dz}
    \]

    The integral is over all area.
    Set $y = r\cos\theta$, $z = r\sin\theta$, and $dA = dy\,dz = r\,dr\,d\theta$.

    Now, in polar coordinates
    \[
    = \sqrt{\int_0^{2\pi} \int_0^\infty e^{-ar^2} r \, dr \, d\theta} = \sqrt{2\pi} \sqrt{\int_0^\infty e^{-ar^2} r \, dr}
    \]

    Then I remembered, the derivative
    \[
    \frac{d}{dr} e^{-ar^2} = -2ar \, e^{-ar^2}
    \]
    Let $p = ar^2$,
    \[
    \frac{d}{dr} e^{-ar^2} = \frac{d}{dp} e^{-p} \cdot \frac{dp}{dr} = e^{-ar^2}(-2ar) \rightarrow \frac{d}{dr} \left(\frac{-1}{2a}e^{-ar^2}\right) = e^{-ar^2}r
    \]

    Which means
    \[
    \int_0^\infty e^{-ar^2} r \, dr = -\frac{1}{2a} \left[e^{-ar^2}\right]_0^\infty = \frac{1}{2a}
    \]

    \[
    \boxed{
    \int_{-\infty}^\infty e^{-ax^2} \, dx = \sqrt{\frac{\pi}{a}}
    }
    \]

\end{enumerate}

morganrivers commented 6 months ago

image

morganrivers commented 6 months ago

Yeah, so GPT-4 did nearly all of that correctly, but I had to fix some small things. It's quite impressive. Here was GPT-4 Vision's first try.

Differences: GPT-4 didn't read the text perfectly, it didn't understand the big square root (fair enough, it's a bit nonstandard notation), and it missed the square root of pi at the end.

image
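
The boxed result can be sanity-checked numerically. The sketch below is not part of the repo: `gaussian_integral` is a hypothetical helper that applies a plain midpoint rule on a truncated interval and compares the estimate against $\sqrt{\pi/a}$ for a few values of $a$. A transcription that loses the factor of $\sqrt{\pi}$ would show up as a mismatch here.

```python
import math


def gaussian_integral(a: float, half_width: float = 10.0, steps: int = 20_000) -> float:
    """Midpoint-rule estimate of the integral of exp(-a*x^2) over [-half_width, half_width].

    Hypothetical helper for illustration. The tails beyond half_width are
    ignored, which is only reasonable when a * half_width**2 is large.
    """
    if a <= 0:
        raise ValueError("a must be positive for the integral to converge")
    dx = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * dx  # midpoint of the i-th subinterval
        total += math.exp(-a * x * x)
    return total * dx


for a in (0.5, 1.0, 2.0):
    numeric = gaussian_integral(a)
    exact = math.sqrt(math.pi / a)
    print(f"a={a}: numeric={numeric:.6f}, sqrt(pi/a)={exact:.6f}")
```

The two printed columns should agree to the displayed precision for each `a`; the default `half_width` and `steps` are arbitrary choices that happen to be ample for these values.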

YRQ66 commented 5 months ago

Perhaps GPT can perform well in related recognition tasks in the near future, but I still believe that training a formula recognition model is meaningful. In the future, I will certainly open up more datasets. Thank you for your work and for providing us with a new solution.

morganrivers commented 5 months ago

Of course. GPT is computationally expensive and energy intensive. Non-GPT solutions are great. They can be combined for more reliable results as well.

It's good you will open up your datasets. Well, at the end of the semester (early July), I now plan to send you my homeworks and the latex versions for each whole problem set.

Actually, after taking a look at Mathpix, could you give me API access if I send you the handwritten pages? I could use the Mathpix output as extra context in the prompt to GPT-4; I think the two tools can complement each other.

GPT can give a nicely formatted, coherent mathematical document, but it does badly at capturing all the parts of the image and rendering individual equations (it scales badly to lots of math on a page). From my one try at Mathpix, it renders almost everything on the page, but it doesn't format it well and doesn't write coherent English when doing OCR.

It will be interesting to explore the combination.
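
One way the combination might look, as a sketch only: hand the Mathpix OCR text to the vision model as extra context alongside the page image. The function name `build_combined_prompt`, its parameters, and the prompt wording are all assumptions made for illustration. No Mathpix or OpenAI API call is made here, and the repo may do this differently.

```python
def build_combined_prompt(mathpix_text: str, page_label: str = "page 1") -> str:
    """Build the text half of a vision prompt, embedding OCR output as context.

    Hypothetical helper: the page image itself would be attached separately
    by whatever client sends the request.
    """
    ocr = mathpix_text.strip()
    if not ocr:
        raise ValueError("OCR text is empty; nothing to add as context")
    parts = [
        f"The attached image is {page_label} of a handwritten problem set.",
        "Transcribe it into a single coherent LaTeX document.",
        "A separate OCR pass produced the draft below. It tends to capture",
        "most of the page but is poorly formatted, so use it to avoid",
        "skipping equations, and trust the image where the two disagree.",
        "",
        "--- OCR draft start ---",
        ocr,
        "--- OCR draft end ---",
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    # Example OCR draft (made up for this demo, not real Mathpix output).
    draft = r"\int_{-\infty}^\infty e^{-ax^2} dx = \sqrt{\frac{\pi}{a}}"
    print(build_combined_prompt(draft, page_label="page 3"))
```

Keeping the OCR draft between explicit markers is a design choice so the model can tell the draft apart from the instructions; whether that actually improves the transcription would need to be tested on real pages.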
