Closed lethefrost closed 1 year ago
Hi, thanks for posting this issue with such a clear explanation. After checking and reproducing the cases you provided, I now know why this happens, and I will try to fix it. chatgpt-exporter doesn't have this issue because it fetches the raw content directly from the API, rather than parsing HTML content that has been polluted by other renderers.
Temporarily fixed in 0.5.8.
We reversed the HTML `em` tag back to `_` only, which leads to another issue: the Markdown renderer gives us no way to know whether the original symbol behind an `em` tag was `_` or `*`, so for a formula like `${x}^{*} = {y}^{*}$` the script won't be able to typeset it correctly.
I will try to fix this as well.
Hi scruel, thank you!! You are really amazing! You identified the issue so fast. I don't understand the cause at all, and I wonder if you would mind taking some time to explain why ChatGPT and GitHub both have issues like this? I really appreciate your quick response and fix. It's so impressive!
Also, thank you for recognizing the `*` problem with the temporary solution! I am thinking that some syntax unique to `_` might help distinguish the cases of `*` and `_`. For example, if an `<em>` tag matches one of the following cases, it cannot have been `_` originally (i.e. it must be `*`), because of the syntactic rules of $\LaTeX$:

- `$<em>` or `<em>$` - immediately next to the `$` boundaries at either the beginning or end of the inline formula
- `[_^][{}]*<em>` or `<em>[{}]*[_^]` - following or followed by either another `_` or `^`, with only (potentially) `{` or `}` in between

Once an `<em>` or `</em>` is determined, the nearest one paired with it can be determined to be the same character.

Thank you again for your work! Really appreciate it. Hope I am helping 😊. I just found a new bug with the 0.5.8 release and I will raise a new issue for this.
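The two context rules and the pairing step can be sketched in a few lines. This is only an illustration in Python rather than the script's own code; the function name `forced_asterisk_flags` and the assumption that the rendered formula arrives as a plain string containing `<em>` and `</em>` tags are mine, not taken from the project.

```python
import re

EM_TAG = re.compile(r"</?em>")
# Text right before the tag: a `$`, or `_`/`^` followed only by braces.
BEFORE = re.compile(r"(?:\$|[_^][{}]*)\Z")
# Text right after the tag: a `$`, or braces followed by `_`/`^`.
AFTER = re.compile(r"(?:\$|[{}]*[_^])")


def forced_asterisk_flags(html: str) -> list[bool]:
    """For each <em>/</em> tag, report whether its context forces `*`."""
    tags = list(EM_TAG.finditer(html))
    flags = [
        bool(BEFORE.search(html[: m.start()]) or AFTER.match(html, m.end()))
        for m in tags
    ]
    # Pairing step: if one tag of an <em>...</em> pair is forced to `*`,
    # its partner must be the same character.
    open_idx = None
    for i, m in enumerate(tags):
        if m.group() == "<em>":
            open_idx = i
        elif open_idx is not None:
            if flags[open_idx] or flags[i]:
                flags[open_idx] = flags[i] = True
            open_idx = None
    return flags
```

A `False` flag means "undetermined", not "must be `_`": for example, `${x}<em>a$ and $y</em>{b}$` matches neither rule, so the original character would still have to be confirmed some other way.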
@lethefrost For GitHub, I can't be sure what causes this. On the ChatGPT page, the problem is caused by the wrong rendering order: as you said before, we should typeset LaTeX formulas first and then render Markdown formatting, but a script (without injecting) can only do its work at the end. The cases you provided are helpful, and I will consider them while fixing this. Currently, I can fetch the raw content and match against it to confirm the original symbol. However, I think I will have to reverse some parts of the HTML back to Markdown to do the match, and this "reverse" processing will also cause some problems, so fixing this will take some time.
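One way to picture the "match against the raw content" idea is sketched below. It is a hypothetical Python illustration, not the actual fix: the helper `resolve_delimiter` and its arguments (the text just before the `<em>`, the text between the tags, and the text just after `</em>`, all already reversed to plain text) are assumptions made for the example.

```python
from typing import Optional


def resolve_delimiter(raw: str, before: str, inner: str, after: str) -> Optional[str]:
    """Return `_` or `*` if exactly one of them reproduces a substring of
    the raw source around an <em>...</em> pair, otherwise None."""
    matches = [
        d for d in ("_", "*")
        if f"{before}{d}{inner}{d}{after}" in raw
    ]
    return matches[0] if len(matches) == 1 else None
```

When both candidates (or neither) appear in the raw text, the sketch returns `None` rather than guessing, which is where context rules like the ones suggested above could serve as a fallback.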
Fixed in 0.6.0.
On ChatGPT's web interface, if formulas appear in a form like `{x}_a` and `y_{b}`, the `_`s will first be consumed as Markdown's italic formatting character, causing unwanted rendering results. Here `x, y, a, b` can be any elements, the same or different; the key point is that there are two `_` in the same paragraph, and the character immediately before the former `_` and the one immediately after the latter `_` are symbols rather than numbers/letters. The same goes for `*`.

I made the following table to demonstrate some test cases. (I suddenly found that GitHub also has this problem? I am now confused about whether this is a bug or a feature... Are they using the same rendering module? I just tried some other Markdown note-taking apps such as Obsidian and Logseq, and pandoc for exporting Markdown to PDF, and they all render correctly... For now I have only found this issue on GitHub and ChatGPT.)
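| Source | Note |
| --- | --- |
| `$x_a$ and $y_b$` | |
| `${x}_a$ and $y_{b}$` | |
| `${x}_{a}$ and ${y}_{b}$` | As long as the characters before the first `_` and after the second `_` are symbols, it will cause this bug, and it doesn't matter whether there is a symbol elsewhere |
| `${x}_a = y_{b}$` | |
| `$x^@_i = y_@$` | Not limited to `{}`: as long as the adjacent characters are not a letter/number, it will cause this bug |
| `$x^2_i = y_2$` | |
| `${x}^{*} = {y}^{*}$` | The same happens with `*`, which is consumed and becomes italic |
| `$\mathbf{w}_t$ and $\mathbf{w}_{t+1}$` | |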
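The trigger condition described above can be written as a small check and run against these test cases. This is a simplified Python sketch of the rule exactly as stated in this issue (a symbol before the first delimiter, a symbol after the second); it is not the full Markdown emphasis algorithm, and the name `may_trigger_italic_bug` is made up for illustration.

```python
def may_trigger_italic_bug(paragraph: str, delimiter: str = "_") -> bool:
    """Heuristic: True if some pair of delimiters has a symbol right before
    the first one and a symbol right after the second one."""

    def is_symbol(ch: str) -> bool:
        # "Symbol" here means neither a letter/number nor whitespace.
        return not ch.isalnum() and not ch.isspace()

    positions = [i for i, ch in enumerate(paragraph) if ch == delimiter]
    for n, first in enumerate(positions):
        if first == 0 or not is_symbol(paragraph[first - 1]):
            continue
        for second in positions[n + 1:]:
            if second + 1 < len(paragraph) and is_symbol(paragraph[second + 1]):
                return True
    return False
```

Under this rule, `$x_a$ and $y_b$` and `$x^2_i = y_2$` are reported as safe, while `${x}_a$ and $y_{b}$`, `$x^@_i = y_@$`, and (with `delimiter="*"`) `${x}^{*} = {y}^{*}$` are flagged.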
Screenshots:
Do you have any idea what causes this bug (or feature)? Do GitHub and ChatGPT share the same Markdown rendering module? But then why wasn't ChatGPT originally able to render inline formulas? I am getting really confused here.
I am wondering whether it would be possible for you to catch the `_` characters in the source text and render them with MathJax before they are consumed by the Markdown renderer. It looks like the source text of a GPT-generated response can be obtained by some means. For example, the repo chatgpt-exporter does so.

Hope we can figure this out. Thank you so much! I greatly appreciate your work, and it has indeed helped me a lot. ❤️
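To make the suggestion concrete, here is a rough Python sketch of "protect the math first, then render Markdown". It assumes the raw source text is available; `toy_markdown` is a deliberately naive stand-in for a real Markdown renderer, and every name in it is hypothetical rather than part of the script or of MathJax.

```python
import re
from typing import Callable

INLINE_MATH = re.compile(r"\$[^$\n]+\$")
# Private-use character as a placeholder marker, unlikely to occur in text.
PLACEHOLDER = re.compile("\ue000(\\d+)\ue000")


def render_with_math_protected(source: str,
                               render_markdown: Callable[[str], str]) -> str:
    """Hide inline $...$ spans, render Markdown, then put the math back."""
    saved: list[str] = []

    def stash(match: re.Match) -> str:
        saved.append(match.group(0))
        return f"\ue000{len(saved) - 1}\ue000"

    masked = INLINE_MATH.sub(stash, source)
    rendered = render_markdown(masked)
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], rendered)


def toy_markdown(text: str) -> str:
    # Naive italic rule, only here to reproduce the problem.
    return re.sub(r"_(.+?)_", r"<em>\1</em>", text)
```

With the source `${x}_a$ and $y_{b}$ are _important_`, calling `toy_markdown` directly swallows the two `_` inside the formulas, whereas `render_with_math_protected` leaves both formulas intact for the math typesetter and italicizes only `important`.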