kerrickstaley / genanki

A Python 3 library for generating Anki decks
MIT License

Complicated LaTeX gets messed up when importing #28

Closed: ghost closed this issue 4 years ago

ghost commented 5 years ago

On Arch Linux, Python 3.7.2

If I create an .apkg file with this script:

import genanki

my_deck = genanki.Deck(1391554143, "my_deck")

my_model = genanki.Model(
    1836072805,
    "my_model",
    fields=[{"name": "Question"}, {"name": "Answer"}],
    templates=[{"name": "Card 1", "qfmt": "{{Question}}", "afmt": "{{Answer}}"}],
)

rxn0 = r"""[latex]
\schemestart
\chemfig{*6(--(<OH)-(<:Br)---)}
\arrow{->[?]}
\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}
\schemestop[/latex]"""

reagents0 = r"""\(\ce{NaOH, H_2O_2}\)"""

rxn1 = r"""[latex]
\schemestart
\chemfig{*6(--=---)}
\arrow{->[?]}
\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}
\schemestop[/latex]
"""

reagents1 = r"""\(\ce{mCPBA, CH_2Cl_2}\)"""

rxn2 = r"""[latex]
\schemestart
\chemfig{C(-[:0]
    C(-[:90]X)(-[:0])(-[:270])
)(-[:90])(-[:180])(-[:270]H)}
\arrow{->[?]}
\chemfig{C(-[:120])(-[:-120])=C(-[:60])(-[:-60])} \+ \ce{HX}
\schemestop[/latex]
"""

reagents2 = r"""\(\ce{Na^{+}{}^{-}OCH_2CH_3, EtOH, 70°C}\)"""

my_deck.add_note(genanki.Note(model=my_model, fields=[rxn0, reagents0]))
my_deck.add_note(genanki.Note(model=my_model, fields=[rxn1, reagents1]))
my_deck.add_note(genanki.Note(model=my_model, fields=[rxn2, reagents2]))

genanki.Package(my_deck).write_to_file("output.apkg")

importing output.apkg results in an Anki popup window that says:

Notes found in file: 3
Notes added from file: 3

[Added] [latex]
\schemestart
\chemfig{*6(--(<OH)-(<:Br)---)}
\arrow{->[?]}
\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}
\schemestop[/latex], \(\ce{NaOH, H_2O_2}\)
[Added] [latex]
\schemestart
\chemfig{*6(--=---)}
\arrow{->[?]}
\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}
\schemestop[/latex]
, \(\ce{mCPBA, CH_2Cl_2}\)
[Added] [latex]
\schemestart
\chemfig{C(-[:0]
    C(-[:90]X)(-[:0])(-[:270])
)(-[:90])(-[:180])(-[:270]H)}
\arrow{->[?]}
\chemfig{C(-[:120])(-[:-120])=C(-[:60])(-[:-60])} \+ \ce{HX}
\schemestop[/latex]
, \(\ce{Na^{+}{}^{-}OCH_2CH_3, EtOH, 70°C}\)

which looks fine.

I also set the appropriate LaTeX header and enabled dvisvgm:

\documentclass[varwidth=100cm]{standalone}
\usepackage[version=4]{mhchem}
\usepackage{chemfig}
\begin{document}

However, when studying or browsing, the LaTeX output of rxn0 and rxn1 gets mangled for some reason:

rxn0

[latex] \schemestart \chemfig{*6(--([?]} \chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)} \schemestop[/latex]

/tmp/anki_temp/tmp.tex:

\documentclass[varwidth=100cm]{standalone}
\usepackage[version=4]{mhchem}
\usepackage{chemfig}
\begin{document}

\schemestart
\chemfig{*6(--([?]}
\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}
\schemestop
\end{document}

when it should be:

[latex]
{\Large Halohydrine $\to$ Epoxide}\\[12pt]
\schemestart
\chemfig{*6(--(<OH)-(<:Br)---)}
\arrow{->[?]}
\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}
\schemestop[/latex]

/tmp/anki_temp/tmp.tex:

\documentclass[varwidth=100cm]{standalone}
\usepackage[version=4]{mhchem}
\usepackage{chemfig}
\begin{document}

\schemestart
\chemfig{*6(--(<OH)-(<:Br)---)}
\arrow{->[?]}
\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}
\schemestop
\end{document}

rxn1

[latex] \schemestart \chemfig{*6(--=---)} \arrow{->[?]} \chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)} \schemestop[/latex]

/tmp/anki_temp/tmp.tex:

\documentclass[varwidth=100cm]{standalone}
\usepackage[version=4]{mhchem}
\usepackage{chemfig}
\begin{document}

\schemestart
\chemfig{*6(--=---)}
\arrow{->[?]}
\chemfig{*6(--(},](<:H)---)}
\schemestop
\end{document}

instead of

[latex]
\schemestart
\chemfig{*6(--=---)}
\arrow{->[?]}
\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}
\schemestop[/latex]

/tmp/anki_temp/tmp.tex:

\documentclass[varwidth=100cm]{standalone}
\usepackage[version=4]{mhchem}
\usepackage{chemfig}
\begin{document}

\schemestart
\chemfig{*6(--=---)}
\arrow{->[?]}
\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}
\schemestop
\end{document}

So basically,

\chemfig{*6(--(<OH)-(<:Br)---)}
\arrow{->[?]}

becomes

\chemfig{*6(--([?]}

and

\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}

becomes

\chemfig{*6(--(},](<:H)---)}
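Both transformations are consistent with something treating each <...> span as an HTML tag and deleting it. As a rough illustration, assuming a naive non-greedy <.*?> stripper (this is not Anki's actual code, and naive_strip_html is a hypothetical name), the sketch below reproduces both pairs above:

```python
import re

# Assumption: approximate "strip anything that looks like an HTML tag" with a
# non-greedy regex; DOTALL lets a "tag" span line breaks. Not Anki's real code.
_NAIVE_TAG_RE = re.compile(r"<.*?>", re.DOTALL)


def naive_strip_html(text: str) -> str:
    """Delete every '<...>' span, the way a simple HTML stripper would."""
    return _NAIVE_TAG_RE.sub("", text)


# The two "before" snippets from the report.
before_rxn0 = r"""\chemfig{*6(--(<OH)-(<:Br)---)}
\arrow{->[?]}"""
before_rxn1 = r"\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}"

print(naive_strip_html(before_rxn0))  # \chemfig{*6(--([?]}
print(naive_strip_html(before_rxn1))  # \chemfig{*6(--(},](<:H)---)}
```

A field with no < at all, such as rxn2, passes through unchanged. In the rxn0 report the second \chemfig survived, so whatever Anki actually does evidently differs from this regex in detail.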

What's stranger is that I ran the Python script several times, and one time rxn1 did not have the issue. rxn2 has never been affected by this quirk. For some reason, chemfigs of epoxides get messed up by Anki when importing an .apkg generated by genanki.

kerrickstaley commented 4 years ago

When Anki encodes rxn0, it stores the LaTeX source in the .apkg file as follows:

[latex] \schemestart \chemfig{*6(--(&lt;OH)-(&lt;:Br)---)} \arrow{-&gt;[?]} \chemfig{*6(--(&lt;[:30]{O}?)(&lt;:H)-?[,{&gt;},](&lt;:H)---)} \schemestop[/latex]

As you can see, < and > are HTML-encoded as &lt; and &gt;. This seems strange because this code does not represent HTML, but that is apparently what Anki does.
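Python's standard html.escape produces this same encoding. A small sketch on two lines of rxn0: with quote=False it escapes only &, < and > and leaves quotation marks alone (the default quote=True would also escape " and '), and html.unescape reverses it.

```python
import html

chemfig_line = r"\chemfig{*6(--(<OH)-(<:Br)---)}"
arrow_line = r"\arrow{->[?]}"

# quote=False: escape only &, < and >, leaving quotation marks untouched.
escaped_chemfig = html.escape(chemfig_line, quote=False)
escaped_arrow = html.escape(arrow_line, quote=False)

print(escaped_chemfig)  # \chemfig{*6(--(&lt;OH)-(&lt;:Br)---)}
print(escaped_arrow)    # \arrow{-&gt;[?]}

# html.unescape reverses the encoding and recovers the original LaTeX.
print(html.unescape(escaped_chemfig) == chemfig_line)  # True
```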

Meanwhile, genanki does not do any such encoding; the data in the .apkg is

[latex]
\schemestart
\chemfig{*6(--(<OH)-(<:Br)---)}
\arrow{->[?]}
\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}
\schemestop[/latex]

(it also keeps newlines).

So, it looks like genanki needs to do some encoding of < and > symbols inside field data. I will look further into this and create a test / patch.

kerrickstaley commented 4 years ago

Anki HTML-escapes <, >, and & in the user input, because it allows the user to input arbitrary HTML for the field data (using a rich text editor).

We actually cannot automatically HTML-escape these characters, because there may be genanki users that want to include HTML in field data.

Instead, what we can do here is (1) add documentation stating that field data must be HTML, so literal <, > and & must be escaped using Python's html.escape function (even for LaTeX), and (2) add a warning that triggers if a field contains <...> but it doesn't look like an HTML tag (i.e. the ... doesn't match the regex ^/?[a-z]+( |/?$)).
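Proposal (2) can be sketched roughly as follows. The helpers find_invalid_html_tags and warn_on_invalid_html are hypothetical and written only for illustration, not genanki's actual code. The sketch assumes a candidate tag is any non-greedy <...> span (possibly spanning lines) and applies the ^/?[a-z]+( |/?$) regex above to the text between the brackets.

```python
import re
import warnings

# Hypothetical sketch of proposal (2); genanki's real implementation may differ.
# Candidate "tag": any '<...>' span, non-greedy, allowed to cross line breaks.
_CANDIDATE_TAG_RE = re.compile(r"<(.*?)>", re.DOTALL)
# The regex from the comment: the body should start like a tag name.
_TAG_BODY_RE = re.compile(r"^/?[a-z]+( |/?$)")


def find_invalid_html_tags(field: str) -> list[str]:
    """Return every '<...>' span whose body does not look like an HTML tag."""
    return [
        m.group(0)
        for m in _CANDIDATE_TAG_RE.finditer(field)
        if not _TAG_BODY_RE.match(m.group(1))
    ]


def warn_on_invalid_html(field: str) -> None:
    """Emit a UserWarning listing suspicious '<...>' spans, if there are any."""
    invalid = find_invalid_html_tags(field)
    if invalid:
        warnings.warn(
            "Field contained the following invalid HTML tags. Make sure you are "
            "calling html.escape() if your field data isn't already HTML-encoded: "
            + " ".join(invalid)
        )


# rxn0 from the original script.
rxn0 = r"""[latex]
\schemestart
\chemfig{*6(--(<OH)-(<:Br)---)}
\arrow{->[?]}
\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}
\schemestop[/latex]"""

for tag in find_invalid_html_tags(rxn0):
    print(repr(tag))
```

On rxn0 this flags two spans: the one running from <OH to the > of the arrow, and <[:30]{O}?)(<:H)-?[,{>. Ordinary tags such as <b>, </b> or <br/> pass the check.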

I will implement both of these changes.

kerrickstaley commented 4 years ago

After https://github.com/kerrickstaley/genanki/commit/104ea564217ecdaaccf6c127b9b2b81f66bb8d2f, when I run your code, I get the following warnings:

/home/kerrick/src/genanki/genanki/note.py:143: UserWarning: Field contained the following invalid HTML tags. Make sure you are calling html.escape() if your field data isn't already HTML-encoded: <OH)-(<:Br)---)}
\arrow{-> <[:30]{O}?)(<:H)-?[,{>
  warnings.warn("Field contained the following invalid HTML tags. Make sure you are calling html.escape() if"
/home/kerrick/src/genanki/genanki/note.py:143: UserWarning: Field contained the following invalid HTML tags. Make sure you are calling html.escape() if your field data isn't already HTML-encoded: <[:30]{O}?)(<:H)-?[,{>
  warnings.warn("Field contained the following invalid HTML tags. Make sure you are calling html.escape() if"

I think this is adequate to help the user diagnose/resolve the sorts of issue you mentioned. Thanks for the detailed bug report!
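On the user side, recommendation (1) amounts to escaping fields that hold LaTeX or plain text before building notes, while leaving fields that are deliberately HTML alone. A minimal sketch with a hypothetical escape_fields helper (not part of genanki), using rxn0 and reagents0 from the original script:

```python
import html


def escape_fields(fields: list[str]) -> list[str]:
    """HTML-escape &, < and > in fields holding plain text or LaTeX (not HTML)."""
    return [html.escape(field, quote=False) for field in fields]


# Field values copied from the original script.
rxn0 = r"""[latex]
\schemestart
\chemfig{*6(--(<OH)-(<:Br)---)}
\arrow{->[?]}
\chemfig{*6(--(<[:30]{O}?)(<:H)-?[,{>},](<:H)---)}
\schemestop[/latex]"""
reagents0 = r"""\(\ce{NaOH, H_2O_2}\)"""

fields0 = escape_fields([rxn0, reagents0])

# In the original script these would then be passed to genanki as before:
#   my_deck.add_note(genanki.Note(model=my_model, fields=fields0))
print(fields0[0])
```

Fields meant to contain HTML (for example a line break written as <br>) should not go through this helper. That is the reason genanki leaves escaping to the caller instead of doing it automatically.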

kerrickstaley commented 4 years ago

Released in 0.8.1.