retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

Unicode replacement results in undefined mathsl command #2722

Closed warwickmm closed 8 months ago

warwickmm commented 10 months ago

Debug log ID

7W56UACL-refs-euc/6.7.140-7

What happened?

I was experimenting with the "Export unicode as plain-text latex commands" option using BibLaTeX.

I created a new entry in Zotero with the title "The 𝑘 is in unicode". I then exported this item using Better BibLaTeX, which resulted in

@article{Unicode,
  title = {The {$\mathsl{k}$} Is in Unicode}
}

However, it doesn't seem like mathsl is defined in any of the standard math packages (I'm using amsmath, amssymb) so compiling a document results in

! Undefined control sequence.
<recently read> \mathsl

It seems like mathsl is defined in the sfmath.sty package, but this forces all math symbols in the document to be sans serif.

Should the above be $k$ instead of $\mathsl{k}$? Or, did I simply choose the wrong unicode symbol (U+1D458) for what should be $k$?
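One way to check what a pasted symbol actually is: the Python sketch below uses only the standard-library `unicodedata` module (the helper name `describe` is made up for this example) to report the code point, its official Unicode name, and its NFKC compatibility form.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Summarize a single character: code point, Unicode name, NFKC form."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "nfkc": unicodedata.normalize("NFKC", ch),
    }

# The character from the title in this report
print(describe("\U0001D458"))
```

For U+1D458 this reports MATHEMATICAL ITALIC SMALL K with the compatibility form `k`, so the character is a styled variant of the plain letter rather than a different symbol.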

warwickmm commented 10 months ago

As an aside, I was testing the "Export unicode as plain-text latex commands" option as an alternative to using the #LaTeX tag. I noticed that the latter would result in sentence-cased titles (as I've defined in Zotero), whereas items without the tag would be title-cased. This resulted in inconsistently cased bibliography entries. Based on what I've read, I believe that is the expected behavior, so I was trying to find a way to make the resulting casing more consistent.

warwickmm commented 10 months ago

Similarly, using an ℓ results in {$\mathscr{l}$}, which for me is also undefined. In addition, according to this list of symbols (page 119), the packages that define the mathscr command (mathrsfs, euscript, etc.) only support capital letters.

retorquere commented 10 months ago

I'm on it; I understand the problem.

retorquere commented 10 months ago

Do you know of a supported form for ℓ? $l$ isn't really it.

retorquere commented 10 months ago

Page 119 seems to indicate it would support capitals? It gives \mathscr{ABC} as a sample. Isn't the limitation just on urwchancal?

warwickmm commented 10 months ago

ℓ should be $\ell$.

retorquere commented 9 months ago

I've finally gotten around to making the requisite changes. It took a major overhaul of the tex conversion system but it's a lot better for it. Default translation is now always without required packages, and you can specify which packages you have loaded to get better conversions.
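To illustrate the idea of package-aware conversion, here is a minimal Python sketch. It is not BBT's code or data: the `CANDIDATES` table, its entries, and the `pick` helper are all hypothetical. Each character lists candidate LaTeX forms from most to least preferred, together with the packages each form needs, and the first candidate whose requirements are met wins. The last candidate needs no packages, so there is always a package-free default.

```python
# Hypothetical candidates, ordered from most to least preferred.
# Each candidate is (latex, packages it requires).
CANDIDATES = {
    "\u210b": [  # SCRIPT CAPITAL H
        (r"\mathscr{H}", frozenset({"mathrsfs"})),
        (r"\mathcal{H}", frozenset()),  # works without extra packages
    ],
    "\U0001D458": [  # MATHEMATICAL ITALIC SMALL K
        ("k", frozenset()),
    ],
}

def pick(ch, loaded=()):
    """Return the first candidate whose required packages are all loaded."""
    available = set(loaded)
    for latex, needs in CANDIDATES[ch]:
        if needs <= available:
            return latex
    raise KeyError(f"no usable mapping for {ch!r}")

print(pick("\u210b"))                # package-free default
print(pick("\u210b", {"mathrsfs"}))  # better form once mathrsfs is declared
```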

Unfortunately it took so long that 7W56UACL-refs-euc/6.7.140-7 has since expired; can you send me a new one so I can add it as a test case (to prevent accidental regressions)?

warwickmm commented 9 months ago

Thank you. Hopefully M29XBJZV-refs-euc/6.7.143-7 contains some useful information.

retorquere commented 9 months ago

That now exports to

@article{lastAreUnicode,
  title = {The {$\ell$}, {$k$}, and {$P$} Are in Unicode},
  author = {Last, First},
  journaltitle = {The Publication}
}

by default.

retorquere commented 9 months ago

BTW it's a bit of a slog, but should you be curious, the mappings are here. If you spot anything else out of the ordinary, let me know and I'll update them.

warwickmm commented 9 months ago

For the title "The ℓ, 𝑘, and 𝜌 are in unicode", I would have expected the export to contain some form of the greek letter rho (e.g., $\rho$) instead of $P$. Perhaps I messed up the test case in the debug log?

github-actions[bot] commented 9 months ago

:robot: this is your friendly neighborhood build bot announcing test build 6.7.143.2722.5585 ("Merge branch 'master' into gh-2722")

Install in Zotero by downloading test build 6.7.143.2722.5585, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".

retorquere commented 9 months ago

No, unicode and compositions are just complex, and it turned out I had a lot of cleanup work to do on my mapping tables. They're better for it though. Current output is

@article{lastAreUnicode,
  title = {The {$\ell$}, {$k$}, and {$\rho$} Are in Unicode},
  author = {Last, First},
  journaltitle = {The Publication}
}

github-actions[bot] commented 9 months ago

:robot: this is your friendly neighborhood build bot announcing test build 6.7.143.2722.5589 ("Merge branch 'master' into gh-2722")

Install in Zotero by downloading test build 6.7.143.2722.5589, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".

warwickmm commented 9 months ago

Thanks! If it helps, I can take a look at my library and come up with some more interesting test cases, along with expected output. Selfishly, it would be nice to know that the symbols I (currently) care about are mapped correctly.

warwickmm commented 9 months ago

Looks like my library entries have enough cases where I can't find good unicode representations, so I may have to stick with the postscript approach. However, below are some more examples in case you find it useful for your tests (apologies if I messed up some of the manual translations):

unicode: "Matrices with Small Coherence Using 𝑝-Ary Block Codes",
latex: "Matrices with Small Coherence Using {$p$}-Ary Block Codes"

unicode: "Penalty Functions and Duality in Stochastic Programming via 𝜙-Divergence Functionals",
latex: "Penalty Functions and Duality in Stochastic Programming via {$\phi$}-Divergence Functionals"

unicode: "On Recovery of Sparse Signals via ℓ₁ Minimization",
latex: "On Recovery of Sparse Signals via {$\ell _1$} Minimization"

unicode: "Restricted Isometry Constants Where ℓᵖ Sparse Recovery Can Fail for 𝟢 < p ≤ 𝟣",
latex: "Restricted Isometry Constants Where {$\ell^p$} Sparse Recovery Can Fail for {$0 < p \leq 1$}"

unicode: "Optimally Sparse Representation in General (Nonorthogonal) Dictionaries via ℓ¹ Minimization",
latex: "Optimally Sparse Representation in General (Nonorthogonal) Dictionaries via {$\ell^1$} Minimization"

unicode: "State-Space Solutions to Standard ℋ₂ and ℋ ͚ Control Problems",
latex: "State-Space Solutions to Standard {$\mathcal{H}_2$} and {$\mathcal{H}_\infty$} Control Problems"

unicode: "Relative-Error 𝐶𝑈𝑅 Matrix Decompositions",
latex: "Relative-Error {$CUR$} Matrix Decompositions"

unicode: "The {{Gelfand}} Widths of ℓₚ-Balls for 𝟢 < p ≤ 𝟣",
latex: "The {{Gelfand}} Widths of {$\ell_p$}-Balls for {$0 < p \leq 1$}"

retorquere commented 9 months ago

Keep 'em coming, I am taking this as an opportunity to get these mappings tuned better (although it would help me if they came in the form of a debug log ID).

But in the case of

Matrices with Small Coherence Using 𝑝-Ary Block Codes

What would you have expected instead of Matrices with Small Coherence Using {$p$}-Ary Block Codes? 𝑝 is "Mathematical Italic Small P". Can you say what you would have expected instead of these results?

warwickmm commented 9 months ago

That looks correct to me. Sorry for the confusion, the above are simply examples with expected outputs. I didn't get a chance to test the new build, so it's possible that everything is already working as expected.

retorquere commented 9 months ago

No, I found a few missing in the list above, working on those.

github-actions[bot] commented 9 months ago

:robot: this is your friendly neighborhood build bot announcing test build 6.7.143.2722.5613 ("name split")

Install in Zotero by downloading test build 6.7.143.2722.5613, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".

retorquere commented 9 months ago

5613 should fix things. Would appreciate verification.

warwickmm commented 9 months ago

That's looking much better to me. In fact, inspecting the output revealed some mistakes in my unicode examples above.

One minor curiosity is at the end of

"Restricted Isometry Constants Where ℓᵖ Sparse Recovery Can Fail for 𝟢 < p ≤ 𝟣"

the 𝟢 and 𝟣 get mapped to $\mathsf{0}$ and $\mathsf{1}$, respectively. Perhaps I chose the wrong unicode numerals in my example, but are there any unicode numerals that would result in simply $0$ and $1$? I think the latter could allow the user more control over how the math fonts appear in their document.

retorquere commented 9 months ago

𝟢 is Mathematical Sans-Serif Digit Zero, 𝟣 is Mathematical Sans-Serif Digit One, so that seems a pretty close match to mathsf. I don't currently have a mapping for number characters that would switch to math mode without TeX makeup, as I don't really know what input would sensibly trigger it (I don't think it'd be wise to say numbers should always be in math mode). There's candidates here, the closest would seem to be Mathematical Monospace Digit Zero, but that currently maps to $\mathtt{0}$. Maybe one of those is uncommon enough that it could be just $0$.

For a bit of background, Zotero doesn't do math, just unicode, and the primary reason I export math contexts is that there are a lot of unicode characters that only have a math-mode LaTeX equivalent. BBT doesn't know 𝟢 < p ≤ 𝟣 is math, it just knows it can only output that in a math context. What my converter does is try to convert as much as it can to text-mode, until it hits a character that demands math-mode, then it stays in math-mode as long as it can, etc.
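The greedy mode switching described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not BBT's converter: the `TABLE` entries and the `texify` function are hypothetical, and the sketch assumes that every unlisted character (including spaces and ASCII letters) is plain text. Each character has a text-mode form, a math-mode form, or both. The converter stays in text mode until a character has only a math form, then stays in math mode for as long as the following characters have one.

```python
# Hypothetical toy table, NOT BBT's real mappings. Characters that are not
# listed are treated as plain text and emitted unchanged.
TABLE = {
    "\u2113": {"math": r"\ell"},            # SCRIPT SMALL L
    "\u1d56": {"math": "^{p}"},             # MODIFIER LETTER SMALL P
    "\u2264": {"math": r"\leq"},            # LESS-THAN OR EQUAL TO
    "\U0001D7E2": {"math": r"\mathsf{0}"},  # MATHEMATICAL SANS-SERIF DIGIT ZERO
    "\U0001D7E3": {"math": r"\mathsf{1}"},  # MATHEMATICAL SANS-SERIF DIGIT ONE
    "<": {"text": r"\textless{}", "math": "<"},  # usable in either mode
}

def texify(title):
    out = []
    in_math = False
    for ch in title:
        forms = TABLE.get(ch, {"text": ch})
        if in_math and "math" in forms:
            # Already in math mode and the character can stay there.
            piece = forms["math"]
            # Keep a control word such as \ell from fusing with a following letter.
            if out[-1][-1].isalpha() and piece[0].isalpha():
                out.append(" ")
            out.append(piece)
        elif "text" in forms:
            # Text is preferred; close any open math group first.
            if in_math:
                out.append("$}")
                in_math = False
            out.append(forms["text"])
        else:
            # Math-only character while in text mode: open a math group.
            out.append("{$")
            in_math = True
            out.append(forms["math"])
    if in_math:
        out.append("$}")
    return "".join(out)

print(texify("Where \u2113\u1d56 Sparse"))
print(texify("for \U0001D7E2 < p \u2264 \U0001D7E3"))
```

With this toy table the second call yields three separate math groups, because the spaces and the plain `p` between the math-only characters are text. That is the point made above: the converter sees individual characters, not a formula.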

I've experimented with the idea of making spaces and numbers non-switching, but that would leave the leading 0 in your sample still in text mode, and it would mess with spacing because it would grab spaces at the end of formulas into math-mode etc.

If you want full control, the only way currently is dropping into raw-latex mode with something like

Restricted Isometry Constants Where ℓᵖ Sparse Recovery Can Fail for <script>$0 < p \le 1$</script>

or

Restricted Isometry Constants Where ℓᵖ Sparse Recovery Can Fail for $0 < p \le 1$

using this, but either way that's going to mess up bibliographies you make with Word, should you also use that.

I'm still hoping Zotero will just add MathML support, but I reckon citeproc would need a fair bit of work for that and rendering it in Word might be non-trivial, so I don't expect it to come really soon (if at all).

warwickmm commented 9 months ago

That makes sense, it sounds like a difficult problem. I think the solution you have is quite powerful given the constraints.

retorquere commented 9 months ago

Thanks! Also for sticking this out, I wanted to improve the mappings for a long time and this holiday I finally got around to it.

You can force numbers to math btw should you want it: https://retorque.re/zotero-better-bibtex/installation/preferences/hidden-preferences/#mapmath

warwickmm commented 9 months ago

This plugin is indispensable for latex users, so thank you for all of your hard work.

As an aside, should the description for mapMath say:

Any characters entered here will prefer a math-mode LaTeX-command counterpart over a text-mode, if a math-mode command is available.
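Read literally, that description amounts to a per-character preference. A minimal Python sketch of that reading follows. The function `pick_form`, the shape of `forms`, and passing the preference as a plain string of characters are all assumptions made for illustration, not BBT's implementation.

```python
def pick_form(ch, forms, map_math=""):
    """Choose between text-mode and math-mode forms for one character.

    forms    -- dict with a "text" and/or "math" LaTeX form (hypothetical shape)
    map_math -- characters that should prefer math mode (assumed to be a string)
    """
    if ch in map_math and "math" in forms:
        return "math", forms["math"]
    if "text" in forms:
        return "text", forms["text"]
    return "math", forms["math"]

digit = {"text": "0", "math": "0"}
print(pick_form("0", digit))                         # text mode by default
print(pick_form("0", digit, map_math="0123456789"))  # math mode when listed
```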

retorquere commented 9 months ago

That is correct, thanks, the site is rebuilding.

retorquere commented 9 months ago

Well now... the math-mode number has a clean solution if I get a little coop from the Zotero devs...

Zotero supports restricted HTML markup in titles, so you can get italics for example using Title with <i>italics</i>, and make parts exempt from case-changing using Title with a <span class="nocase">Noun</span>... those will affect output but will not be output themselves. If I can get them to just blindly ignore any <span>s that have a class they don't recognize, I could easily make Restricted Isometry Constants Where ℓᵖ Sparse Recovery Can Fail for <span class="math">0 < p < 1</span> do what you want. Right now those spans turn up in Word. If that changes, we'd be golden.
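As a rough illustration of the proposal (which, per the comment above, is not something that currently works), the Python sketch below turns each `<span class="math">` into a braced math group. The function name `render_math_spans` and the exact output format are assumptions, and the regular expression only handles the simple, non-nested case.

```python
import re

# Matches the simple, non-nested case only.
MATH_SPAN = re.compile(r'<span class="math">(.*?)</span>', re.DOTALL)

def render_math_spans(title):
    """Replace each math span with its content wrapped as {$...$}."""
    # A function is used as the replacement so that backslashes in the span
    # content (e.g. \le) are not interpreted as regex escapes.
    return MATH_SPAN.sub(lambda m: "{$" + m.group(1) + "$}", title)

print(render_math_spans('Can Fail for <span class="math">0 < p < 1</span>'))
```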

github-actions[bot] commented 9 months ago

:robot: this is your friendly neighborhood build bot announcing test build 6.7.143.2722.5629 ("new texifier")

Install in Zotero by downloading test build 6.7.143.2722.5629, opening the Zotero "Tools" menu, selecting "Add-ons", open the gear menu in the top right, and select "Install Add-on From File...".