kzvi opened this issue 5 years ago.
Interesting proposal. Any idea what font support for those characters is like on Windows and Linux? (I can see them fine on macOS.)
By the way, this issue probably belongs in https://github.com/jgm/texmath.
They display in the browser and terminal just fine for me on Linux. This sounds like a good idea for preserving more data between formats. Unless ASCII output is desired, using the proper Unicode characters that exist for exactly these meanings seems like the right thing to do.
Texmath has Text.TexMath.Writers.Pandoc, which pandoc uses for the default math translations. It contains:

```haskell
-- Excerpt from texmath's Text.TexMath.Writers.Pandoc: each math text style
-- becomes pandoc markup (Strong, Emph, Code), a plain Str, or a Str of
-- styled Unicode characters produced by toUnicode.
renderStr :: TextType -> String -> Inline
renderStr tt s =
  case tt of
    TextNormal              -> Str s
    TextBold                -> Strong [Str s]
    TextItalic              -> Emph [Str s]
    TextMonospace           -> Code nullAttr s
    TextSansSerif           -> Str s
    TextDoubleStruck        -> Str $ toUnicode tt s
    TextScript              -> Str $ toUnicode tt s
    TextFraktur             -> Str $ toUnicode tt s
    TextBoldItalic          -> Strong [Emph [Str s]]
    TextSansSerifBold       -> Strong [Str s]
    TextBoldScript          -> Strong [Str $ toUnicode tt s]
    TextBoldFraktur         -> Strong [Str $ toUnicode tt s]
    TextSansSerifItalic     -> Emph [Str s]
    TextSansSerifBoldItalic -> Strong [Emph [Str s]]
```
So, as you can see, a toUnicode transformation is done for, e.g., fraktur, but for boldface the pandoc Strong element is used.
If we changed this, it would change not just plain output but all formats; e.g., in HTML you'd get Unicode boldface characters instead of a <strong> tag. That might not be so bad, if font support is consistent enough.
Another alternative would be to change the plain writer's handling of Strong so that it uses Unicode boldface where possible, instead of capitalizing. This would affect not just math but everything.
A third possibility would be to change the handling of Strong in the plain writer, but only in math contexts.
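Options one and two both come down to replacing letters with their Unicode boldface counterparts. As a rough sketch of what that involves (toBoldUnicode is a hypothetical helper, not a texmath or pandoc function): the MATHEMATICAL BOLD letters and digits form unbroken runs in the Mathematical Alphanumeric Symbols block, starting at U+1D400 (A), U+1D41A (a) and U+1D7CE (0), so a fixed offset per run is enough, and anything else, such as λ, passes through unchanged.

```haskell
import Data.Char (chr, isAsciiLower, isAsciiUpper, isDigit, ord)

-- Hypothetical helper (not part of texmath): map ASCII letters and digits
-- to the MATHEMATICAL BOLD characters of the Mathematical Alphanumeric
-- Symbols block. The bold runs have no gaps, so a fixed offset suffices.
toBoldUnicode :: String -> String
toBoldUnicode = map bold
  where
    bold c
      | isAsciiUpper c = shift 0x1D400 'A' c  -- bold capitals A..Z
      | isAsciiLower c = shift 0x1D41A 'a' c  -- bold small letters a..z
      | isDigit c      = shift 0x1D7CE '0' c  -- bold digits 0..9
      | otherwise      = c                    -- leave λ, punctuation, etc. alone
    shift base start c = chr (base + ord c - ord start)
```

For instance, toBoldUnicode "x" gives "𝐱" (U+1D431). Other styles are less regular: several italic, script, fraktur and double-struck letters have reserved gaps and need a lookup table instead.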
I think there is a sense in which translating bold math variables into <strong> tags is semantically incorrect, since <strong> is designed to mean "important" whereas <b> is designed to mean "bold".
Yes, that's right technically. But as a practical matter, the point of this module is to convert math into something that will render in EVERY format pandoc supports. Pandoc's data model has a Strong constructor, so by using that we can get decent results in every format.
But we could change things in any of the three ways outlined above.
Let’s step back a moment to see the big picture. Task: Convert math notation to another format. There are three scenarios, depending on the type of format. In order of increasing “lossiness”:
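1. Math-specific markup (e.g. LaTeX, MathML): $\lambda \mathbf x$ or <math><mi>λ</mi><mi mathvariant="bold">x</mi></math>
2. Generic markup (e.g. HTML, Markdown): λ<strong>x</strong> or λ**x**
3. Unicode/plain text: λ𝐱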
Each type of format also supports the capabilities below it.
Some generic markup formats support embedding specific markup via plugins, but that capability is not really part of the format itself. Example: HTML or Markdown + MathJax: $\lambda \mathbf x$.
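The statement that each type of format also supports the capabilities below it can be read as an ordering. The sketch below encodes that reading using the example renderings from the list; the tier names MathMarkup, GenericMarkup and UnicodeText are illustrative labels chosen here, not pandoc types. A format may use its own rendering or fall back to any lossier one.

```haskell
-- Kinds of target format, ordered from most to least capable
-- (i.e. in order of increasing lossiness). Illustrative labels only.
data Tier = MathMarkup | GenericMarkup | UnicodeText
  deriving (Show, Eq, Ord, Enum, Bounded)

-- One rendering of "λ, bold x" per tier, taken from the list above.
rendering :: Tier -> String
rendering MathMarkup    = "<math><mi>λ</mi><mi mathvariant=\"bold\">x</mi></math>"
rendering GenericMarkup = "λ<strong>x</strong>"
rendering UnicodeText   = "λ𝐱"

-- A format of a given tier can also use every tier below it in the list,
-- so it may choose among all renderings at least as lossy as its own.
usableRenderings :: Tier -> [String]
usableRenderings tier = map rendering [tier .. maxBound]
```

Under this reading, a Unicode-only target has exactly one choice, while math markup can use all three.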
With that in mind, let’s evaluate your three options.
Your third option is best. It would require a new function akin to renderStr that emits Unicode but does not try to emit any markup like Strong or Code, since such markup is unrepresentable in plain text.
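A minimal sketch of such a function, assuming a local stand-in for texmath's TextType with only three constructors (the real type has many more) and a hypothetical name, renderStrPlain. Unlike renderStr, it returns a bare String: bold and monospace are expressed with Mathematical Alphanumeric Symbols instead of Strong and Code, and normal text is left alone.

```haskell
import Data.Char (chr, isAsciiLower, isAsciiUpper, isDigit, ord)

-- Local stand-in for three of texmath's TextType constructors (the real
-- type also has italic, script, fraktur, sans-serif variants, ...).
data TextType = TextNormal | TextBold | TextMonospace
  deriving (Eq, Show)

-- Hypothetical plain-text analogue of renderStr: the result is a bare String
-- and is never wrapped in markup such as Strong or Code.
renderStrPlain :: TextType -> String -> String
renderStrPlain tt s =
  case tt of
    TextNormal    -> s
    TextBold      -> map (restyle 0x1D400 0x1D41A 0x1D7CE) s  -- bold A, a, 0
    TextMonospace -> map (restyle 0x1D670 0x1D68A 0x1D7F6) s  -- monospace A, a, 0

-- Shift ASCII capitals, small letters and digits into a styled run, given
-- the code points of that run's 'A', 'a' and '0'; other characters are kept.
restyle :: Int -> Int -> Int -> Char -> Char
restyle upper lower digit c
  | isAsciiUpper c = chr (upper + ord c - ord 'A')
  | isAsciiLower c = chr (lower + ord c - ord 'a')
  | isDigit c      = chr (digit + ord c - ord '0')
  | otherwise      = c
```

Styles without a gap-free code-point run (script, fraktur, italic) would need a lookup table rather than a fixed offset.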
Your first option is okay. I could imagine some users wanting the choice (a command-line switch?) to “abuse” semantic tags like <strong>. Fonts with Unicode math support are now widespread, but whether to rely on them depends on the document’s audience (mobile, for the author’s use only, etc.). Either default seems fine.
Your second option, if I understand correctly, proposes to make all strongly emphasized text appear 𝐛𝐨𝐥𝐝 by abusing Unicode math characters. I reject this outright as even more semantically untenable than abusing HTML tags. This is the job of “fancy text generators” for teens’ tweets, not Pandoc.
However, the main culprit behind this issue and many others (#3518, #3766, etc.) is the actively confusing status quo in which plain output uses Project Gutenberg conventions (including converting strong emphasis to uppercase). This is not what most people nowadays assume “plain text” means; it should act like the mere loss of formatting caused by pasting formatted (“rich”) text into a basic text box.
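To make the contrast concrete, here is a toy model of the two behaviours for strong emphasis: the Gutenberg convention of capitalizing it, versus simply dropping the formatting. Its Inline type is a simplified stand-in defined here for illustration, not pandoc's.

```haskell
import Data.Char (toUpper)

-- Toy inline type for illustration only; pandoc's real Inline type is richer.
data Inline = Str String | Strong [Inline]

-- Project Gutenberg convention: strong emphasis is shown by capitalizing it.
gutenberg :: [Inline] -> String
gutenberg = concatMap go
  where
    go (Str s)     = s
    go (Strong xs) = map toUpper (gutenberg xs)

-- "Plainer" output: formatting is simply lost, as when rich text is pasted
-- into a basic text box.
plainer :: [Inline] -> String
plainer = concatMap go
  where
    go (Str s)     = s
    go (Strong xs) = plainer xs
```

Under the Gutenberg rendering, a bold math x (which renderStr turns into Strong) prints as X, indistinguishable from a capital X, which is the behaviour this issue wants to avoid.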
Two years ago I started writing a proposal to revamp plain output, but didn’t finish. If you want to take up my dormant case, here are my notes. As @jgm said, it would be best to start a mailing list thread about this proposal. Sorry for straying off topic and not putting effort into my own crusade.
In summary, to quote your second option very selectively, I would advocate that we drastically “change the plain writer’s handling of Strong… instead of capitalizing”, but in a separate issue, and not quite in the way you suggested.
@hftf I think your format analysis should be amended. Exact matches in Unicode are not a last resort and should take priority even over your no. 2 approximation formats.
@hftf to be clear, all three of my options are only meant to affect rendering of math. They won't affect all strongly emphasized text.
Thank you both for the clarifications.
@alerque Sorry if the meaning of my list was unclear. It shows which formats are “more capable.” For example, type 2 is capable of both generic markup and Unicode/plain text, but 3 is only capable of Unicode/plain text. A function intended for type 2 is of little use for 3 since it lacks that capability. In terms of priority, however, I can agree with your list.
@jgm I must be confused, since your comment said “This would affect not just math but everything.”
No, I was the one who was confused. You are right. Yes, I agree, better to stay away from that option!
To summarize the two options that remain:
Option 1: Change Text.TexMath.Writers.Pandoc.renderStr so that instead of a Strong element, it uses Unicode boldface characters. This would change not just plain output but all formats; e.g., in HTML you'd get Unicode boldface characters instead of a <strong> tag. That might not be so bad, if font support is consistent enough.
Option 2: Change the handling of Strong in the plain writer, but only in math contexts. This would localize the change to plain output.
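Option 2 could be sketched as a writer that tracks whether it is inside math. The toy type below is hypothetical: MathInlines stands in for the inlines texmath produces for a formula, and pandoc's actual plain writer may track context differently. Inside math, Strong becomes Unicode bold; elsewhere it keeps the current capitalizing behaviour.

```haskell
import Data.Char (chr, isAsciiLower, isAsciiUpper, isDigit, ord, toUpper)

-- Toy inline type, hypothetical: MathInlines marks the inlines that texmath
-- produced for a formula, so the writer knows when it is inside math.
data Inline = Str String | Strong [Inline] | MathInlines [Inline]

-- Plain rendering that treats Strong differently depending on context.
renderPlain :: [Inline] -> String
renderPlain = concatMap (go False)
  where
    go :: Bool -> Inline -> String
    go _ (Str s)          = s
    go _ (MathInlines xs) = concatMap (go True) xs
    go True (Strong xs)   = toBoldUnicode (concatMap (go True) xs)  -- math: Unicode bold
    go False (Strong xs)  = map toUpper (concatMap (go False) xs)   -- prose: capitalize, as now

-- Map ASCII letters and digits to MATHEMATICAL BOLD characters.
toBoldUnicode :: String -> String
toBoldUnicode = map bold
  where
    bold c
      | isAsciiUpper c = chr (0x1D400 + ord c - ord 'A')
      | isAsciiLower c = chr (0x1D41A + ord c - ord 'a')
      | isDigit c      = chr (0x1D7CE + ord c - ord '0')
      | otherwise      = c
```

With this, renderPlain [Str "see ", Strong [Str "this"], Str ": ", MathInlines [Str "λ", Strong [Str "x"]]] yields "see THIS: λ𝐱", so only the math is affected.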
@hftf you've raised another issue: should we depart from the current "Project Gutenberg" conventions for plain output? I can see a case for this, and it might be possible, for example, to add a +gutenberg extension, or perhaps a gutenberg target, and make plain plainer. But this should be put on the tracker as a separate issue, I think.
Would you like to raise that issue? I’m awfully busy now, but could try to work on it in a few months.
I think this issue also raises some related issues, so I’m not sure how many should be filed.
capitalize running text? (I exclude fake small caps¹ and normalizing symbols like entity or attribute names².) This type of capitalization is an anomaly in the codebase, as it is only used in plain to mimic two Project Gutenberg conventions (viz. strong emphasis and level-1 headings). I’m not convinced users want this behavior: there have been many surprised reactions.
¹ See writers for CommonMark, Markdown or plain, FB2, and Ms.
² See usages of toUpper, etc.
Current behavior:
Desired behavior:
This would be useful because it makes the output a more accurate, recognizable representation of the formula. Pandoc already does this for \mathbb symbols. Unicode has characters specifically for this purpose, listed here.
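The \mathbb case (TextDoubleStruck in renderStr) shows that such mappings are not always a simple offset: double-struck C, H, N, P, Q, R and Z were already encoded in the Letterlike Symbols block (ℂ, ℍ, ℕ, ℙ, ℚ, ℝ, ℤ), and their slots in the Mathematical Alphanumeric Symbols block are left unassigned. The hypothetical toDoubleStruck below sketches handling those exceptions with a small lookup table; it is not texmath's actual toUnicode code.

```haskell
import Data.Char (chr, isAsciiLower, isAsciiUpper, isDigit, ord)

-- Hypothetical helper: map ASCII letters and digits to double-struck
-- (blackboard bold) characters, as used for \mathbb.
toDoubleStruck :: String -> String
toDoubleStruck = map ds
  where
    ds c
      | Just u <- lookup c letterlike = u  -- pre-existing Letterlike Symbols
      | isAsciiUpper c = chr (0x1D538 + ord c - ord 'A')
      | isAsciiLower c = chr (0x1D552 + ord c - ord 'a')
      | isDigit c      = chr (0x1D7D8 + ord c - ord '0')
      | otherwise      = c

-- These capitals' slots in the Mathematical Alphanumeric Symbols block are
-- reserved; the characters live in the Letterlike Symbols block instead.
letterlike :: [(Char, Char)]
letterlike =
  [ ('C', '\x2102'), ('H', '\x210D'), ('N', '\x2115'), ('P', '\x2119')
  , ('Q', '\x211A'), ('R', '\x211D'), ('Z', '\x2124') ]
```

The exception table is checked first, so "R" maps to ℝ (U+211D) rather than to the reserved code point.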