**DRAFT** KaTeX support

wires commented 1 year ago

This is a request for comments on supporting TeX markup in Nostr.

Update 2023/02/03:

Prefixed the "katex" tag with "t".
Use custom regular event instead of metadata event for macros (inline, below).

I wanted to clarify that I intend for this proposal to remove some unwanted freedoms in implementing TeX rendering and to settle on existing conventions used in plaintext e-mail, math wikis and conversation platforms like Zulip.

Regular Nostr clients SHOULD NOT (have to) implement this because of the potential misuse (e.g. due to a bad implementation), such as:

advertisers using "funny" fonts to stand out,
abusing $\mathrm{abc}$ to shadow plaintext abc in some way,
crafting events with content containing $$#[0]$$ etc., in order to exploit the KaTeX implementation in some way

Then again, maybe TeX support is a cool feature and will be used in creative ways?

TeX itself is very precisely specified and uses a versioning scheme that converges to pi (and the font subsystem it depends on, Metafont, converges to e; see https://en.wikipedia.org/wiki/TeX#TeX82). As such, KaTeX plus a version precisely specifies an "archival quality" language for mathematical notation, which should render identically on clients and is additionally readable in plain text. There is overlap with MathML here, but nobody uses that as an input language; it is only used embedded in (X/HT)ML documents (although it also has an "archival quality" implementation/specification).

End Update

Motivation:

Mathematical notation is generally written using inlined single or double dollar delimited TeX expressions. For example:

Relativistic energy is given by $$E_\mathrm{rel} = \sqrt{(m_0 c^2)^2 + (pc)^2}$$ However using the Lorentz factor $\gamma$ this can be written as $E={\gamma}mc^2$.

Relativistic energy is given by
$$E_\mathrm{rel} = \sqrt{(m_0 c^2)^2 + (pc)^2}$$
However using the *Lorentz factor* $\gamma$
this can be written as $E={\gamma}mc^2$.

KaTeX supports a relatively safe and limited subset of LaTeX; see https://katex.org/docs/supported.html and https://katex.org/docs/security.html.

Below follow two proposals, one KISS and one modeled after NIP-08. Shoot!

Simple KaTeX support

An event with a tag "katex" MUST contain KaTeX code within its .content.

The KaTeX code MUST be delimited by $..$ for inline mode or $$..$$ for display mode.
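As a rough illustration of the delimiter rule, here is a minimal Python sketch that splits .content into plain text, inline and display segments. The function name `split_tex` and the regex are my own assumptions, not part of the proposal; escaped dollars (`\$`) and stray currency amounts are not handled, which is one reason the "katex" tag is useful as an opt-in signal.

```python
import re

# Try $$..$$ first so a display block is not misread as inline spans.
# Assumption: no escaped dollars (\$) inside the content.
_TEX_SPAN = re.compile(r"\$\$(.+?)\$\$|\$(.+?)\$", re.DOTALL)


def split_tex(content):
    """Return a list of (kind, text) pairs; kind is "text", "inline" or "display"."""
    segments = []
    pos = 0
    for m in _TEX_SPAN.finditer(content):
        if m.start() > pos:
            # Plain text between the previous span and this one.
            segments.append(("text", content[pos:m.start()]))
        if m.group(1) is not None:
            segments.append(("display", m.group(1)))
        else:
            segments.append(("inline", m.group(2)))
        pos = m.end()
    if pos < len(content):
        segments.append(("text", content[pos:]))
    return segments
```

A supporting client would hand the "inline" and "display" segments to its renderer and pass the "text" segments through unchanged.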

Supporting clients SHOULD replace KaTeX code (including the delimiters) with an inline or display block containing one of the following.

fallback unicode text when appropriate (for example in terminals).
live KaTeX DOM node
vector/raster image resulting from a (sandboxed) KaTeX render
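For the Unicode fallback option, a terminal client could start from a simple lookup table. The sketch below is only that: the table `UNICODE_FALLBACK` is a hypothetical, deliberately tiny mapping, and the proposal does not (yet) define which commands must have a fallback. Unknown commands are left untouched so the raw TeX stays readable.

```python
import re

# Hypothetical, deliberately tiny mapping; a real client would need a
# much larger, agreed-upon table.
UNICODE_FALLBACK = {
    r"\alpha": "α",
    r"\gamma": "γ",
    r"\epsilon": "ε",
    r"\Sigma": "Σ",
    r"\hbar": "ℏ",
}

# A TeX control word: a backslash followed by ASCII letters.
_COMMAND = re.compile(r"\\[A-Za-z]+")


def unicode_fallback(tex):
    """Replace known commands with Unicode; leave unknown commands as they are."""
    return _COMMAND.sub(
        lambda m: UNICODE_FALLBACK.get(m.group(0), m.group(0)), tex
    )
```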

The tag SHOULD include the KaTeX version which was used to author the message, for example ["t", "katex", "version", "0.16.4"].

Persistent Macros

One or more ["t", "katex", "macro", KaTeX_macro] tags MAY be provided, whose KaTeX_macro code MUST be executed before all other TeX expressions and its \gdef commands MUST be persisted. See https://katex.org/docs/api.html#persistent-macros

Metadata Prelude

Update 2023/02/03:

Change "prelude" to plural "preludes"
Moved "version" into prelude definition
Expanded example for KaTeX macro definition event
[ ] TODO implement/test this and rewrite section

After some sleep: instead of donkey-patching this into the kind:0 messages, it clearly seems better to use a custom regular event (say kind:1001). This would immediately prevent breaking/rewriting older events and does not pollute kind:0 content, which should remain small and relevant.

End update

We can also store common collections of macros ("preludes") ~~in our metadata~~.

If the ~~metadata~~ katex macro event's .content contains a "katex-macros" property of the following form:

{ "content": { "katex-macros": {
    "preludes": {
      "topology": ["0.16.4", "\\newcommand{\\Top}[2]{\\mathcal{T}(#1,#2)}"],
      "bitcoin": ["0.16.2", "\\newcommand{\\Hash}[1]{\\mathrm{SHA256}}"]
    }
} } }

Then one or more ["t", "katex-macro", eventId, name] tags MAY be provided.

The client should then look up the name in the katex macro event.
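The lookup could be sketched roughly as follows, in Python. This assumes the macro events have already been fetched into a dict keyed by event id, and that .content is either a JSON string or an already parsed object of the form shown above; the function name `resolve_preludes` is hypothetical.

```python
import json


def resolve_preludes(note_tags, macro_events):
    """Return (name, version, macro_source) for each resolvable katex-macro tag.

    macro_events: dict mapping event id -> event dict (assumed already fetched).
    """
    resolved = []
    for tag in note_tags:
        # Only consider ["t", "katex-macro", eventId, name] tags.
        if len(tag) != 4 or tag[0] != "t" or tag[1] != "katex-macro":
            continue
        _, _, event_id, name = tag
        event = macro_events.get(event_id)
        if event is None:
            continue  # macro event not available locally; skip it
        content = event["content"]
        if isinstance(content, str):
            # Assumption: string content is JSON-encoded.
            content = json.loads(content)
        preludes = content.get("katex-macros", {}).get("preludes", {})
        if name in preludes:
            version, source = preludes[name]
            resolved.append((name, version, source))
    return resolved
```

The resolved macro sources would then be executed before the note's own TeX expressions, using the KaTeX version recorded next to each prelude.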

~~Authors should take caution not to break old events.~~ "katex" tagged events now immutably reference their macros, so old events won't break, and existing macros can easily be queried from a relay with ["REQ", "available-katex-macros", {"kinds": [1001]}].

Update 2023/02/03: The following section is crazy/overkill. Skip/ignore. End Update

Alternative for KaTeX support (discuss?)

Based on NIP-08 Handling Mentions where mentions are $ or $$ delimited KaTeX blocks.

When an authoring client supports KaTeX input it SHOULD use the delimiters $..$ for inline and $$..$$ for block content.

When the client identifies such delimited KaTeX code, it MUST add the content to .tags with a tag kx, followed by a flag 0 for inline or 1 for block, followed by the KaTeX code. It must then replace the delimited code (including delimiter) by $$[index].

Authoring clients MAY include fallback plain text in the tag after the TeX code, for example ["kx", 0, "\alpha", "α"].
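To make the authoring step concrete, here is a minimal Python sketch of the rewrite. It assumes the index refers to the position in the full .tags array (as in NIP-08), keeps the numeric 0/1 flag from the example above, and ignores fallback text and escaped dollars; `extract_kx` is a hypothetical name.

```python
import re

# $$..$$ is tried before $..$ so block content is detected first.
_TEX_SPAN = re.compile(r"\$\$(.+?)\$\$|\$(.+?)\$", re.DOTALL)


def extract_kx(content, tags=None):
    """Move delimited TeX into "kx" tags and return (new_content, new_tags)."""
    tags = list(tags or [])  # copy, so the caller's list is not mutated

    def move_to_tag(m):
        is_block = m.group(1) is not None
        code = m.group(1) if is_block else m.group(2)
        tags.append(["kx", 1 if is_block else 0, code])
        # Reference the tag just appended by its index in .tags.
        return "$$[%d]" % (len(tags) - 1)

    return _TEX_SPAN.sub(move_to_tag, content), tags
```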

If a viewing client does not support this feature, the unmodified .content becomes less readable if not processed, so the client SHOULD at least replace $$[..] with

fallback unicode text when appropriate (for example in terminals).
or, live KaTeX DOM node
or, vector/raster image resulting from a (sandboxed) KaTeX render

wires commented 1 year ago

Hi, I was interested in rendering mathematics in Nostr and started implementing this, but not sure what you think the best way would be to standardize this. Decided to put some of it in writing so that people can shoot comments. I don't have a client to demonstrate this yet. Mostly wondering what you think of this sketch and if a NIP-08 like approach makes sense or not (I think not because it's not backwards compatible and maybe overly complex).

fiatjaf commented 1 year ago

I think there is no way normal social clients will implement this and we shouldn't try. I do think

Relativistic energy is given by
$$E_\mathrm{rel} = \sqrt{(m_0 c^2)^2 + (pc)^2}$$
However using the *Lorentz factor* $\gamma$
this can be written as $E={\gamma}mc^2$.

looks readable enough and you could just do that and expect normal people to not understand, but mathematicians to be reading from their katex-powered clients.

Since this is so niche I don't think it poses any interoperability or feature-creep problems, unlike Markdown.

wires commented 1 year ago

Thanks for the feedback, I hold similar feelings around this feature.

Especially Markdown, it's very complex in the details (sub languages etc). So tried to come up with the stupidest simple thing that still would be valuable.

For KaTeX enabled clients, it would be good to know which events do not require KaTeX substitutions, ideally without parsing the content and hunting for a delimiter.

Hence my suggestion to tag those (and the version, since not every release supports the same commands), wdyt about that?

I also don't think normal social media clients should support this, but if a client wants to support it, this should be done in a standard way (and what I wrote down is pretty standard).

Would you recommend I just implement this and put this "NIP" as a reference somewhere? Should multiple clients start implementing it we can always see if it makes sense as a real NIP, I imagine more people are thinking about this even tho it's quite niche.

fiatjaf commented 1 year ago

I think you could add a tag that is just ["katex"], or perhaps ["t", "katex"] which denotes a "hashtag". Do you think people would be interested in querying relays just for notes with katex?

Or maybe we could standardize a separate hashtag, "math" or something like that, so all math clients would use that even when they're not writing katex (but, for example, when they're replying to katex notes).

wires commented 1 year ago

Updated the issue description to include your feedback and some thoughts/questions.

I feel that if implemented properly (clearly defined unicode fallback, etc) this would still be niche but maybe less niche.

For instance, bitcoin developers communicating about cryptography could just write $xG$ or $\mathbf{F}_q$ or even \F_q (so with a \F macro and without the delimiters) and so on and see that as unicode symbols; 95% of the 'conversational' use of KaTeX would be like this and doesn't need all features. If

there is a well defined KaTeX subset,
and a grammar/parser/detector,
and it has clear unicode mappings, together with a tag to indicate that it is used in .content,

then this should not interfere much with existing clients, would have even better readability if unprocessed (for the initiated), and would potentially be more secure.

There are a lot of mathematicians and other scientists with many followers that sometimes just want to write \Sigma, \epsilon, \hbar or stuff like that. Curious to hear if you think this has value, I might have a closer look at how this would work in detail.

wires commented 1 year ago

Do you think people would be interested in querying relays just for notes with katex?

Hmm.

I think in general scientists would be either following the work of a specific person, some institute or research group, or maybe a conference account / instance (via hashtag #blaconf2023). Or they would follow a "topic" (could be done with "hashtags" again). But those would also involve not-katex notes.

One use case I had in mind was publishing a "science nostr weekly" that would produce a book quality PDF/PostScript using LaTeX. But that's a bit niche-of-niche. In that case I would start with katex notes and then crawl related notes. But at this stage this is all imaginary.

That said, wrt noise/signal, using katex is usually a good sign of quality posts for scientists, because someone is speaking their lingua franca TeX. We could also think of this as entirely "special language" science tweets, with a custom UI and support for DOI and citations and stuff. In which case a custom event kind is maybe better suited?

But my initial use-case was simple: Nostr about Schnorr signatures and category theory without getting tired from having to parse TeX by brain, haha.

Thank you for your feedback, much appreciated!

nostr-protocol / nips