slatex / sTeX

A semantic Extension of TeX/LaTeX

verbalizations need a URI #405

Open kohlhase opened 1 year ago

kohlhase commented 1 year ago

I would like to have an MMT API that can give me all the verbalizations of a given symbol (in all languages). This is now possible via SPARQL, iff verbalizations have canonical URIs. So we need them. Currently a verbalization is generated by a \definiendum (and \definame), e.g. like so:

\begin{smodule}[lang=en]{bar}
\symdecl*{foo}
...
\begin{sdefinition}
  ... a \definiendum{foo}{first-order object} or simply \definiendum{foo}{FOO},
\end{sdefinition}
\end{smodule}

The \symdecl* declares a symbol bar?foo and the two \definiendum calls give two (English) verbalizations for bar?foo. One way to allow URIs is to re-activate the old idea that we give URIs for components (like notations, definientia, declared types) via a third ? in URIs. Then we could just give \definiendum declarations names in sTeX: add an optional name= argument to \definiendum and \definame, like so:

  ... a \definiendum[name=long]{foo}{first-order object} or simply \definiendum[name=short]{foo}{FOO},

and be done with it. Alternatively, we could just remove the name= and use

  ... a \definiendum[long]{foo}{first-order object} or simply \definiendum[short]{foo}{FOO},

like we do for \notation. Then we would have the URI <g>?bar?foo?long for the first verbalization and <g>?bar?foo?short for the second, where <g> is the URI of the document with the smodule bar.
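To make the proposed scheme concrete, here is a minimal Python sketch of how such component URIs could be composed. The helper name `component_uri` and the placeholder document URI are assumptions made for illustration; this is not an existing MMT or sTeX API.

```python
def component_uri(doc_uri: str, module: str, symbol: str, component: str) -> str:
    """Compose <g>?module?symbol?component (hypothetical helper, not an MMT API)."""
    parts = [module, symbol, component]
    for part in parts:
        # A segment must be non-empty and must not contain the separator itself.
        if not part or "?" in part:
            raise ValueError(f"invalid URI segment: {part!r}")
    return "?".join([doc_uri, *parts])


if __name__ == "__main__":
    g = "http://example.org/doc"  # placeholder standing in for <g>
    print(component_uri(g, "bar", "foo", "long"))
    print(component_uri(g, "bar", "foo", "short"))
```

Rejecting segments that contain `?` keeps the three-part structure unambiguous, which is what a canonical URI would need.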

And while we are at it, we should directly generate the triples for the other components as well, adding the necessary ulo:has_component relations, the respective component types, and the ulo:in_language relations, so that the snippet above generates the triples:

g?bar?foo ulo:has_component g?bar?foo?short
g?bar?foo ulo:has_component g?bar?foo?long
g?bar?foo?short rdf:type_of ulo:verbalization
g?bar?foo?short ulo:language "en"
g?bar?foo?long rdf:type_of ulo:verbalization
g?bar?foo?long ulo:language "en"

and even "shortcuts" like

g?bar?foo?long ulo:verbalized "first-order object"
g?bar?foo?short ulo:verbalized "FOO"
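
As a rough illustration of the triples listed above, the following Python sketch generates them from named verbalizations and then looks up all verbalizations of a symbol. The predicate names are copied from this comment as written; the types and functions (`Verbalization`, `verbalization_triples`, `verbalizations_of`) are hypothetical, and the lookup is a plain-Python stand-in for the SPARQL query, not an MMT API.

```python
from typing import NamedTuple


class Verbalization(NamedTuple):
    name: str  # component name, e.g. "long" or "short"
    text: str  # the verbalization itself
    lang: str  # language of the enclosing smodule, e.g. "en"


def verbalization_triples(symbol_uri, verbs):
    """Build the (subject, predicate, object) triples for each named verbalization."""
    triples = []
    for v in verbs:
        comp = f"{symbol_uri}?{v.name}"
        triples.append((symbol_uri, "ulo:has_component", comp))
        triples.append((comp, "rdf:type_of", "ulo:verbalization"))
        triples.append((comp, "ulo:language", v.lang))
        triples.append((comp, "ulo:verbalized", v.text))  # the "shortcut" triple
    return triples


def verbalizations_of(symbol_uri, triples):
    """Return sorted (language, text) pairs for all verbalizations of a symbol."""
    comps = {o for s, p, o in triples
             if s == symbol_uri and p == "ulo:has_component"}
    verbs = {s for s, p, o in triples
             if s in comps and p == "rdf:type_of" and o == "ulo:verbalization"}
    lang = {s: o for s, p, o in triples if p == "ulo:language"}
    text = {s: o for s, p, o in triples if p == "ulo:verbalized"}
    return sorted((lang[c], text[c]) for c in verbs)


if __name__ == "__main__":
    ts = verbalization_triples("g?bar?foo", [
        Verbalization("long", "first-order object", "en"),
        Verbalization("short", "FOO", "en"),
    ])
    for t in ts:
        print(*t)
    print(verbalizations_of("g?bar?foo", ts))
```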

Jazzpirate commented 1 year ago

One way to allow URIs is to re-activate the old idea that we give URIs for components (like notations, definientia, declared types) via a third ? in URIs

Types and definientia already have component URIs :) Extending that to notations and verbalizations would be extremely invasive wrt MMT, however. Also, it wouldn't solve the problem: as with notations, verbalizations can occur anywhere, not just in the very module a symbol was declared in. So their URIs cannot start with the URI of the symbol itself.

Jazzpirate commented 1 year ago

The closest I can think of that would be feasible would be to treat verbalizations exactly like notations. They basically are notations, just in text mode rather than math mode. The question is to what extent that is what you want.

In the following, let "verbalization" = "notation for text mode". I would imagine that just like any \symdecl{foo}[args=2] induces a default notation \comp{foo}(#1,#2) and a default operator notation foo, one could induce a default verbalization \comp{foo} applied to #1 and #2 and a default operator verbalization foo. A \definiendum could be used to introduce a new operator verbalization. A verbalization would then have a URI just like notations do, and just as with notations, it would be some autogenerated URI in whatever-module-the-definiendum-is-in, that is related to the actual symbol via some ULO predicate.
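
The idea of autogenerated URIs that live in the host module and point back at the symbol can be sketched in Python as follows. Everything here is an assumption for illustration: the class `VerbalizationRegistry`, the URI naming pattern, and the predicate `ex:verbalization_of` (the comment only says "some ULO predicate") are all made up, and this is not how MMT actually generates notation URIs.

```python
class VerbalizationRegistry:
    # Hypothetical predicate name; the actual ULO predicate is not specified.
    PREDICATE = "ex:verbalization_of"

    def __init__(self):
        self.triples = []
        self._counters = {}  # per-module counter for autogenerated names

    def declare(self, host_module_uri: str, symbol_uri: str) -> str:
        """Register a verbalization found in host_module_uri for symbol_uri."""
        n = self._counters.get(host_module_uri, 0) + 1
        self._counters[host_module_uri] = n
        # The URI starts with the module containing the \definiendum,
        # not with the URI of the symbol being verbalized.
        uri = f"{host_module_uri}?verbalization-{n}"
        self.triples.append((uri, self.PREDICATE, symbol_uri))
        return uri

    def for_symbol(self, symbol_uri: str) -> list:
        """Find all verbalization URIs related to a symbol, in any module."""
        return [s for s, p, o in self.triples
                if p == self.PREDICATE and o == symbol_uri]


if __name__ == "__main__":
    reg = VerbalizationRegistry()
    reg.declare("g?bar", "g?bar?foo")  # verbalization in the declaring module
    reg.declare("h?baz", "g?bar?foo")  # verbalization in an unrelated module
    print(reg.for_symbol("g?bar?foo"))
```

Because the link to the symbol is carried by a triple rather than by the URI prefix, a verbalization declared in an unrelated module is still found by the lookup.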

Advantages: verbalizations are naturally parametric. E.g. \verbalization{vectorspace}[over]{\comp{vector space} over #1} we get for free, because it's "just a notation".
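
What "parametric" means here can be shown with a small Python sketch that fills the #1, #2, ... slots of a verbalization template. It is deliberately simplified: the function `apply_verbalization` is hypothetical, templates are plain strings, and \comp markup and argument modes are ignored.

```python
import re


def apply_verbalization(template: str, args: list[str]) -> str:
    """Replace #1..#9 in a template with the corresponding argument text."""
    def fill(match: re.Match) -> str:
        index = int(match.group(1))
        if index > len(args):
            raise IndexError(f"template uses #{index} but only {len(args)} argument(s) given")
        return args[index - 1]

    return re.sub(r"#([1-9])", fill, template)


if __name__ == "__main__":
    # Template modelled on the vectorspace example above, without \comp.
    print(apply_verbalization("vector space over #1", ["the real numbers"]))
    # Template modelled on the default verbalization "foo applied to #1 and #2".
    print(apply_verbalization("foo applied to #1 and #2", ["a", "b"]))
```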

I'm questioning whether that is what you want, because it would entail that a verbalization is arbitrary HTML and needs to be treated as such. Arguably, it needs to be either way, but I'm not sure you considered that, depending on what you want as query results...

kohlhase commented 1 year ago

This sounds extremely attractive, and the parallelism between notation and verbalization is an added plus; that is how it "should be". I am not sure why I should be afraid of verbalizations being HTML. I would guess that in practice, it would almost always be HTML without tags.

Jazzpirate commented 1 year ago

No, in practice it would never be HTML without tags ;)

kohlhase commented 1 year ago

What tags do you see in the HTML?

kohlhase commented 1 year ago

Can you be concrete?

Jazzpirate commented 1 year ago

At the very least one div with probably class="rustex-hbox", and once arguments are involved, it will be deeply nested divs with some marking arguments and their index, and their argument modes, and...
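
If a query consumer only wants the visible text of such a verbalization, the tags can be stripped with Python's standard `html.parser`. This is only a sketch: the sample markup below is invented (only the class name rustex-hbox comes from this comment; the `data-arg` attribute and the nesting are assumptions), and real output may look different.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the character data, ignoring all tags and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def plain_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the text fragments and normalize runs of whitespace.
    return " ".join("".join(parser.chunks).split())


if __name__ == "__main__":
    # Invented sample markup; "data-arg" is a made-up attribute name.
    sample = (
        '<div class="rustex-hbox"><span>vector space</span> over '
        '<div data-arg="1">the real numbers</div></div>'
    )
    print(plain_text(sample))
```

This throws away the argument markers entirely, so it only fits use cases where the plain string is enough.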

kohlhase commented 1 year ago

OK, that is to be expected, but that is nothing to be scared of. Most of these tags will be "invisible" though, right?

Jazzpirate commented 1 year ago

It depends on what you mean by invisible...? Do you have a specific use case in mind?

Jazzpirate commented 1 year ago

(Also, note that we're now talking about a major refactoring of core parts of sTeX again - so don't expect that to happen any time soon, especially not before the semester starts :D )

kohlhase commented 1 year ago

Of course not before the semester. I am happy if all the things we are planning somehow work out in time.