Unify format of core and open concept lists

davidcarlisle commented 3 months ago

As agreed on the call of 2024-06-06 I took an action item to update the format used for the open list to be closer to that of the core list, for easier comparison, and to make it easier to move symbols from open to core if widely accepted.

This is done as a PR from my fork rather than from a branch so that the HTML renderings can be compared:

OLD

https://w3c.github.io/mathml-docs/intent-open-concepts

NEW

https://davidcarlisle.github.io/mathml-docs/intent-open-concepts

All data in the current yaml is preserved, the form field is renamed to property, and a new en field for an (English) speech template is added.
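As a rough illustration of that change, migrating a single entry could look like the sketch below. The key names `concept` and `arity`, the helper `migrate_entry`, and the sample values are assumptions made for illustration; the actual yaml layout may differ.

```python
def migrate_entry(old: dict, en_template: str) -> dict:
    """Return a new-format copy of an old open-list entry (hypothetical layout)."""
    new = {}
    for key, value in old.items():
        # Rename "form" to "property"; every other field is copied unchanged.
        new["property" if key == "form" else key] = value
    # Add the new English speech template field.
    new["en"] = en_template
    return new


# Hypothetical sample entry; "concept" and "arity" are assumed key names.
old_entry = {
    "concept": "amalgamated-product",
    "arity": 3,
    "form": "indexed infix",
    "notation": "approximate text hint",
}
new_entry = migrate_entry(old_entry, "amalgamated product $1 $2 $3")
print(new_entry)
```

The point of the sketch is only that nothing is dropped: the undisplayed notation field travels along untouched.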

The notation field, with an approximate text indication of the MathML, is still there but not currently displayed, although it was used to inform the initial speech templates and in some cases was added as real MathML in the comments field, e.g.

https://davidcarlisle.github.io/mathml-docs/intent-open-concepts/#amalgamated-product3indexed%20infix

Almost all the speech templates will need review, as they were mostly auto-generated from the concept name and arity. However, that can be done on the main site and is not part of this PR, which is really just asking about the format.

One specific question about the table format. In the core list we split it by sections

eg calculus is

https://w3c.github.io/mathml-docs/intent-core-concepts/#calculus

The open list has a single table but has a column for subject

Should we split up the open list with sections for subject (replacing the subject column)??

Pro: makes it more like core, makes the table less wide, groups related symbols

Con: makes it a bit harder to check when adding a new symbol for name clashes as you need to check all the sections, perhaps makes it a bit harder to add new concepts if it means generating a new section (technically that is not hard, but it may be an extra mental barrier discouraging submission)
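The name-clash concern could be reduced by a small check that looks through all sections at once. A minimal sketch, assuming the sectioned data is loaded as a mapping from subject to a list of entries with a `concept` key (both are assumptions about the layout, and the sample names are hypothetical):

```python
def find_clashes(sections: dict, new_name: str) -> list:
    """Return the subjects whose section already defines new_name."""
    return [
        subject
        for subject, entries in sections.items()
        if any(entry["concept"] == new_name for entry in entries)
    ]


# Hypothetical sample data, not taken from the real lists.
sections = {
    "calculus": [{"concept": "derivative"}, {"concept": "integral"}],
    "group theory": [{"concept": "amalgamated-product"}],
}
print(find_clashes(sections, "integral"))
print(find_clashes(sections, "wedge-sum"))
```

A real check would probably also compare arity and property, since the same name may legitimately appear with different forms.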

If we decide to keep a single list with a subject column for open, should we revert to that format for core, or does it not matter if the tables are different?

In addition to Subject, the open list has additional columns for Sources (a list of URLs) and Aliases (alternative suggested names), and the yaml has an undisplayed "notation" field that can be used to help author comments and/or the speech templates.

I don't want a long term branch collecting merge conflicts so if this is thought to be more or less in the right direction, I suggest we merge and then edit the individual entries at w3c, but obviously if there are other suggested structural changes (in particular if we decide to split by section on subject area) then we can do that first in this fork...

I pinged a few people as reviewers who commented on the call, but comments welcome from anyone:-)

dginev commented 3 months ago

I'm passing my reviewer flag to Moritz @physikerwelt, who expressed interest in helping to curate the collection during the June 6 meeting (so as to improve its reuse in the Wiki community).

physikerwelt commented 3 months ago

It's quite a long change and an improvement over the old version. So, I am in favor of merging. My comments: 1) We should keep one alphabetically ordered list. Groups can be created by attributes. 2) I would prefer if the notation field did contain renderable MathML (I would skip it for now). 3) The alias field could be made into a YAML list instead of a comma / br tags separated list.

davidcarlisle commented 3 months ago

The alias field could be made into a YAML list instead of a comma / br tags separated list

Yes, I had the same thought earlier today when I noticed it. I agree we should do that (it could be done before or after a merge).
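For reference, the conversion from a separated string to a list is mechanical. A minimal sketch, assuming the alias value is a single string using commas and/or `<br>` tags as separators; the function name `split_aliases` and the sample aliases are hypothetical:

```python
import re


def split_aliases(raw: str) -> list:
    """Split a comma- or <br>-separated alias string into a clean list."""
    # Accept ",", "<br>", "<br/>" and "<br />" (any case) as separators.
    parts = re.split(r"\s*(?:,|<br\s*/?>)\s*", raw, flags=re.IGNORECASE)
    # Drop empty pieces left by doubled or trailing separators.
    return [part.strip() for part in parts if part.strip()]


# Hypothetical alias string, not copied from the real data.
print(split_aliases("free product with amalgamation<br>amalgam, pushout"))
```

The resulting list can then be written back as a yaml sequence instead of one string.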

davidcarlisle commented 3 months ago

@physikerwelt "The alias field could be made into a YAML list instead of a comma / br tags separated list" done

physikerwelt commented 3 months ago

@physikerwelt "The alias field could be made into a YAML list instead of a comma / br tags separated list" done

Thank you. Looks good.

davidcarlisle commented 3 months ago

@physikerwelt

The mandatory arguments in the style of this concept name $1 $2 $3 make translation more manual. If we didn't have the arguments, we could use the translations implicitly provided by the Wikipedia link.

It's true that having the argument markers there probably means you need a custom translation, but they seem to be the main point of the entire list: the speech templates (with slots marked by $1, $2, ...) say how to voice a use of that intent with that many arguments. In simple cases you can just translate the intent name and the argument slots can be implied by some standard rules, but in those cases you don't need an entry at all.
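To make the role of the slots concrete, here is a minimal sketch of how a template could be filled in. The function `speak` and the sample arguments are hypothetical, and this is not how any particular speech generator is known to work:

```python
import re


def speak(template: str, args: list) -> str:
    """Replace $1, $2, ... in template with the matching spoken argument."""

    def slot(match):
        index = int(match.group(1))
        if index < 1 or index > len(args):
            raise ValueError(
                f"template refers to ${index} but {len(args)} arguments were given"
            )
        return args[index - 1]

    # \d+ is greedy, so "$10" is read as slot 10 rather than slot 1 followed by "0".
    return re.sub(r"\$(\d+)", slot, template)


print(speak("amalgamated product $1 $2 $3", ["G", "H", "A"]))
```

Because the slots are numbered, a translation is free to reorder them, which is what makes a custom template per language necessary in the non-trivial cases.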

I think it will be clearer once the speech templates have been better reviewed. I made them "this concept name $1 $2 $3" as a mechanical process to get something for all arity 3 functions, but in many cases that will not be the intended form, which should probably be something like "this concept name of the $3rd kind of $1 comma $2". But maybe we have a small enough list of template styles that the json can flag what style it is, and then, given a list such as you show with just the translation of the concept name, it is possible to generate the whole phrase.
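The mechanical generation and the idea of a small set of template styles might be sketched as follows. The style names (`prefix`, `kind-of`), the `STYLES` table, and both function names are hypothetical, and hyphen-separated concept names are an assumption; only the two template shapes come from the comment above.

```python
# Hypothetical style table: each style is a pattern with a {name} placeholder.
STYLES = {
    # Mechanical default for arity 3: the name followed by the three slots.
    "prefix": "{name} $1 $2 $3",
    # Alternative shape that may suit some arity 3 concepts better.
    "kind-of": "{name} of the $3rd kind of $1 comma $2",
}


def default_template(concept: str, arity: int) -> str:
    """Mechanically build 'concept name $1 ... $n' from the name and arity."""
    words = concept.replace("-", " ")
    slots = [f"${i}" for i in range(1, arity + 1)]
    return " ".join([words] + slots)


def build_template(style: str, translated_name: str) -> str:
    """Combine a flagged style with a (possibly translated) concept name."""
    return STYLES[style].format(name=translated_name)


print(default_template("amalgamated-product", 3))
print(build_template("kind-of", "this concept name"))
```

With such a table, a translation file would only need the translated concept name plus a style flag per entry, and the per-language style patterns would be maintained once.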

I agree though that if there are more than a few translations, having them all in one file might prove unmanageable, but changing that later wouldn't be hard. Once the data is in a more regular form than it is now, it should just be a matter of parsing the yaml and writing it out to a differently organised tree. Currently the data quality is a bit variable, with some things split by <br> that should probably be yaml arrays, and the notation field just being a vague hint of intended notation. So I'd rather keep it in one file for now, and also keep the core and open list formats more or less the same, just with open having more fields.
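The reorganisation step could be as small as the sketch below. It assumes the yaml has already been parsed into a flat list of dicts (the parsing and writing steps are left out), and the key names `concept` and `subject`, the helper `regroup`, and the sample entries are hypothetical:

```python
from collections import defaultdict


def regroup(entries: list, key: str) -> dict:
    """Group a flat list of entries by one field, sorting groups and members."""
    groups = defaultdict(list)
    for entry in entries:
        # Entries that lack the field are collected under a placeholder group.
        groups[entry.get(key, "unclassified")].append(entry)
    return {
        name: sorted(groups[name], key=lambda e: e["concept"])
        for name in sorted(groups)
    }


# Hypothetical flat list, standing in for the parsed single file.
flat = [
    {"concept": "integral", "subject": "calculus"},
    {"concept": "amalgamated-product", "subject": "group theory"},
    {"concept": "derivative", "subject": "calculus"},
    {"concept": "mystery"},
]
tree = regroup(flat, "subject")
print(list(tree))
```

Each group could then be written to its own file or section, so keeping a single file now does not rule out a split later.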

physikerwelt commented 3 months ago

@davidcarlisle I fully agree. The approach is compatible with translate-wiki (I think you can also change the English version there), so I think we don't need to have perfect English versions. People who do manual translations can look at Wikipedia pages or other resources in their respective languages. For now, I would exclude n-ary concepts, as they do not match the translate-wiki idea. Moreover, whether n-ary expressions exist at all is more of a philosophical discussion: in Content MathML + is n-ary, while others argue it is a binary operator. For the Bell polynomial example we discussed before, the number of arguments is probably 3. Also, when you think of TeX, I guess the number of arguments is limited to 9. That's why I expect people will realize in the future that it does not make sense to define n-ary for speech templates. Consequently, I would rather avoid treating that special case.

davidcarlisle commented 3 months ago

The WG meeting of 2024-06-13 resolved to merge this as is, allowing general edits to the document with the updated format. The existing fork will still be used for experiments with rendering intents and/or a compressed list of bigop/n-ary symbols.

w3c / mathml-docs

Unify format of core and open concept lists #65