Closed NSoiffer closed 2 years ago
I think I would use <mrow intent="H2O"><msub>...
I think it's reasonable to have an mrow (or math) holding the molecule here. H2 isn't (as far as I can remember any chemistry) really a thing on its own; it's just that at the molecule level it makes sense to have two H and one O.
But if you want to reference the children, isn't that
<math>
<msub intent="chemistry-sub($element,$count)">
<mi arg="element" mathvariant='normal'>H</mi>
<mn arg="count">2</mn>
</msub>
<mi mathvariant='normal'>O</mi>
</math>
where chemistry-sub (probably with a real chemical name) is in a (chemistry) intent list with a speech hint of "$element $count".
To illustrate why it is nice to have a principled approach that is seeded with a very long list, here is a step-by-step of how I would answer this (using encyclopedic names):
1. H stands for hydrogen.
2. 2 is a numeric literal.
3. H2 is known as dihydrogen and diatomic hydrogen. Those two names could be viable aliases (#257) for the intent expression molecule(hydrogen, 2).
4. H2 can also be written H - H, where I could imagine writing instead molecule(hydrogen, hydrogen), which is narrated differently but is otherwise equivalent to using the literal 2.
5. O stands for oxygen.
6. Molecular formulae indicate the simple numbers of each type of atom in a molecule, with no information on structure.
So, the entire expression can be marked in full as either (the author would know which):
molecular-formula(molecule(hydrogen, 2), oxygen)
empirical-formula(molecule(hydrogen, 2), oxygen)
Alternatively, if we wanted to do the minimal annotation possible to get Neil's desired narration goal, we may just have to annotate the msub with a single mention of molecule applied to its raw presentational arguments:
<msub intent="molecule($H,$num)">
<mi arg="H" mathvariant='normal'>H</mi>
<mn arg="num">2</mn>
</msub>
I am amused that Neil reached for a literal narration string here. Have you considered aria-label="H2O"? This also made me wonder, if an author wanted to force a "high-level" pronunciation, whether they could use an aria-label="water" to enforce that preference.
Edit: that said, water is also an encyclopedic name, so a use of <math intent="water"... ought to also be legitimate.
I think it's reasonable to have an mrow (or math) holding the molecule here H2 isn't (as far as I can remember any chemistry) really a thing on its own, it's just at the molecule level it makes sense to have two H and one O
Certainly H-O-H doesn't have a separate dihydrogen molecule in it. And yet the way molecular/empirical formulas get written, all instances of the same element get bunched up together, are spoken together, and are often thought of as units (even if only for counting exercises, which are a big part of high-school chemistry).
So rather than changing the intent annotation to make sure the structural information is always accurate, we could accept that there is no structural information in some chemical formulas, and they use the lie-to-children method for abstracting away the structural specifics.
I haven't done chemistry in a long time, but I think your enumerated list shows the danger of being too formal and possibly wrong, as opposed to being sufficiently relaxed that you can still be helpful but can't really be wrong.
Using molecule(H,2) for the H2 part of H2O would be wrong, I think. There is only a single molecule involved: H2O. As you note, the usage here is the total count of each element in the molecule; H2O is "some molecule with 2 H and an O", it is explicitly not "some combination of the molecule H2 with the element O". So I can't see how either molecular-formula(molecule(hydrogen, 2), oxygen) or empirical-formula(molecule(hydrogen, 2), oxygen) can be right. But also, we shouldn't need a chemist to explain the subterms needed. H_2 happened to be a possible molecule, but sugar is C_{12}H_{22}O_{11}, and you should be able to give that an intent without a course in organic chemistry; as far as I know there is no molecule H22 that you could reference as a subterm.
You said the same thing in your follow-up comment:
Certainly H-O-H doesn't have a separate dihydrogen molecule in it.
Although I think you are still suggesting using molecule() for the subterms anyway, for what I called chemistry-sub above (not that I recommend that name either, but it seems better than molecule).
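<mrow intent="C12H22O11">
<msub><mi mathvariant='normal'>C</mi><mn>12</mn></msub>
<msub><mi mathvariant='normal'>H</mi><mn>22</mn></msub>
<msub><mi mathvariant='normal'>O</mi><mn>11</mn></msub>
</mrow>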
Looks fine for that purpose to me, or perhaps intent="chemical-formula(C12H22O11)"
Sure, if the word molecule is too committal so as to be viscerally "wrong", a simple count(hydrogen, 2) in the molecular-formula case would do better. I saw the wiki page refer to "number of atoms", but these are not always atoms since you can also subscript parenthetical groups. Some generic counting word would fit better.
The problem with a generic count is that it also isn't always accurate. For example, in the expression 2 <msub>H 2</msub> the msub is really the molecule dihydrogen, and you have two of those molecules via the leading baseline 2.
So the remediator would (sometimes) need to actively keep track of the molecular structures hidden in the syntax and use a different annotation between the counting and molecular cases.
I think I disagree with,
you should be able to give that an intent without a course in organic chemistry
since we would also expect people to know the basic language of calculus to annotate integrals, and people to know the basic language of group theory to annotate groups, etc.
While "H_2" is a molecule consisting of two "H" atoms, the "H_2 O" molecule doesn't have an "H_2" molecule within it, even though it does contain two "H" atoms.
To the extent we're leaning semantic, there were two distinct kinds of notation used above. One is the "chemical formula", which mainly gives the number of each kind of atom in the compound, but can also preserve structural subgroups. So it would (I think) follow a grammar something like:
formula = atom | "chemical-formula" "(" (formula, count)+ ")"
that is, chemical-formula(H,2,O,1) (for water), but also chemical-formula(Al, 2, chemical-formula(S,1,O,4),3) (aluminum sulfate, see Wikipedia). And it probably should include some way of expressing ionizations, isotopes, etc.
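As a rough illustration of the grammar above, here is a minimal Python sketch that parses such an expression and totals the atoms. The function names (tokenize, parse_formula, atom_counts) are hypothetical and not part of any intent proposal, and counts are assumed to be plain non-negative integers.

```python
import re
from collections import Counter

# Names (letters and hyphens), integers, and punctuation; illustrative only.
TOKEN = re.compile(r"\s*([A-Za-z][A-Za-z-]*|\d+|[(),])")

def tokenize(text):
    """Split an intent-like string into names, integers, and punctuation."""
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_formula(tokens):
    """formula = atom | "chemical-formula" "(" (formula "," count)+ ")"."""
    head = tokens.pop(0)
    if head != "chemical-formula":
        return head  # a bare atom such as "H"
    if tokens.pop(0) != "(":
        raise ValueError("expected '(' after chemical-formula")
    parts = []
    while True:
        sub = parse_formula(tokens)
        if tokens.pop(0) != ",":
            raise ValueError("expected ',' before count")
        parts.append((sub, int(tokens.pop(0))))
        sep = tokens.pop(0)
        if sep == ")":
            return parts
        if sep != ",":
            raise ValueError("expected ',' or ')'")

def atom_counts(formula, multiplier=1):
    """Total number of each atom, multiplying through nested subgroups."""
    counts = Counter()
    if isinstance(formula, str):
        counts[formula] += multiplier
        return counts
    for sub, count in formula:
        counts.update(atom_counts(sub, multiplier * count))
    return counts

sulfate = parse_formula(tokenize("chemical-formula(Al, 2, chemical-formula(S,1,O,4),3)"))
print(dict(atom_counts(sulfate)))
```

The nested form keeps the sulfate subgroup intact for speech, while the totals are still recoverable when only counts matter.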
And then there are "structural formulas", like the "H-O-H". These are more diagram-like and quickly become 2D (or 3D), although this example is easily written linearly. To the extent they're writable in MathML, they seem easy to account for in intent: they'd look like mi (possibly embellished with ionization, isotopes, etc.) annotated with the element name, along with mo operators for the bonds. So, we'd need names for the kinds of chemical bonds (single, double, etc.).
Presumably an AT system that understands these symbols would pronounce the formula for water as "aitch two oh" (at least, given the right preferences settings). How structural formula get pronounced, however, I have no idea. We really need a chemist involved to clarify the minimal, necessary set of concepts & structures along with expected pronunciations.
OTOH, if we don't want to define these concepts and rather just push the above pronunciation, then either an aria approach was suggested above, or else we'd need our own equivalent just-say-this("h 2 o") kind of thing.
In practice the input may be a LaTeX document using the mhchem package, say, and given an input like this (from that package manual) we should be able to make some intent without knowing much more than "the subscript combinations contained here were generated by \ce{(NH4)2S}", so intent="chemical-formula('(NH4)2S')" would have been nice, but we don't have string literals, so maybe intent="chemical-formula(NH4,2,S)" would work out fine. We really should not have to consider the chemistry here.
@davidcarlisle LaTeX macros really ought not guide us too far... How about grabbing something from page 12 of the mhchem documentation:
\ce{Zn^2+
<=>[+ 2OH-][+ 2H+]
$\underset{\text{amphoteres Hydroxid}}{\ce{Zn(OH)2 v}}$
<=>[+ 2OH-][+ 2H+]
$\underset{\text{Hydroxozikat}}{\ce{[Zn(OH)4]^2-}}$
}
Should that be intent="chemical-formula('Zn^2+<=>[+ 2OH-][+ 2H+]$\underset{\text{amphoteres Hydroxid}}{\ce{Zn(OH)2 v}}$<=>[+ 2OH-][+ 2H+]$\underset{\text{Hydroxozikat}}{\ce{[Zn(OH)4]^2-}}$')"?
To the extent that "openparen en aitsch four closeparen two ess" is all you'll ever want to speak, then "(NH4)2S" is certainly a nice, compact, typeable expression of the compound. If you ever need to go beyond that, I wonder how much complex parsing we would be off-loading to the AT (I haven't yet got a LaTeXML binding for mhchem, 'cause it's mind-bogglingly complicated)
I think you are all overthinking this and trying to mirror the chemistry, which indeed does have more of an H-O-H geometry. But the point of intent is not to mirror the semantics, but to mirror the speech. $\ce{H_2O}$ and H-O-H would be spoken very differently. I want to be able to say H 2 O, not H single bond O single bond H. Giving it a name either makes it part of level one, where the name gets interpreted, or it will be spoken, something that shouldn't happen here.
Furthermore, it seems very wrong to use a literal for it as that means the subparts for a more complex molecule such as the one @davidcarlisle listed ($\ce{(NH4)2S}$) end up having to be replicated.
Since it appears I'm not missing an easy way to handle this, I think this points to a shortcoming of the intent syntax as it currently stands.
giving it a name either makes it part of level one
Exactly, in my opinion we should have a dedicated name, part of Intent Core (level 1), for each operation that needs treatment for AT.
I don't understand the point about "shortcomings" of the current syntax. We know the list of values hasn't been completed.
I'm confused about one of @NSoiffer's points: "H2O" and "H-O-H" are two very different things; I don't know how the last one should be pronounced, but almost certainly not the same as the first. I'm also confused about what is missing in the intent *syntax*, in that you seem to rule out names for the concepts(?).
Continuing on from the discussion above and at the meeting...
Definitely: level 1 === core -- ran out of time to reread and clean up what I wrote above.
Having something like molecular-formula(molecule(hydrogen, 2), oxygen) in "open" is fine if you are ok hearing "molecular formula of open paren open paren molecule of hydrogen comma 2 close paren comma oxygen close paren", which would be the likely default reading of something in "open". MathCAT could implement it so that it reads nicely, but that wouldn't be satisfactory to someone generating it because the rest of the screen readers would likely not implement it because it is not in core.
On the call, we discussed moving molecule (or rather some set of names that we come up with) into core to support chemistry. I think that is something we should do. Hopefully we'll get a good set of names from the Chem CG. But I think this inability to use argrefs and not have it pronounced in a functional notation is a problem that needs to be addressed.
On the call, @brucemiller suggested an addition to core of a silent (name TBD) core name which would just pronounce the arguments. For example:
<math>
<msub intent="silent($element,$count)">
<mi arg="element" mathvariant='normal' intent="cap-h">H</mi>
<mn arg="count" intent="2">2</mn>
</msub>
<mi mathvariant='normal' intent="cap-o">O</mi>
</math>
would turn into cap h 2 cap o. I've used intent on the children of the msub just to emphasize that the default speech is being overwritten.
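To make the suggested behavior concrete, here is a toy Python sketch of how an AT might resolve such a silent name. It is only an illustration under stated assumptions: silent is the placeholder name from the call (name TBD), the speak function and its rules are hypothetical, and real intent processing is considerably richer than this.

```python
import xml.etree.ElementTree as ET

MATHML = """
<math>
  <msub intent="silent($element,$count)">
    <mi arg="element" mathvariant='normal' intent="cap-h">H</mi>
    <mn arg="count" intent="2">2</mn>
  </msub>
  <mi mathvariant='normal' intent="cap-o">O</mi>
</math>
"""

def speak(node):
    """Return a speech string for one MathML element (toy rules only)."""
    intent = node.get("intent")
    if intent is None:
        # No intent: speak the text of a leaf, or the children in order.
        if len(node) == 0:
            return (node.text or "").strip()
        return " ".join(speak(child) for child in node)
    if intent.startswith("silent(") and intent.endswith(")"):
        # Hypothetical 'silent' name: speak only the referenced arguments.
        refs = [r.strip().lstrip("$") for r in intent[len("silent("):-1].split(",")]
        by_arg = {
            child.get("arg"): child
            for child in node.iter()
            if child is not node and child.get("arg")
        }
        return " ".join(speak(by_arg[ref]) for ref in refs)
    # A literal or plain name: hyphens become spaces.
    return intent.replace("-", " ")

print(speak(ET.fromstring(MATHML)))
```

Nothing from the name silent itself reaches the speech string; only the resolved arguments do.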
That's one solution to the problem and not a bad one. My only concern is that there might be other special cases. Of course those could be added to core as we come across them, but it seems a bit special-case.
Another solution that avoids adding a new (special case) name to core is to expand the syntax some (which is adding a new special case syntax ;-). E.g., if we allow $argref-$argref, then we have an alternative to a functional notation speech form for open. "-" is used to mirror what is done with "-" and function names (i.e., say everything but replace "-" with space). This is somewhat related to w3c/mathml#256, which suggests a way of having a prefix, infix, and postfix speech rendition. In the above, we would have intent="$element-$count".
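A minimal Python sketch of that expansion rule, assuming the arguments have already been turned into speech strings. The function name expand_concat_intent is hypothetical, and the "$a-$b" form is only the syntax floated in this comment, not anything in the current grammar.

```python
def expand_concat_intent(intent, spoken_args):
    """Expand a hypothetical '$a-$b' intent.

    Each '-' becomes a space; each '$name' is replaced by the speech
    already computed for the child carrying arg="name".
    """
    pieces = []
    for part in intent.split("-"):
        if part.startswith("$"):
            pieces.append(spoken_args[part[1:]])
        else:
            pieces.append(part)
    return " ".join(pieces)

print(expand_concat_intent("$element-$count", {"element": "cap h", "count": "2"}))
```

Plain words may be mixed in the same way, so a value such as "$element-sub-$count" would insert the word "sub" between the two arguments.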
As mentioned on the call, a handful of new attributes can probably handle this:
element, molecule, isotope, ...
I don't see why we are considering cumbersome non-semantic alternatives.
@NSoiffer if I was writing this expression for the Open coverage, I would write it differently. I expect basic chemical notation to be in Core, also because you have repeatedly expressed a desire to have it in Core. So I wrote my example expecting special AT treatment. The way power(H,2) would be mapped to a specialized "cap-H squared", so could molecule(H,2) get mapped to a specialized "cap-H two".
By auto-assigning my examples to Open, you're only antagonizing me and straw-manning the current intent syntax, which is quite unproductive.
Also, this statement:
Having something like molecular-formula(molecule(hydrogen, 2), oxygen) in "open" is fine if you are ok hearing "molecular formula of open paren open paren molecule of hydrogen comma 2 close paren comma oxygen close paren" which would be the likely default reading of something in "open".
is false for Intent Open, as proposed and discussed in the last couple of years. Intent Open is still Intent: it is designed for use with the intent grammar, it just has unknown name values.
A pass of AT over Open should produce a basic function application readout (as we've discussed on many previous occasions). The narration you've written above with the parens spelled out would instead be what you expect to hear for aria-label="molecular-formula(molecule(hydrogen, 2), oxygen)", which no one is proposing.
If you told me we will only ever have chemistry in the Open level, and if we expected no AT would ever add support for it, I would annotate the expression as:
<math intent="molecule(times($count,$H),$O)">
<msub>
<mi arg="H" mathvariant='normal'>H</mi>
<mn arg="count">2</mn>
</msub>
<mi arg="O" mathvariant='normal'>O</mi>
</math>
and I would expect every AT to produce a standard functional narration, on the lines of (but not necessarily exactly identical to):
molecule of two times cap-H and cap-O
where molecule can be a completely unknown value to AT. The value times should still be known to Core if we wanted to hear it infix.
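One way such a default functional readout could work is sketched below in Python. The nested-tuple representation, the INFIX table, and the narrate function are all assumptions made for illustration; the "of ... and ..." phrasing simply follows the sample narration above and is not a specified AT behavior.

```python
# Names assumed to be known to Core as infix operators (illustrative only).
INFIX = {"times": "times"}

def narrate(expr):
    """Read an intent expression aloud as a basic function application.

    expr is either an already-spoken leaf (str) or a tuple
    (name, arg1, arg2, ...). Unknown names are read generically.
    """
    if isinstance(expr, str):
        return expr
    name, *args = expr
    spoken = [narrate(arg) for arg in args]
    if name in INFIX and len(spoken) == 2:
        return f"{spoken[0]} {INFIX[name]} {spoken[1]}"
    return f"{name.replace('-', ' ')} of " + " and ".join(spoken)

# molecule(times($count,$H),$O) with the arguments already resolved to speech.
print(narrate(("molecule", ("times", "two", "cap-H"), "cap-O")))
```

The name molecule never has to be in any dictionary; only times needs special knowledge to be heard infix.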
I agree with @davidfarmer that having a special keyword for silent, as well as other non-semantic primitives for building narration strings, was never in the original scope of Intent. I am interested in having a discussion for adding more narration-building tooling, but I think we can sandbox that by introducing a new attribute specifically dedicated to concrete narration. An intent-label or intent-text or even just the good old alttext could be good vessels for such a discussion.
Closing because we agreed chemistry notations should go into core. I have opened w3c/mathml#398 to discuss what I think is the main issue that the chemistry notation example raises.
I feel like I'm missing something obvious, but I don't see how to write the intent values for water:
Suppose I want to send to the speech engine/AT the string H 2 O. The only solution I see is to "cheat" and directly put the values of the children into intent on msub as
There doesn't seem to be a way to generically reference the children of the msub and end up with H 2. Am I missing something?