mbakeranalecta / sam

Semantic Authoring Markdown

Adding annotation locally when annotation lookup in effect #139

Closed mbakeranalecta closed 6 years ago

mbakeranalecta commented 7 years ago

There may be cases where you want to use annotation lookup for most cases of a phrase, but sometimes want to add an additional annotation in certain instances. For instance, suppose you wanted to add an index annotation to certain instances of a phrase but not all of them.

If you do

{XML}(index)

That will turn off annotation lookup for that instance of {XML}, and it will also reset the lookup target for subsequent instances of {XML}.

The current workaround would be to do:

{XML}(language)(index)

for the first instance and then

{XML}(language)

for the next one, to set the lookup target back to normal.

This is obviously error prone.

Possible solutions include:

{XML}+(index)

This would mean, look up the full annotation for "XML" and then add the annotation (index) to this instance. This instance would not create a new lookup target for subsequent uses of {XML} so it would be less likely to have unexpected side effects.

Should consider if there is also an argument for allowing:

{XML}(language)+(index)

This would make this instance a lookup target, but the (index) annotation would not be copied. (In other words it would be treated as a strictly local annotation.)

{XML}-(index)

Here the use case is that the annotation lookup source has

{XML}(language)(index)

And you want the local instance to resolve to whatever is in the lookup source, but not (index) if it is defined. In other words,

{XML}-(index)

Would resolve to:

{XML}(language)

rlhamilton commented 7 years ago

Interesting; this is where I was headed in some email to you, but my thoughts were not nearly as clean or simple:-). A couple of questions: would -(index) replace the notion of (noindex)? Also, I presume the last case, {XML}-(index), would be an exception for a single instance and, as with +(index), would not create a new lookup target. Is that correct?

mbakeranalecta commented 7 years ago

-(index) would cancel a previous annotation of (index) during annotation lookup. The semantics would be similar to (noindex), but it would work in different circumstances. -(index) would apply when indexing by default was caused by annotation lookup. (noindex) would apply in cases where indexing was generated automatically for other reasons, such as a decision to index all phrases that were annotated as languages.

Of course, this is all application layer stuff. The only part of this that is part of the definition of SAM itself is the meaning of + and - in annotations.

According to SAM rules (as opposed to the application layer) the effect of -(index) would be to remove the (index) annotation that would otherwise be applied via annotation lookup. It would apply to the individual instance and would not affect annotation lookup for subsequent phrases with the same text. Similarly +(index) would add (index) to the individual case without affecting annotation lookup for subsequent cases.
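
A minimal sketch of these per-instance semantics, assuming annotations are reduced to plain type strings. The function name resolve_instance and its parameters are hypothetical and are not part of the SAM parser:

```python
def resolve_instance(lookup_annotations, added=(), removed=()):
    """Resolve one phrase instance against its lookup source.

    lookup_annotations: annotations found by annotation lookup.
    added:   annotations written with + on this instance.
    removed: annotations written with - on this instance.
    The lookup source is never modified, so later instances are unaffected.
    """
    # Start from the looked-up annotations, minus anything cancelled with -.
    resolved = [a for a in lookup_annotations if a not in removed]
    # Append anything added with +, skipping annotations already present.
    resolved.extend(a for a in added if a not in resolved)
    return resolved


source = ["language", "index"]
print(resolve_instance(source, removed=["index"]))     # {XML}-(index)
print(resolve_instance(["language"], added=["index"]))  # {XML}+(index)
```

Because the function returns a new list, the lookup source stays intact for subsequent phrases with the same text.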

mbakeranalecta commented 7 years ago

Presumably the same syntax should apply to citations as well as they can be chained with annotations.

mbakeranalecta commented 7 years ago

Another issue is whether this syntax applies to annotations individually or collectively. If collectively, then the + can only occur between the phrase and the first annotation and applies to all annotations in the chain. If individually, it can be applied to any annotation, meaning you can do this:

{XML}(language)+(index) 

This would not provoke an annotation lookup, because (language) is a regular annotation, but the next time {XML} occurred without annotation, it would look up to here and take (language) but not (index).

Making them local might actually be easier to implement, since we don't have to look back up the chain for chained attributes.

Implementation seems to simply consist of maintaining a separate "local" attribute list which is not consulted for annotation lookup but which is output along with regular attributes on serialization.
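
A rough sketch of that idea, not the actual parser code. The Phrase class, resolve, and serialize are hypothetical names, annotations are reduced to type strings, and lookup is assumed to go to the most recent earlier instance, matched case-insensitively:

```python
from dataclasses import dataclass, field


@dataclass
class Phrase:
    text: str
    # Regular annotations: copied to later instances by annotation lookup.
    annotations: list = field(default_factory=list)
    # Annotations written with +: never consulted by annotation lookup.
    local_annotations: list = field(default_factory=list)


def resolve(phrases):
    """Fill in phrases that have no regular annotations from the lookup target."""
    targets = {}  # lowercased phrase text -> regular annotations
    for p in phrases:
        key = p.text.lower()
        if p.annotations:
            # Regular annotations make this instance the new lookup target.
            targets[key] = list(p.annotations)
        elif key in targets:
            # Only local (or no) annotations: lookup still happens.
            p.annotations = list(targets[key])
    return phrases


def serialize(p):
    """Output regular and local annotations together."""
    chain = p.annotations + p.local_annotations
    return "{%s}" % p.text + "".join("(%s)" % a for a in chain)


doc = resolve([
    Phrase("XML", ["language"], ["index"]),  # {XML}(language)+(index)
    Phrase("XML"),                           # {XML}
    Phrase("XML", [], ["index"]),            # {XML}+(index)
])
for p in doc:
    print(serialize(p))
```

In this sketch the second {XML} picks up (language) but not (index), because the local list is only read at serialization time.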

mbakeranalecta commented 7 years ago

Another issue to consider: Currently, an attribute annotation such as an ID does not get copied by annotation lookup; however, the presence of one prevents annotation lookup from happening.

Thus

Testing {XML}(language)(meta)(*foo) testing {XML}(*bar) testing {XML}.

Yields

<p>Testing <phrase id="foo"><annotation type="language"><annotation 
type="meta">XML</annotation></annotation></phrase> testing <phrase id="bar">XML</phrase> 
testing <phrase><annotation type="language"><annotation type="meta">XML</annotation>
</annotation></phrase>.</p>

Annotation lookup effectively makes annotations global. Annotate a phrase once and it is annotated everywhere unless it is replaced by a new annotation.

+ makes an annotation local. Local, in this case, means two things:

But the behavior for attributes is different. Attributes are inherently local, but there is a difference in what local means for attributes.

This means, as shown in the example above, that giving the phrase "XML" the attribute *bar means that annotation lookup does not happen and it does not get the "language" and "meta" annotations.

There seem to be three options:

mbakeranalecta commented 7 years ago

Speaking of attributes being inherently local, this makes perfect sense for id, name, and conditions, but there is a case to be made for allowing the language_tag attribute to be global.

{Bonjour}(!fr)

"Bonjour" will be a French word every time it occurs.

Is there sufficient utility in this to warrant making an exception?

We should note that if someone really wanted this functionality, they could create it for themselves by inventing a markup language that supports something like this:

{Bonjour}(lang "fr")

rlhamilton commented 7 years ago

Regarding your three options, I think the second option seems more consistent. I was surprised to see that adding an ID would drop the other annotations. I wouldn't worry about performance until it becomes an issue. The sam to DocBook conversion seems to be fast enough for me at the moment.

Regarding the messages, I wonder whether it would make sense to suppress the "No annotation found" warnings when you attach an ID or, probably better, make that message a warning that you can turn on or off by setting a debug level.

Regarding the language attachment, I think it's reasonable to make that global.

mbakeranalecta commented 7 years ago

There is another use case to consider. Suppose I want to redefine the annotations on a phrase locally but don't want to trigger annotation lookup. Consider:

{foo}(true)
{foo}+(false)
{foo}

This will resolve to:

{foo}(true)
{foo}(true)(false)
{foo}(true)

But what if you wanted:

{foo}(true)
{foo}(false)
{foo}(true)

Of course, if that is what you want, you can always write:

{foo}(true)
{foo}(false)
{foo}(true)

The use case arises if the author does not know what annotation they are overriding (it is in a different file, or it changes for different builds). We could introduce a new flag for this:

{foo}(true)
{foo}=(false)
{foo}

But if we do this, how do we interpret cases like this:

{foo}(true)
{foo}=(false)(maybe)
{foo}

Actually, the interpretation is logical enough. The presence of a global annotation makes this an annotation lookup target, so = and + have exactly the same consequence in this case:

{foo}(true)
{foo}(false)(maybe)
{foo}(maybe)

What about:

{foo}(true)
{foo}=(false)+(maybe)
{foo}

Would this be interpreted as:

{foo}(true)
{foo}(false)(maybe)
{foo}(true)

or:

{foo}(true)
{foo}(false)(maybe)(true)
{foo}(true)

In other words, which has precedence, = or +?

mbakeranalecta commented 7 years ago

I am inclined not to implement the - form.

The whole argument for these forms is that the writer may not know what the canonical annotation for a phrase is, so they can add to or override the canonical without having to know what it is. But in order to use -, you have to know what the canonical annotation is, in order to know what you want to subtract from it. So if you know, you can override it in one case and restore it in the next:

{foo}(true)(blue)
{foo}-(blue)
{foo}

Is equivalent to:

{foo}(true)(blue)
{foo}(true)
{foo}(true)(blue)

And if you know what the canonical annotation is, you can write that.

And if you really want this functionality, you can implement it in your own language easily enough.

{foo}(true)(blue)
{foo}+(noblue)
{foo}

Which equates to:

{foo}(true)(blue)
{foo}(true)(blue)(noblue)
{foo}(true)(blue)

Then the application layer can deal with noblue cancelling blue.
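
For illustration only, an application layer could handle this with an explicit cancellation table. The CANCELS mapping and the apply_cancellations function are hypothetical, and annotations are again reduced to type strings:

```python
# Hypothetical application-level rule: "noblue" cancels "blue".
CANCELS = {"noblue": "blue"}


def apply_cancellations(annotations, cancels=CANCELS):
    """Drop cancelled annotations and the markers that cancel them."""
    # Collect every annotation type that some marker on this phrase cancels.
    cancelled = {cancels[a] for a in annotations if a in cancels}
    # Keep everything that is neither a marker nor a cancelled annotation.
    return [a for a in annotations if a not in cancels and a not in cancelled]


print(apply_cancellations(["true", "blue", "noblue"]))  # {foo}(true)(blue)(noblue)
print(apply_cancellations(["true", "blue"]))            # {foo}(true)(blue)
```

An explicit table avoids guessing from a "no" prefix, which would misfire on annotation types such as "novel".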

mbakeranalecta commented 7 years ago

So far the only real-world use case we have thought of for any of this is selective index markers. The '+' format seems to cover that case adequately.

The cases for - and = are more about logical combinations than actual use cases.

Perhaps the thing to do is to reserve the - and = forms, so as to avoid any possibility of backward compatibility problems, but not implement until the use cases for each become clear.

mbakeranalecta commented 7 years ago

Need to extend this support to citations, since they can be chained with annotations.

Are citations inherently local like IDs, Names, and Conditions? This is not clear cut. If you are doing:

{Moby Dick}(novel)[Melville 1851]

Then you presumably want the citation copied to every instance of:

{Moby Dick}

But are there cases where you don't want it copied?

mbakeranalecta commented 7 years ago

Just a note on the general principle and why annotation lookup exists in the first place. SAM is designed to encourage subject domain markup and the use of subject domain annotations even within document domain documents. The point of subject domain annotations is that they are always true. They are about the subject, not the document, so they should be always true wherever the subject is mentioned.

There are two general cases where this always true principle breaks down:

So, SAM defaults to the always true position with automatic annotation lookup. But then we need workarounds like those discussed here for the not always true cases.

mbakeranalecta commented 7 years ago

Example of where a citation is a not always true case. Suppose one refers to a concept like "information snacking". The first time you refer to it you may want to cite a source:

{information snacking}(concept)[https://www.nngroup.com/articles/information-scent/]

But you may want to reference it many more times in the book, without citing the same source. So you want:

{information snacking}

to be interpreted as:

{information snacking}(concept)

If citations are not inherently local, this would be implemented as:

{information snacking}(concept)+[https://www.nngroup.com/articles/information-scent/]

If they are, it would be

{information snacking}(concept)[https://www.nngroup.com/articles/information-scent/]

But then

{Moby Dick}(novel)[Melville 1851]

Would not be carried forward.

rlhamilton commented 7 years ago

Agreed. In fact, I was about to send you a message saying that you should probably take a step back and see if this is getting too complex, so I think we’re on the same page:-).

Richard


mbakeranalecta commented 7 years ago

Need to check for local annotations that duplicate global ones. Need to check all parts of the annotation for a match.

mbakeranalecta commented 6 years ago

Should we have an option to turn annotation lookup off altogether? The current default is "case insensitive", and that seems appropriate as it supports the general case. But there could be cases where the writer does not want annotation lookup at all, in which case being able to do !annotation-lookup: off would be helpful. This should be trivial to implement.

rlhamilton commented 6 years ago

Hi Mark,

That might be a worthwhile option, as long as it’s trivial to implement.

Richard


mbakeranalecta commented 6 years ago

Implemented !annotation-lookup: off, as well as a corresponding on option, which is equivalent to case insensitive, the default. Implemented in ff46bb36c7148368697f8e6c3b0a5cea1bd2ca2a
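
As a sketch of how the setting changes lookup behavior (this is not the code from that commit): find_annotations and its arguments are hypothetical, and only the values mentioned in this thread are handled:

```python
def find_annotations(text, targets, mode="case insensitive"):
    """Return the annotations of the most recent matching lookup target.

    targets: list of (phrase_text, annotations) pairs in document order.
    mode:    "off", "on", or "case insensitive" (the default).
    """
    if mode == "off":
        # Lookup disabled: unannotated phrases stay unannotated.
        return None
    if mode not in ("on", "case insensitive"):
        raise ValueError("unsupported annotation-lookup mode: %r" % mode)
    wanted = text.lower()
    # Walk backward so the nearest earlier target wins.
    for target_text, annotations in reversed(targets):
        if target_text.lower() == wanted:
            return list(annotations)
    return None


targets = [("XML", ["language"])]
print(find_annotations("xml", targets))              # default: case insensitive
print(find_annotations("xml", targets, mode="off"))  # lookup disabled
```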

mbakeranalecta commented 6 years ago

Implemented the elimination of local annotations that duplicate global ones. A local annotation counts as a duplicate only if it matches in type, specifically, and namespace.
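
A minimal sketch of that matching rule, assuming an annotation can be modeled as a (type, specifically, namespace) tuple. The Annotation class and drop_duplicate_locals are hypothetical names, not the parser's own:

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    type: str
    specifically: Optional[str] = None
    namespace: Optional[str] = None


def drop_duplicate_locals(global_annotations, local_annotations):
    """Remove local annotations that exactly duplicate a global one.

    Tuple equality compares type, specifically, and namespace together,
    so a partial match (same type only) is not treated as a duplicate.
    """
    seen = set(global_annotations)
    return [a for a in local_annotations if a not in seen]


globals_ = [Annotation("language"), Annotation("index")]
locals_ = [Annotation("index"), Annotation("index", "markup languages")]
print(drop_duplicate_locals(globals_, locals_))
```

Here the plain local (index) is dropped, while the local (index) with a different specifically value is kept.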

mbakeranalecta commented 6 years ago

Fully implemented as of 23bd596d5f7d8d4f37cb47623a428029b56e803e