`<stamp>` should contain `<desc>`, not the other way round

raffazizzi commented 11 years ago

There are two examples for the <stamp> element: one shows it containing the text that has been stamped, the other shows it containing a description of the stamp. It would be good to be able to distinguish the two cases, since both are equally likely when recording stamps in general. If this element is used for postmarks or postage stamps, for example, you might well want to distinguish text actually forming part of the postmark or appearing on the postage stamp from a general description of the stamp. However, the content model says that its content is just macro.phraseSeq, which doesn't include <desc>, though it does include all sorts of other largely irrelevant nonsense. I propose changing it to (model.phrase|model.gLike|model.descLike)*
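
To make the two cases concrete, here is a purely illustrative pair. The wording is invented for this sketch and is not taken from the Guidelines' own examples:

```xml
<!-- Case 1 (hypothetical): the content transcribes the text of the stamp itself -->
<stamp>BIBLIOTHECA REGIA MONACENSIS</stamp>

<!-- Case 2 (hypothetical): the content describes the stamp -->
<stamp>Oval library stamp in red ink, partly illegible</stamp>
```

Both are plain text and so valid against macro.phraseSeq, and nothing in the markup tells a processor which reading applies.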

Original comment by: lb42

raffazizzi commented 11 years ago

Syd proposes (on the council list) that it would be better to aim for parity with <figure> and thus to change the content model to (model.pLike|model.descLike|model.graphicLike)+

This is tidier but clearly breaks all existing examples.

Original comment by: lb42

raffazizzi commented 11 years ago

Here is Syd's email from the list, for the record. pfs

I'm not sure the proposed content model:

( model.phrase | model.gLike | model.descLike)*

makes sense for two reasons. One I think is just a boo-boo, so I'll address it later. But the other boils down to the idea that this change doesn't solve the problem, in the big picture.

Lou correctly points out the main problem -- that it's not entirely clear whether <stamp> is supposed to hold the transcription of a stamp, or a description of it. This (IMHO) is because a) there is a strong desire on the part of many to transcribe what has been stamped using <stamp>, and b) there is an example in 10.3.3 that shows it being used that way despite the description of <stamp> clearly saying it "contains a word or phrase describing a stamp".

I share Paul's instinct that this is very similar to a figure, where one might want to describe it (<figDesc>), give a facsimile of it (<graphic>), or provide a transcription of (what is written in) it (<floatingText>), or some combination thereof.

I'm guessing that we can confine transcriptions of stamps to within something a lot smaller than a <floatingText>. So my instinct is to have something like ( model.pLike | model.descLike | model.graphicLike )* as the content of <stamp>, with prose that discusses and examples that show <desc> used to describe it, <ab> used to provide a transcription, and <graphic> used to provide an image.
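
A rough sketch of what such a <stamp> might look like, assuming the element-only content model suggested above. The description, the postmark text, and the image file name are all invented for illustration, and this instance would not validate against the content model as it currently stands:

```xml
<stamp>
  <!-- description of the stamp -->
  <desc>Circular postmark in black ink, upper right corner</desc>
  <!-- transcription of the text stamped -->
  <ab>PARIS 12 MAI 1904</ab>
  <!-- image of the stamp; the file name is hypothetical -->
  <graphic url="postmark.png"/>
</stamp>
```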

This would be problematic, of course, because it would potentially invalidate lots of current uses of <stamp>, which often has text content.

Speaking of which, the boo-boo I referred to above is that the proposed content does not include text. Since it does include model.gLike we can surmise that this is just an oversight, though.

Original comment by: pfschaffner

raffazizzi commented 11 years ago

Wasn't there some discussion that we ought to survey TEI-L for an impression of just how many people's usage Syd's suggestion would actually break? I'd be in favour of moving straight to the more rational/consistent (albeit backward-incompatible) solution if we could convince ourselves that's not too disruptive.

Original comment by: gabrielbodard

raffazizzi commented 11 years ago

Lou wrote the following to tei-council two hours ago.

My proposal is :

add model.descLike to the current content model,
add a comment to the existing discussion pointing out that any material not wrapped in <desc> or other model.descLike element is assumed to be a transcription of text within the stamp itself.
revise the existing examples and text accordingly
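
Under this proposal, and assuming model.descLike is simply added alongside the existing mixed content, an instance might look like the following (the description and postmark text are invented for illustration):

```xml
<stamp>
  <!-- wrapped in <desc>: a description of the stamp -->
  <desc>Circular postmark in black ink</desc>
  <!-- not wrapped: assumed to be a transcription of the stamp's text -->
  PARIS 12 MAI 1904
</stamp>
```

Existing text-only uses of <stamp> would remain valid and would simply be read as transcriptions.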

[. . .]

I'd like to proceed with this before the next release if no one has a strong objection. My proposal doesn't break any existing documents and doesn't preclude a subsequent revision along the lines suggested by Syd.

Original comment by: kshawkin

raffazizzi commented 11 years ago

Gabby's message above was in response to what Lou suggested (which I have quoted in my previous post). I agree with Gabby that it would be nice to fix this all at once if we could.

Original comment by: kshawkin

raffazizzi commented 11 years ago

Well, the problem is that Syd's suggestion would break every existing use of <stamp>, since it implies a change from mixed content to element content. The change I am proposing makes a later transition to that easier.

Original comment by: lb42

raffazizzi commented 11 years ago

I agree that it might be nice to have a new kind of <stamp> which, like <figure>, had element-only content. But neither you nor Gabby is addressing the point that this would break all current uses of <stamp>. My proposal is a step in the right direction, in that it at least resolves the current ambiguity, without such a blatant flouting of the principle that we should not break existing documents. I would like to take that step for the next release, leaving the issue open if you wish for further discussion.

Original comment by: lb42

raffazizzi commented 11 years ago

I can't speak for Gabby, but my feeling is that we might find that this element is so rarely used that making the big change would be manageable for those affected. I see this as similar to forbidding <gram> as a child of <entry> ( https://sourceforge.net/p/tei/bugs/288/ ): we decided that most people were probably using <gramGrp> already and could handle adding a <gramGrp> wrapper if necessary.

Original comment by: kshawkin

raffazizzi commented 11 years ago

I agree with Kevin that probably not many people are using this element, but to be honest I have no way of knowing that. A simple poll on TEI-L might give some indication, but it is a poor test: silence from people who don't read the list, or who don't notice that the change might affect them, is not a sign that it won't break their workflows.

I see the choices as:

a) Go for the less backwards-incompatible solution Lou proposes, or
b) Provide the backwards-incompatible solution, but with a warning and with a tool to migrate old document instances. (I think this isn't difficult to script and goes a long way to mollifying arguments against it.)
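
As a rough indication of how small such a migration tool could be, here is a minimal XSLT 1.0 sketch. It assumes the target is the element-only model discussed above, that documents are in the TEI namespace, and that all existing <stamp> content should be treated as a transcription and wrapped in <ab>. That last assumption is exactly the kind of judgment a real migration would have to make case by case, since some existing content is description rather than transcription.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <!-- Identity template: copy everything unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any <stamp> not already using <ab>, <desc> or <graphic> children:
       keep its attributes and wrap its mixed content in a single <ab>
       (assumption: the old content is a transcription). -->
  <xsl:template match="tei:stamp[not(tei:ab or tei:desc or tei:graphic)]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <ab>
        <xsl:apply-templates select="node()"/>
      </ab>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Running the stylesheet is the easy part; deciding per project whether the wrapped content is really transcription, and rebuilding downstream systems, is where the cost would lie.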

Because I don't think lots of people are using this, I'm happy with suggestion b), but would be happier if I knew how much of a problem it might cause.

-James

Original comment by: jamescummings

raffazizzi commented 11 years ago

I am wary of a "tool to migrate old document instances...goes a long way to mollifying arguments against it", because although it's technically easy to run the tool once, it may be a huge endeavour to run it against all the texts in a collection and rebuild the system. No, people don't have to, but people do hopefully want to stay true to TEI. So I am in the a) camp.

Original comment by: sebastianrahtz

raffazizzi commented 11 years ago

On 26/05/13 19:47, Kevin Hawkins wrote:

I can't speak for Gabby, but my feeling is that we might find that this element is so rarely used that making the big change would be manageable for those affected. I see this as similar to forbidding <gram> as a child of <entry> ( https://sourceforge.net/p/tei/bugs/288/ ): we decided that most people were probably using <gramGrp> already and could handle adding a <gramGrp> wrapper if necessary.

[feature-requests:#450] http://sourceforge.net/p/tei/feature-requests/450/ <stamp> should contain <desc>, not the other way round

Status: open Created: Thu Apr 18, 2013 06:41 PM UTC by Lou Burnard Last Updated: Sun May 26, 2013 01:23 PM UTC Owner: nobody

There are two examples for the <stamp> element: one shows it containing the text that has been stamped, the other shows it containing a description of the stamp. It would be good to be able to distinguish the two cases, since both are equally likely when recording stamps in general. If this element is used for postmarks or postage stamps, for example, you might well want to distinguish text actually forming part of the postmark or appearing on the postage stamp from a general description of the stamp. However, the content model says that its content is just macro.phraseSeq, which doesn't include <desc>, though it does include all sorts of other largely irrelevant nonsense. I propose changing it to (model.phrase|model.gLike|model.descLike)*

Thinking about this again, it's clear that this element is intended (like <watermark>) to contain a description of the stamp (or watermark) rather than any actual text contained by it. So it's the example showing it as containing text which is misleading. Of course a description might well include words or phrases mentioned in the stamp, which could be tagged with <mentioned>. I've therefore modified the aberrant example in MS, and added a sentence licensing the use of <mentioned> (which is of course already available in the content model).
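
For instance, a description that quotes the stamped text might be encoded along these lines (the wording is invented for illustration and is not the revised example in MS):

```xml
<!-- The content is a description; the quoted legend is marked with <mentioned> -->
<stamp>Round stamp in blue ink bearing the legend
  <mentioned>EX LIBRIS J. SMITH</mentioned></stamp>
```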

Original comment by: lb42

raffazizzi commented 11 years ago

That is, of course, true. However, as a principle it would seem a good idea, whenever we make a clearly backwards-incompatible change, to develop a short script that would rectify it for those who want to. But yes, a) seems to be the much safer option overall. I honestly don't know if this affects 2 projects or 200, and since it is hard to make a decision in the absence of information, the other option is, of course, to try to consult the TEI community to get some sense of the scope.

Original comment by: jamescummings

raffazizzi commented 11 years ago

I'm against doing anything backwards-incompatible without substantial consultation with the community, so I don't think we should go ahead with the b) option now. Given that, I think Lou's simpler suggestion is already a substantial improvement without much pain, and I'd vote for it.

Original comment by: martindholmes

raffazizzi commented 11 years ago

As the proposal turned out to be more controversial than anticipated, I think we should stick with the idea that this element is primarily intended to contain a description of the stamp; I have however added the suggestion that if text present in the stamp is included in that description then it should be clearly distinguished using the <mentioned> element, which is already available in the content model.

Original comment by: lb42

raffazizzi commented 11 years ago

status: open --> closed-wont-fix
Priority: 5 --> 1(low)

Original comment by: lb42

raffazizzi / TEI-TEST

`<stamp>` should contain `<desc>`, not the other way round #127