music-encoding / metadata-ig

The Metadata Interest Group repository

Missing documentation: Printed Sources #8

Open riedde opened 2 years ago

riedde commented 2 years ago

explanation needed

doerners commented 2 years ago

First draft for <plateNum>; comments, suggestions, improvements etc. are welcome ;)

The dating of printed sources is a relevant factor for questions of provenance and edition. In the absence of bibliographical information, e.g. on the edition or the year of origin, plate numbers are an essential aid to dating. Though the name might suggest otherwise, plate numbers can be described as designations assigned to a resource by a music publisher which need not necessarily consist of numbers. They are usually printed at the bottom of each page of a musical print and sometimes appear on the title page as well. In MEI any such plate numbers are encoded within the <plateNum> element as plain text. <plateNum> can be captured as a child element of <physDesc> and can additionally be marked within the <titlePage> element if a plate number is visible on the title page. If the source is not exactly dated, it is recommended to record the plate number separately within <physDesc> in any case. Where facsimile images are present, the @facs attribute on <plateNum> additionally allows linking to the location of the plate number on a facsimile, as the following example illustrates:

<meiHead>
[...]
         <physDesc>
            <plateNum facs="#plateNumber">809</plateNum>
         </physDesc>
[...]
</meiHead>
<music>
[...]
      <facsimile>
         <surface>
            <graphic target="https://api.digitale-sammlungen.de/iiif/image/v2/bsb00110440_00003/full/full/0/default.png"/>
            <zone xml:id="plateNumber" lrx="1400" lry="2800" ulx="1200" uly="2700"/>
         </surface>
      </facsimile>
[...]
</music>
KristinaRichts commented 2 years ago

Hi @doerners, thank you for your suggestion, which I find very good. However, I wouldn't say that it is at the coder's discretion whether the plate number is captured within the physDesc or the titlePage element, even though technically it makes no difference, because in both cases the information is enclosed in the plateNum element. The title page can be described in great detail, and therefore it is also possible to capture the plate number there. Nevertheless, I think that this information should also be captured separately as an information element, i.e. one level higher, as a direct child element of physDesc, since it serves as an important tool for dating music prints and should be easy to find (not nested in long title page descriptions).

I wonder if we should also point out the difference from publisher numbers in this context, which can also serve as a dating aid and should be encodable. For plate numbers there is this separate element, but not for publisher numbers (or did I miss something?); here you can only encode a num within the publisher element. Some projects attach importance to being able to capture the difference between these two numbers.

What do the others think?

doerners commented 2 years ago

Thank you @KristinaRichts for your comment. I think I understand the problem with the sentence you mentioned and I will think about it some more to hopefully come up with something better. If in the meantime anyone else has a suggestion this would of course be much appreciated as well :)

KristinaRichts commented 2 years ago

How about just writing: "It can be captured as a child element within the physDesc and marked within the titlePage element if it is visible on the title page." ? And perhaps in addition: If the source is not exactly dated, it is recommended to record the plate number separately within the physDesc in any case.

doerners commented 2 years ago

Sounds good to me, thank you @KristinaRichts! I edited the initial draft to incorporate your suggestion.

gucl-mu commented 2 years ago

First, thank you Sophia for your suggestion. It sounds good to me. I agree with Kristina: if there is a plate number on the title page, then it can be marked in the transcription of the title page. But at the same time, the plate number should also be recorded as structured information. I wonder if this should not rather be done in //manifestation/pubStmt. Then you would have <publisher>, <pubPlace> and <plateNum> in one place. (Presumably there is a reason why the plate number is in <physDesc> separately. Unfortunately, I do not know it.) What just came to my mind is that there are many cases where reprints have only exchanged the plate number. Is there a way to specify "old" and "new" plate numbers?

rettinghaus commented 2 years ago

@doerners Thank you for this proposal. May I point out that your example is not valid. @facs should contain one or more URIs to point to something inside the facsimile section.

So you would have in the header:

<physDesc>
    <plateNum facs="#plateNumber">809</plateNum>
</physDesc>

And in the body within facsimile something like:

<zone xml:id="plateNumber">
    <graphic target="https://api.digitale-sammlungen.de/iiif/image/v2/bsb00110440_00003/1200,2700,200,100/full/0/default.png" />
</zone>
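
The pointer mechanism described here (a URI in @facs that refers to an xml:id inside the facsimile section) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the sample document is a reduced, namespace-free stand-in for a real MEI file (real files declare the MEI namespace), the zone coordinates are the ones used elsewhere in this thread, and `resolve_facs` is a hypothetical helper, not part of any MEI tooling.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the fixed XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Reduced sample without the MEI namespace (an assumption for brevity).
SAMPLE = """<mei>
  <meiHead>
    <physDesc>
      <plateNum facs="#plateNumber">809</plateNum>
    </physDesc>
  </meiHead>
  <music>
    <facsimile>
      <surface>
        <zone xml:id="plateNumber" ulx="1200" uly="2700" lrx="1400" lry="2800"/>
      </surface>
    </facsimile>
  </music>
</mei>"""


def resolve_facs(root, element):
    """Return the elements that the @facs pointers of `element` refer to."""
    # Index every element in the document that carries an xml:id.
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    targets = []
    # @facs may hold several whitespace-separated URIs.
    for pointer in element.get("facs", "").split():
        if pointer.startswith("#"):
            target = by_id.get(pointer[1:])
            if target is not None:
                targets.append(target)
    return targets


root = ET.fromstring(SAMPLE)
plate_num = root.find(".//plateNum")
zones = resolve_facs(root, plate_num)
for zone in zones:
    print(zone.tag, zone.attrib)
```

Only same-document pointers (those starting with `#`) are followed in this sketch; external URIs are ignored.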

I think there is no way to describe within the element directly where the plate number is printed, but one could probably give a hint in its label:

<physDesc>
  <plateNum label="bottom center imprint">J.S.B. I. 204.</plateNum>
</physDesc>

@KristinaRichts Thanks for pointing out the possible difference of the publisher number. It is worth noting that the plate number is often identical to the publisher number (as in my example above). Nevertheless, a plate number is still something very specific, and a "normal" publisher number could go into the more generic <identifier> element. Imagine a music publisher that was taken over by another one at some point, where you then have two editions of a piece by two publishers (with diverging publisher numbers), but the plate number(s) stay the same.

doerners commented 2 years ago

Thank you @gucl-mu and @rettinghaus for your comments!

@gucl-mu: It appears to me that it is possible to repeat the <plateNum> element and encode with the attributes @xml:id as well as @precedes and @follows. Maybe those could be a means to specify "old" and "new" plate numbers? Not sure if that's the best way to go, this was just an idea.

@rettinghaus: Thank you for pointing out the problem with the example. I have to admit that I failed to check the attribute definition and the remark within the element definition itself apparently wasn't clear enough to me in that regard. Should we just replace the example with your encoding?

ahankinson commented 2 years ago

The documentation on zone / graphic / facs is a bit lacking, but the markup that @rettinghaus provided isn't quite "correct", in terms of expectations and in terms of the common practice for facsimile markup. (It is technically "valid" though).

The existing guidelines are here:

https://music-encoding.org/guidelines/dev/content/facsimilesrecordings.html#facsimileElements

Zones are typically expected to be rendered in relation to a full image, and are not intended to be simple segments of the image, but areas of interest on that image. <graphic> is usually given in relation to <surface>.

So my take on @rettinghaus' markup would be more along the lines of:

<surface>
  <graphic target="https://api.digitale-sammlungen.de/iiif/image/v2/bsb00110440_00003/full/full/0/default.png"/>
  <zone xml:id="plateNumber" lrx="1400" lry="2800" ulx="1200" uly="2700"/>
</surface>

Note that the IIIF image crop parameters are x,y,w,h, while MEI is ulx,uly,lrx,lry so those co-ordinates would need to be translated: ulx = x; uly = y; lrx = x + w; lry = y + h. The IIIF co-ordinates have been removed from the URL and placed in the zone, and the first full parameter points to the complete image.
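
The translation rule stated above can be written out as a small Python sketch. The function names are hypothetical and chosen only for illustration; the arithmetic is exactly the one given in the comment (ulx = x; uly = y; lrx = x + w; lry = y + h), plus its inverse.

```python
def iiif_region_to_mei_zone(x, y, w, h):
    """Translate an IIIF region (x, y, w, h) into MEI zone coordinates."""
    return {"ulx": x, "uly": y, "lrx": x + w, "lry": y + h}


def mei_zone_to_iiif_region(ulx, uly, lrx, lry):
    """Inverse translation: MEI zone corners back to IIIF x, y, w, h."""
    return (ulx, uly, lrx - ulx, lry - uly)


# The region from the URL discussed in this thread: 1200,2700,200,100
zone = iiif_region_to_mei_zone(1200, 2700, 200, 100)
print(zone)  # {'ulx': 1200, 'uly': 2700, 'lrx': 1400, 'lry': 2800}

region = mei_zone_to_iiif_region(**zone)
print(region)  # (1200, 2700, 200, 100)
```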

Does that make sense? Or did I miss something?

riedde commented 2 years ago

Yes @ahankinson, you're right. I also had this problem in a project where I use IIIF and MEI. So the translation is needed, but isn't this a rendering problem? If you want to visualize this, you need to translate the coordinates and also modify the URL to get the correct segment. As for the code, there is just a target pointing to an image and a zone that locates an area on that image using pixel coordinates. So from my point of view the content is fine, and the problem depends on the software that will be used in a second step. Maybe I'm missing something, but I ask myself whether the problem you mentioned is ours ;-) We should keep that in mind!

ahankinson commented 2 years ago

It's not really a rendering problem. I mean, it is, but it's a bit more complicated than that.

There is a fairly common misunderstanding in how IIIF URLs are to be used with annotations. The example given by Klaus is the "IIIF Image API" style of URL. This is primarily an 'implementation detail' so to speak -- it's the type of URL that enables standardized behaviours across image servers and viewers, allowing for zooming images (so that you can dynamically chop them up and send tile requests). People then naturally extend it to be used as a basis for annotations, but it's not really how it's designed to be used.

The "correct" way to do annotation on a canvas is to use the media fragment syntax. The section on the IIIF Canvas in the Presentation API documentation talks a bit about this.

Annotations on a canvas should use the media fragment syntax, NOT the IIIF Image API syntax, and annotations are always made against the full image. This is because you're supposed to be layering annotations on top of an image -- "painting" on a "canvas" is the Shared Canvas -> Open Annotation -> Web Annotation model. (IIIF started as "Shared Canvas", then that got renamed "Open Annotation" during the development period, and the W3C standard is now officially "Web Annotation")

You can, in your implementation, translate this to the Image API URL to show just that particular region of the image, but you should actually store it (in either Web Annotations or in external documents, like MEI) assuming you're annotating against the full image. This can then be used within a Web Annotation as a target resource.

So in the end, it means that your annotations on an image resource should be serialized as:

https://api.digitale-sammlungen.de/iiif/image/v2/bsb00110440_00003/full/full/0/default.png#xywh=1200,2700,200,100

And not:

https://api.digitale-sammlungen.de/iiif/image/v2/bsb00110440_00003/1200,2700,200,100/full/0/default.png
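
As a rough sketch of how stored MEI zone coordinates could be serialized in the recommended form, the following Python function appends an `#xywh=` media fragment to the URI of the full image. The function name is hypothetical; the image URI and coordinates are the ones used in this thread.

```python
def zone_to_media_fragment(image_uri, ulx, uly, lrx, lry):
    """Build a media-fragment URI (#xywh=x,y,w,h) for a zone on a full image."""
    width = lrx - ulx
    height = lry - uly
    return f"{image_uri}#xywh={ulx},{uly},{width},{height}"


FULL_IMAGE = (
    "https://api.digitale-sammlungen.de/iiif/image/v2/"
    "bsb00110440_00003/full/full/0/default.png"
)

fragment_uri = zone_to_media_fragment(FULL_IMAGE, 1200, 2700, 1400, 2800)
print(fragment_uri)
```

The part before the `#` stays identical for every zone on the same image, so several zones can be tied to one and the same resource.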

I think this means that the MEI encoding I provided above is closer to the intent of IIIF. It divides the two concerns. You can also see another benefit of this: In the second URI, technically any change in the image parameters is an entirely different "Resource", in the HTTP sense. This means that, even though they are the same target, the "Web" understands them as being completely different entities.

In the first (correct) example, however, the fragment identifier simply identifies a "sub-resource" of the same image. In MEI language this means you can have many different zones on the same surface. In a graph database serialization, you can tie together the same URIs and then serialize just the zones as an additional set of triples against that URI.

If we were to be even more pedantic, technically the MEI surface would actually correspond to the IIIF canvas, and you would use the URIs of those directly, instead of the actual image. But that's even harder, because it's not always possible to tie a canvas to an actual image unless you also parse the full manifest. But I could probably talk for several hours about this. :-D

riedde commented 2 years ago

That clarifies a lot for me! I was also part of that misunderstanding. But now I can absolutely support the use of the media fragment syntax. Thank you for that deep insight!

doerners commented 2 years ago

I edited the initial draft and especially changed the example in light of the comments here – thank you all for the help! I would like to ask everyone to have a look again to check if the updated version might now be suitable for inclusion into the guidelines.

gucl-mu commented 2 years ago

Honestly, I don't think we need the link to <facsimile> in the example. What is specific to plate numbers about it? That's something that should be explained elsewhere in the guidelines (although we could keep the example). And another question: you specify the plate number in <item>, but don't all items of a print have the same plate number? It seems to me more important to make the distinction @KristinaRichts mentioned between plate number and publisher number, or to explain how we can encode it right now with the existing elements.

doerners commented 2 years ago

@gucl-mu you're absolutely right regarding <item>...I took it out.

Side note: Does anyone know why <item> is the only parent element listed in the definition of <physDesc>? This could lead to confusion, I think.

Personally, I thought it to be important to have an example showing the use of @facs, because in the element definition of <plateNum> the remark says:

The @facs attribute may be used to record the location of the plate number in a facsimile image.

To me the remark was not clear enough as to how this translates into a specific encoding, because I haven't encountered it before. If someone who is new to MEI wants to encode a plate number, such an example might be useful and maybe clearer than the remark.

Regarding the publisher number: @rettinghaus' comment in particular led me to think that this might also be encoded within an <identifier> element. Or, as @KristinaRichts pointed out, within a num inside the <publisher> element. I'm not saying I'm opposed to the idea of pointing out the differences within the <plateNum> description, but it wasn't entirely clear whether that's what we want to do. I think that's something we could talk about in one of our IG meetings?

KristinaRichts commented 2 years ago

Side note: Does anyone know why <item> is the only parent element listed in the definition of <physDesc>? This could lead to confusions, I think.

That's a good point, @doerners . Of course, manifestation must definitely be added to that.

I think it's okay if we leave @facs in the description. It could be the case that someone really wants to describe the location of the plate number so precisely, even though I personally think there are more important things. Anyway.

With regard to the differentiation between plate number and publisher number, the problem is that there is no separate element for recording the publisher number. Therefore, a description of the preferred encoding of the same will hardly ever be given elsewhere. For this reason, I would suggest that we elegantly build this into the description and perhaps say something like: "Unlike the publisher number, which is captured using an identifier element in the pubStmt of a manifestation (##example##), plate numbers are encoded within the plateNum element as plain text..."

@rettinghaus I think your idea of putting the publisher number in an identifier is much better. I was mentally too much fixated on the "number" and did not have that in mind. But I would definitely recommend to assign an identifier here.

However, the question of whether the plate number should be captured within physDesc or within pubStmt is something we should perhaps discuss again in a larger setting. There are quite different opinions on this, as I keep noticing.

What just came to my mind is that there are many cases where reprints have only exchanged the plate number. Is there a way to specify "old" and "new" plate numbers?

Wouldn't a reprint warrant a new manifestation, which would then include the changed plate number, @gucl-mu? I know, one tends to rather take the "short way" when only the plate number changes ;-)

doerners commented 2 years ago

Here is a reworked proposal for the description of <plateNum> I would like to discuss during the next IG meeting on 2022-08-11:

The dating of printed sources is a relevant aspect for questions of provenance and edition. In the absence of bibliographical information, e.g. on the edition or the year of origin, plate numbers are an essential aid to dating. Though the name might suggest otherwise, plate numbers can be described as designations assigned to a resource by a music publisher which need not necessarily consist of numbers. They are usually printed at the bottom of each page of a musical print and sometimes appear on the title page as well. In MEI any such plate numbers are encoded within the <plateNum> element as plain text.

In this context, it is worth noting that there are cases where the plate number for a particular print is identical to the publisher number of the music publisher responsible for that print. However, in certain circumstances, especially when a music publisher has been taken over by another at some point, it can be the case that two editions of a piece have the same plate number, but the publisher numbers differ because the acquiring publisher has continued to use existing plates. As for the distinction between plate number and publisher number, MEI does not provide a specific element for recording a publisher number. It is recommended to capture each publisher number with the <identifier> element within <pubStmt> at the manifestation level, as the following example shows:

<manifestation>
   <pubStmt>
      <publisher>
         <identifier type="publisher number">123456</identifier>
      </publisher>
   </pubStmt>
</manifestation>

In contrast, any plate number will be encoded within the <plateNum> element. <plateNum> can be captured as a child element of <physDesc> and can additionally be marked within the <titlePage> element if a plate number is visible on the title page. If the source is not exactly dated, it is recommended to record the plate number separately within <physDesc> in any case. Where facsimile images are present, the @facs attribute on <plateNum> additionally allows linking to the location of the plate number on a facsimile, as the following example illustrates:

<meiHead>
[...]
         <physDesc>
            <plateNum facs="#plateNumber">809</plateNum>
         </physDesc>
[...]
</meiHead>
<music>
[...]
      <facsimile>
         <surface>
            <graphic target="https://api.digitale-sammlungen.de/iiif/image/v2/bsb00110440_00003/full/full/0/default.png"/>
            <zone xml:id="plateNumber" lrx="1400" lry="2800" ulx="1200" uly="2700"/>
         </surface>
      </facsimile>
[...]
</music>
gucl-mu commented 2 years ago

First of all, thanks again Sophia, for revising the draft! Overall, I find the proposed encoding a bit inconsistent. On one side a syntactic sugar element like <plateNum> and on the other side an <identifier> with @type/@label. Also very precise information about the linking of the page but no indication of which publisher it is. Maybe by the next meeting we can all still look for "real" examples. We should also definitely still discuss the nesting in <publisher>. There is a publication by Stephane Buchon in which you can find a lot of acquisitions of the U.E., maybe that helps.

lpugin commented 2 years ago

@gucl-mu may I ask you to add your name to your profile? We are a small community, and it is always nice to know who we are talking to. Thanks! Laurent

doerners commented 2 years ago

@gucl-mu I agree that the encoding can most definitely be improved and that we should discuss the other points you made!

So if anyone comes across a nice (i.e. real) example, please bring it to the next meeting or just drop a comment here :) It would most definitely be much appreciated.

ahankinson commented 2 years ago

As a matter of practice, MEI generally avoids the use of @type in any specific way, simply because it is very easy to over-use it, and this makes it much harder to validate the values of it. If every element had custom values for @type then we would need custom validation rules on every element, and it would be harder to keep track of them. So you generally try to choose a specific attribute that covers the use case, instead of falling back on generic names like type or class. There are exceptions, but they are few.

Simple values like type="publisher number" on a generic element like <identifier> make this data specific to the encoding and not interchangeable (that is, everyone would have to know and use this specific string in order to accurately parse the data and understand what kind of identifier it is). It would be better to use auth and auth.uri to typify this data according to some vocabulary, e.g., http://www.rdaregistry.info/Elements/m/#P30066. This allows you to unambiguously specify that this is a plate number. Or, you can specify your own classification and use @class to classify the type of identifier you want.

Incidentally, using the RDA Registries also lets you differentiate between plate and publisher numbers:

So you don't necessarily need to embed <identifier> inside of <publisher> for it to make sense; you can put both on the manifestation, and use the classification scheme to differentiate them.

(There is also an issue with using publisher number as the value of @type: the attribute takes a space-separated list of NMTOKEN values, which means that parsers will interpret this as two values, "publisher" and "number", instead of one complete string. The type documentation for identifier gives a hint about this.)
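
The effect described in the parenthesis above can be illustrated with a minimal Python sketch. It assumes that a processor reads @type as a whitespace-separated token list and approximates that with `str.split()`; `type_tokens` is a hypothetical helper, not an MEI API.

```python
def type_tokens(type_value):
    """Split a @type value into tokens the way a token-list reader would."""
    # str.split() without arguments splits on any run of whitespace.
    return type_value.split()


print(type_tokens("publisher number"))  # ['publisher', 'number']
print(type_tokens("publisher_number"))  # ['publisher_number']
```

A value without whitespace, such as the underscore variant, stays a single token.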

doerners commented 2 years ago

@ahankinson could you give an encoding example of how the encoding of a publisher number might best be realized? That would be helpful, thanks.

ahankinson commented 2 years ago

"Best" is up to the encoder; I can really only provide additional perspective on the existing tools within MEI that I don't see in evidence in your example.

But, to take your example from above, if all you wanted to do was capture the publisher number, then it might look something like this:

<manifestation>
   <identifier label="Publisher Number" auth.uri="http://rdaregistry.info/Elements/m/P30065">123456</identifier>
</manifestation>

That is, there is no need to embed this in the <publisher> block if you just want to scope the <identifier> element to a publisher's number; the example above gives a unique identifier for the statement "This manifestation has a publisher number identifier of 123456". I mean, you can embed it, but you don't need to. This also fixes some issues with using the @type attribute.

This is in response to your observation that there is "no" publisher number in MEI -- it's true that there is no specific element for it, like plate numbers, but that doesn't necessarily mean you can't capture it. You might even choose to not use <plateNum> as well, and standardize on the use of the <identifier> tag for plate numbers.

doerners commented 2 years ago

Personally I think it is counterintuitive to use label instead of type. In my understanding, label means the name we give to an object; type, on the other hand, specifies the nature of the object. What would a correct encoding using type instead of label look like? In my mind I would like to say: Here is an identifier and it's a specific type of identifier, namely a publisher number (and not any other identifier such as a doi for example). Maybe it's just me but label doesn't seem to express the same semantic content.

Furthermore, wouldn't it be better to nest the publisher number within publisher to designate to which publisher this number is assigned? I mean, yes, the source may be the object from which we recognize the publisher number but isn't it rather primarily an identifier for the corporate body than the source? Since, according to my example, it is all encoded at the manifestation level anyway, the connection to the manifestation is established, is it not?

ahankinson commented 2 years ago

What would a correct encoding using type instead of label look like?

That's the thing -- there is no "correct" encoding using @type. It's a free-form field, and the values are not controlled. There is no externally-recognized classification for this attribute, so encoders can put anything they want in there, by design. That's fine if your data is only going to be used by you, but it's not that great if you need to communicate unambiguously with others. And in MEI, it's generally avoided to put any sort of fixed values within @type since it can very, very easily become complicated and abused, as I explained above.

In other words, your example was "correct" in the same way that doing <identifier type="flibflub"> is "correct" -- it can't actually be wrong! It might mean something to me, but it could very easily be nonsense to others.

In my mind I would like to say: Here is an identifier and it's a specific type of identifier, namely a publisher number (and not any other identifier such as a doi for example). Maybe it's just me but label doesn't seem to express the same semantic content.

That's what my example gave -- You can remove @label and it would mean exactly the same. I'm assuming, though, that you'll want some sort of human-readable version in your encoding, which is why I included it. The Publisher Number classification is given by the @auth.uri attribute, and it unambiguously specifies that 123456 is a publisher number.

DOIs are primarily about resolving content to a specific representation. The RDA Vocabularies are about unique identifiers for shared concepts: If I see that given URI, I know that it unambiguously refers to the concept of a publisher's number. More importantly, if other people find my data and want to parse it, then they can unambiguously look up what that concept means.

If we're just doing "classification by string" then it's a lot harder to share data. You might use the string "publisher number"; someone else might use "pubnum"; someone else might use "publisher_identifier", etc. In heterogeneous datasets, the variation in concepts and in methods of classifying things is what really makes data non-interoperable. So shared vocabularies are really helpful.
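
To make the interoperability argument concrete, here is a minimal Python sketch that selects identifiers by their @auth.uri value rather than by a free-form string. Assumptions: the sample is namespace-free and reduced; the URI for the publisher number is the one given in the encoding example earlier in this thread and has not been verified here; the second and third identifiers (labels and values) are invented for illustration; `identifiers_by_concept` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# URI taken from the example in this thread (not verified here).
PUBLISHER_NUMBER = "http://rdaregistry.info/Elements/m/P30065"

# The second and third identifiers are made up for illustration.
SAMPLE = """<manifestation>
   <identifier label="Publisher Number" auth.uri="http://rdaregistry.info/Elements/m/P30065">123456</identifier>
   <identifier label="Verlagsnummer" auth.uri="http://rdaregistry.info/Elements/m/P30065">98765</identifier>
   <identifier label="Plate Number" auth.uri="http://rdaregistry.info/Elements/m/P30066">1913</identifier>
</manifestation>"""


def identifiers_by_concept(root, concept_uri):
    """Return the text of all <identifier> elements classified by concept_uri."""
    return [
        el.text
        for el in root.iter("identifier")
        if el.get("auth.uri") == concept_uri
    ]


root = ET.fromstring(SAMPLE)
publisher_numbers = identifiers_by_concept(root, PUBLISHER_NUMBER)
print(publisher_numbers)  # ['123456', '98765']
```

The human-readable labels differ, but the lookup does not depend on them.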

Furthermore, wouldn't it be better to nest the publisher number within publisher to designate to which publisher this number is assigned?

Sure -- I mentioned that you can embed it in there if you want; I was mostly going on your encoding where there was no other information about publishers given. There, it seemed that the only reason to scope it inside of a <publisher> block was to add context. If there is no other information given in the <publisher> block, then you can remove that tree and it would mean exactly the same.

doerners commented 2 years ago

Thanks to all for the comments here and to everyone who attended our last IG meeting. I have reworked the draft for the description of <plateNum> again and tried to incorporate the suggestions made.

Comments and also critique are welcome!

However, I'd like to make a little request: If there is anything you'd like to improve, please provide an actual suggestion on how to phrase or encode something differently, rather than just pointing out what could be problematic. I feel this would especially facilitate a faster improvement process and make it easier to incorporate any changes. And of course credit will be given where credit is due ;)

Here is the reworked draft:

The dating of printed sources is a relevant aspect for questions of provenance and edition. In the absence of bibliographical information, e.g. on the edition or the year of origin, plate numbers are an essential aid to dating. Though the name might suggest otherwise, plate numbers can be described as designations assigned to a resource by a music publisher which need not necessarily consist (only) of numbers. They are usually printed at the bottom of each page of a musical print and sometimes appear on the title page as well. In MEI any such plate numbers are encoded within the <plateNum> element as plain text.

In this context, it is worth noting that there are cases where the plate number for a particular print is identical to the publisher number assigned by the music publisher responsible for that print. However, in certain circumstances, especially when a music publisher has been taken over by another at some point, it can happen that two editions of a piece have the same plate number, but the publisher numbers differ because the acquiring publisher has continued to use existing plates. As for the distinction between plate number and publisher number, MEI does not provide a specific element for recording a publisher number. It is recommended to capture each publisher number with the <identifier> element within <pubStmt> at the manifestation level. In contrast, any plate number will be encoded within the <plateNum> element. <plateNum> can be captured as a child element to <physDesc> and can additionally be marked within the <titlePage> element as well, if a plate number is visible on the title page. For instances where the source is not exactly dated, it is recommended to record the plate number separately within <physDesc> in any case. The following encoding example illustrates the use of both <plateNum> and <identifier> for encoding a plate number and a diverging publisher number of a resource:

      <manifestation>
         <pubStmt>
            <publisher>
               <corpName auth="GND" auth.uri="https://d-nb.info/gnd/2045143-X">Universal Edition</corpName>
               <identifier type="publisher_number" auth="RDA" auth.uri="http://rdaregistry.info/Elements/m/P30276">1967</identifier>
            </publisher>
         </pubStmt>
         <physDesc>
            <plateNum>1913</plateNum>
         </physDesc>
      </manifestation>
ahankinson commented 2 years ago

<identifier type="publisher_number" auth="RDA" auth.uri="http://rdaregistry.info/Elements/m/" codedval="P30276">1967</identifier>

The documentation for <identifier> says:

This attribute may contain a complete URI or a partial URI which is completed by the value of the codedval attribute.

(Here's me pointing out a problem again!)

There are two problems with splitting the values across auth.uri and codedval:

1) No single field provides a nice, easy place to read the value from for processors and 2) The practice of splitting a URI across two different attributes is very MEI-specific -- not many other specifications, as far as I know, suggest this.

I don't actually see much benefit from splitting the full identifier between auth.uri and codedval. So I might suggest taking the option to put the full URI in the auth.uri attribute.

<identifier type="publisher_number" auth="RDA" auth.uri="http://rdaregistry.info/Elements/m/P30276">1967</identifier>

The same would follow for <corpName>:

<corpName auth="GND" auth.uri="https://d-nb.info/gnd/2045143-X">Universal Edition</corpName>

Two other comments:

  1. You provide a value for type but don't describe it in your prose. Is this expected to follow some sort of typology, or can it actually be omitted? If you do keep it, you should probably have a sentence describing it.
  2. It's just a thought, but what do you think about this?
<plateNum>
    <identifier type="plate_number" auth="RDA" auth.uri="http://rdaregistry.info/Elements/m/P30066">1913</identifier>
</plateNum>

On the one hand, it's slightly redundant. On the other, however, it does allow for the unification of identifier capture, which would make parsing and understanding the content of an MEI file easier, particularly for metadata systems. Since <plateNum> can't take the auth and auth.uri attributes directly, this would probably be the next best thing for ensuring a consistent way of referring to identifiers.

ahankinson commented 2 years ago

One other thing that occurs to me about splitting codedval and auth.uri:

If they are split, you're never really sure if the auth.uri is complete or not. Since identifiers are "Opaque" (they may not resolve, and they don't necessarily contain any semantics in the string value) it's very hard to automatically determine whether http://rdaregistry.info/Elements/m/ is actually the full identifier or just a fragment.

So making it "best practice" to always include the full identifier is probably easier for everyone.
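To make the ambiguity concrete, here is a minimal sketch of the two forms side by side. The split form is an illustrative reconstruction built from the base URI mentioned above, not a quotation from the draft, and the <pubStmt> wrapper is only there to keep the fragment well-formed.

```xml
<pubStmt>
    <!-- Split form (illustrative reconstruction): a processor has to guess
         that auth.uri is only a prefix and that codedval must be appended. -->
    <identifier auth="RDA" auth.uri="http://rdaregistry.info/Elements/m/" codedval="P30276">1967</identifier>

    <!-- Full form: auth.uri can be read as-is, with no concatenation step. -->
    <identifier auth="RDA" auth.uri="http://rdaregistry.info/Elements/m/P30276">1967</identifier>
</pubStmt>
```

Looking only at the first auth.uri, nothing in the string itself says whether it is complete, which is the problem described above.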

doerners commented 2 years ago

Thanks @ahankinson! I edited the draft so that the URIs are no longer split.

Question: If it is better to include the full URIs rather than splitting the values why does MEI offer codedval in the first place? Maybe there is a use case where this is needed that just eludes me? If that is not the case, maybe it's worth discussing within the technical team whether full URIs within auth.uri should be made the default?

Two other comments:

  1. You provide a value for type but don't describe it in your prose. Is this expected to follow some sort of typology, or can it actually be omitted? If you do keep it, you should probably have a sentence describing it.
  2. It's just a thought, but what do you think about this?
    <plateNum>
    <identifier type="plate_number" auth="RDA" auth.uri="http://rdaregistry.info/Elements/m/P30066">1913</identifier>
    </plateNum>

Regarding 1.: type could be omitted, but I felt including it would serve human readability. I think describing it would definitely not do any harm; however, I'm not sure if this might shift the focus away from what this example wants to illustrate in the first place, i.e. how to encode <plateNum>. Whether you use type and how thoroughly you describe it should be up to the edition guidelines, shouldn't it? That said, I'm not generally opposed to this suggestion and will think about it some more.

Regarding 2.: In general I'm in favour of a unification of identifier capture. However, nesting identifier within plateNum does not seem very elegant. I have two ideas for how to achieve this without redundancy in the encoding:

  1. Just use the identifier element
  2. Create new specific elements for other identifiers and make sure that the modelling of all those elements allows the direct use of auth and auth.uri

Idea 1 would raise the question of why plateNum is needed as a specific element at all. Idea 2 would result in changes to the schema. For both ideas I see a need to discuss this further with the community, which is not a bad thing. However, it begs the question if it would be smarter to hold off creating documentation for the guidelines or to document things as they are right now and change them later. I'm not sure which is the smartest move right now.

ahankinson commented 2 years ago

If it is better to include the full URIs rather than splitting the values why does MEI offer codedval in the first place

I'm not sure. I suspect it's needed if you only have an identifier that isn't in URI form -- the MARC Codes for Organizations is one such example that I've come across, where it has an "Authority" and a "value" but no canonical URI form. So for example, QMMMDM is the MARC org code for the Music Library at McGill University. In that case you might have:

<corpName auth="MCO" codedval="QMMMDM" auth.uri="https://www.loc.gov/marc/organizations/">Marvin Duchow Music Library</corpName>

(I'm totally making this up -- it shouldn't be used as an actual example of proper encoding!)

In this case you can't just stick the codedval on the end of the auth.uri to actually get a full URI. The URI, however, does uniquely identify it as the "MARC Codes for Organizations" scheme, something which "MCO" alone in auth doesn't.

Another example might be where the URI and the value can differ. So, for example, the same library in RISM has a siglum of CDN-Mm -- this might be expressed as:

<corpName auth="rism" codedval="CDN-Mm" auth.uri="https://rism.online/institutions/30000472">Marvin Duchow Music Library</corpName>

In this case the codedval and the full auth.uri are both correct, but different.

type could be omitted, but I felt including it would serve human readability.

label would probably be a better attribute to use for human readable purposes; it doesn't have the same restrictions as type on the value (with type any value with a space is considered two values; label doesn't have this problem). As it is now, @type looks like a pseudo-authoritative value, so someone might start thinking that publisher numbers must have this in order for it to be valid, when that isn't the case.
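A minimal sketch of the difference, reusing the publisher number from the earlier example. The attribute values here are illustrative assumptions, and the <pubStmt> wrapper is only there to keep the fragment well-formed.

```xml
<pubStmt>
    <!-- type is tokenized on whitespace, so this reads as the two
         separate values "publisher" and "number" -->
    <identifier type="publisher number">1967</identifier>

    <!-- label is free text, so the space is unproblematic -->
    <identifier label="Publisher Number">1967</identifier>
</pubStmt>
```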

I'm not sure if this might shift the focus away from what this example wants to illustrate in the first place

That's what I was thinking. If you want to illustrate something specific, it's probably better to omit the things you're not prepared to describe. That way people don't come away wondering why type is there...

As for the <plateNum> I'm actually in favour of your option 1 -- just use identifier directly. But I didn't want to overstep; what you have is perfectly valid.

it begs the question if it would be smarter to hold off creating documentation for the guidelines or to document things as they are right now and change them later.

I don't think you should hold off. MEI has a lot of options, and about twenty different valid ways to do the same thing. I think it would be best to narrow it down to a single reasonable recommendation using the available elements, instead of waiting for the perfect schema to emerge.

rettinghaus commented 2 years ago

auth.uri is used to specify the content of the element, not the element itself. Mixing these things up is extremely bad practice and should be avoided by all means.

This example from @ahankinson leads the reader/computer to believe that they can find more information about the number 123456 by looking up the URL:

<identifier label="Publisher Number" auth.uri="http://rdaregistry.info/Elements/m/P30065">123456</identifier>

The MEI way to describe an element's relation to another standard would be using @analog:

<identifier label="Publisher Number" analog="rda:P30065">123456</identifier>

Furthermore I think @type is a valid way to make an example in the Guidelines more descriptive to the human reader. The general implications of using @type should be explained elsewhere and not in an example for plateNum.

Saying this, I would propose to change the example given above slightly into:

<manifestation>
    <pubStmt>
        <publisher>
            <corpName auth="GND" auth.uri="https://d-nb.info/gnd/2045143-X">Universal Edition</corpName>
        </publisher>
        <identifier type="publisher_number" analog="rda:P30276">1967</identifier>
    </pubStmt>
    <physDesc>
        <plateNum analog="rda:P30066">1913</plateNum>
    </physDesc>
</manifestation>

ahankinson commented 2 years ago

auth.uri is used to specify the content of the element, not the element itself.

@rettinghaus I'm not sure that's entirely correct. The documentation for @auth.uri says:

A web-accessible location of the controlled vocabulary or other authoritative source of identification or definition for this element or its content

(emphasis mine).

That says to me that it can be either an authority for the element OR for the content of the element. In the specific case of identifier, I think this would mean that it would define the identifier as being one of a publisher or plate number.

Using @analog in the manner you suggest comes with the challenge that you're specifying a namespace prefix, rda:, within an attribute value. While not incorrect (insofar as you can put any value in there, since @analog accepts plain text), it also does not provide a way of expanding the rda: value to its proper full namespace value. (Unlike using it in the attribute or element itself, e.g., @mei:analog.)

Further, looking at the att.bibl class remarks (where @analog is defined), I think it's clear that this is not its intended usage.

Mapping elements from one system to another via analog may help a repository harvest selected data from the MEI file to build a basic catalog record. The encoding system from which fields are taken must be specified. When possible, subfields as well as fields should be specified, e.g., subfields within MARC fields.

From https://music-encoding.org/guidelines/dev/attribute-classes/att.bibl.html

This suggests to me that the @analog value is to provide a way of mapping into MEI by specifying an "analogous" element... So a title in MEI might have an analog value of something like @analog="240$a" (mapping the title to the MARC21 240 $a field). I don't think the purpose of it is to actually map out to other specs.
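As a minimal sketch of that reading of @analog, assuming the MARC21 240 $a mapping mentioned above: the title text is a placeholder, and whether the encoding system should also be named in the value is left open here.

```xml
<titleStmt>
    <!-- analog names the MARC21 field and subfield this element corresponds to -->
    <title analog="240$a">Placeholder title</title>
</titleStmt>
```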

gucl-mu commented 2 years ago

Thanks Klaus and Andrew for the explanations about @auth.uri and @analog. Even though the discussion about linking to controlled vocabularies is very important, I think it is somewhat misleading for readers if the example for <plateNum> or the publisher number contains links to controlled vocabularies. As Klaus points out, this should be explained elsewhere in general. I think a simpler example with @label="publisher_number" would be sufficient for this section. (I know @label doesn't require NMTOKEN, but then it wouldn't be @type and a pseudo-authority; on the other hand, it would be better machine-readable.) I think we should not forget what influence the guidelines have on the users. If there is an example with @type or @label, then probably 99% of the users will encode exactly the same way, without thinking much about which attribute means what. However, maybe it would be good to have a "linked" example available in the sample encodings repo?

doerners commented 2 years ago

Since the next IG meeting is close, here is a reworked draft of the description of <plateNum> for discussion:

The dating of printed sources is a relevant aspect for questions of provenance and edition. In the absence of bibliographical information, e.g. on the edition or the year of origin, plate numbers are an essential aid to dating. Though the name might suggest otherwise, plate numbers can be described as designations assigned to a resource by a music publisher which need not necessarily consist (only) of numbers. When present, they are usually printed at the bottom of each page of a musical print and sometimes appear on the title page as well. In MEI any such plate numbers can be encoded within the <plateNum> element as plain text, for example: <plateNum>A &amp; P. No. 6412</plateNum>.

In this context, it is worth noting that there are cases where the plate number for a particular print is identical to the publisher number assigned by the music publisher responsible for that print. However, in certain circumstances, especially when a music publisher has been taken over by another at some point, it can happen that two editions of a piece have the same plate number, but the publisher numbers differ because the acquiring publisher has continued to use existing plates. As for the distinction between plate number and publisher number, MEI does not provide a specific element for recording a publisher number. For now it is recommended to capture each publisher number with the <identifier> element. In contrast, any plate number should be encoded within the <plateNum> element. <plateNum> can be captured as a child element of <physDesc> and can additionally be marked within the <titlePage> element if a plate number is visible on the title page. Where the source is not exactly dated, it is recommended to record the plate number separately within <physDesc> in any case.