Clarify methodology when encoding incipits

BaMikusi commented 2 years ago

In accordance with RISM's general policy that cataloguers should reflect what they see on the source, a number of records include sharp signs in positions where we would today use a natural. See e.g. https://muscat.rism.info/admin/sources/452507507

This causes no particular difficulty for the (sufficiently intelligent) human user, but I wonder whether the IT side can also handle such irregularities when searching for incipits or comparing them. If not, we should amend the Muscat guidelines (and potentially the PAE specs) to explicitly state that such cases should be normalized.

ahankinson commented 2 years ago

While I don't disagree with that policy, I think it's more about "the right place" to put such observations.

I see two types of data here: The "operational data" (like PAE) used by the systems to support finding, rendering, grouping, etc. (aka "Entry points"), and the data used to serve as a proxy for the physical object by describing it in detail for the readers of the sources.

The problem arises when we try to treat the same field as both operational and descriptive data. Entry points need to have a well-defined set of values, so that we can group and filter and render correctly on those values. Descriptive data needs to be accurate in its description, whether or not it fits into a "standard" set of values. We can look for keywords in descriptive data, but other than that we can't use it "operationally". Trying to serve both at the same time is counter-productive, since it means we are either:

1. Standardizing values for descriptive data that are not actually standardized and poorly reflect the actual source, or
2. Making operational data so complex that it makes it difficult for anyone to find or work with the data consistently (See, for example, the myriad variations on place names: All are correct, from a human standpoint, but frustratingly no one entry gives you the full picture of what, actually, is related to the place "Freiburg in Breisgau" because it was transcribed from the source directly).

So, this is all just a long-winded way of saying that:

1. We should constrain the operational data used for filtering / rendering in order to make the behaviours of the software more consistent and predictable, since this ultimately serves our users better (nobody likes finding bugs or working around wonky behaviours, and the fewer variations we have to deal with, the fewer bugs we have). Otherwise we will have "Complexe and Difficult" rather than "Plaine and Easie". 😄
2. Observations about the original source should be made available to the human readers of such records, since they are the primary audience of these observations. A simple note such as "Old-style x symbols were present instead of natural signs" is both highly descriptive (and can be endlessly expanded), and available to the appropriate audience.

So ultimately my feeling on this issue is that for PAE we choose to represent "n" as "n", and "x" as "x", and if there are interesting variations then we describe them in as much, or as little, detail as we think anyone would want in a note field.

BaMikusi commented 2 years ago

Thanks for the comment, which actually fully coincides with my approach. For background: we discussed an incipit with Jennifer, and in the process I half-automatically amended such an x to n, whereupon Jennifer reminded me that our guidelines in fact specifically urge cataloguers to keep such irregularities. But I also think that, when it comes to a PAE code, normalization is inevitable, not just with respect to these signs but also to missing clefs: because the PAE includes pitches, the cataloguer has already made up her/his mind to some extent, so it makes no sense to leave the clef field empty and thus leave the user without a point of orientation. Better to add one, with a note that it has been supplied (or, by the same token, a note that the original x in the source has been rendered as n).

jenniferward commented 2 years ago

Would the old records be changed? And would standardization also apply to mensural notation?

ahankinson commented 2 years ago

Standardization would apply to mensural notation insofar as we determine what the most important aspects of the notation are for this field, which is in turn determined by how we want it to function within the data. My opinion is that the function of this field is to serve as a means of melodic identification, and so accurate transcriptions of pitch and rhythm, and the components that describe or modify those (clefs, mensuration, key signatures, etc.) are the most important attributes to capture.

Of the three styles (modern, mensural, and neumes) that PAE supports, the pitch encoding is the same, it's only the approach to meter that differs. So we could say that the reason we switch note-head styles isn't so much that we're trying to visually represent what is in the source ("diplomatic transcription"), but rather to signal that the metric system is different. (This is a very fresh way of thinking about this, and I'd have to ponder on it a bit longer -- counter-examples welcome.)

Put another way: If the same melody is written in mensural notation in one source, and in modern notation in another, we want to be able to find both of these with the same query. Differentiating between them on the basis that they look different does little to help us realize that they sound the same, so we should prioritize the attributes that promote these sorts of matches in helping us to identify different pieces with the same tune. (And also, we should remember that a diplomatic transcription in PAE is, at best, a very poor visual representation of the original, since there are so many other aspects that we can't represent: fonts, spacing, color, typeface, paper texture, special symbols, etc.)
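To make the "same query finds both" idea concrete, here is a minimal sketch that reduces a PAE note string to its pitch names and ignores everything else. It is an illustration only, not how Muscat or any RISM search actually works. The function name `pitch_sequence` is hypothetical, and the sketch assumes the string contains note data only (no clef, key signature, or time signature tokens), with `x`, `b`, `n`, `xx`, and `bb` as accidentals before an uppercase note name. It deliberately discards octave marks and durations.

```python
import re

# An optional accidental (xx, bb, x, b, n) followed by an uppercase note name.
# Assumption: the input holds note data only, so A-G never means anything else.
NOTE = re.compile(r"(xx|bb|x|b|n)?([A-G])")

def pitch_sequence(notes):
    """Return the pitch names of a PAE note string, dropping rhythm and octave."""
    return [acc + letter for acc, letter in NOTE.findall(notes)]

# Two hypothetical encodings of the same tune with different duration values
# reduce to the same pitch sequence.
print(pitch_sequence("'2C4DxF"))
print(pitch_sequence("'1C2DxF"))
```

A real matcher would also need to account for octaves and for accidentals implied by the key signature; the point here is only that comparing on pitch makes the difference in notation style irrelevant.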

Furthermore, when it comes down to it, PAE is not designed for diplomatic transcription, not least because such a thing is very difficult in any music encoding system (MEI struggles with this as well, and it is orders of magnitude more complex as far as encoding systems go). If we try to do diplomatic transcription in PAE, we then need to deal with the other fundamental variations in the practice, e.g., ars antiqua vs. ars nova, or support colouration, etc., and then we're basically reinventing MEI.

As an addendum, the TEI uses the incipit almost exclusively as an identifier of the source text:

incipit contains the incipit of a manuscript or similar object item, that is the opening words of the text proper, exclusive of any rubric which might precede it, of sufficient length to identify the work uniquely; such incipits were, in former times, frequently used a means of reference to a work, in place of a title.

Likewise, the MARC21 documentation states that the incipit's primary function is identification:

031 Primarily used to identify music manuscripts, but can be applied to any material containing music.

Given that we also have the ability to attach MEI encodings in Muscat, which can be much more expressive than PAE, then I think the more we can limit ourselves to the melodic identification function with PAE, the easier it will be for "future us" to manage this data and make our sources more findable.

BaMikusi commented 2 years ago

Regarding your hint at MEI, I need to observe that "ars antiqua vs. ars nova, or colouration" has not really been RISM's main concern earlier on, so any adjustment regarding our cataloguing policies must also consider to what extent we actually mean to expand the core coverage.

And more importantly (returning to Jennifer's first question), I wonder if it is realistic to retrospectively change this practice in the old data. I guess we should be looking for incipits with flats in the key signature, where one of these flatted pitches occurs with a sharp in the PAE string later on. And, conversely, I guess we should also be searching for key signatures with sharps that later on feature a flat (rather than a natural) sign before one of the affected pitches (though I am unsure whether this practice was equally widespread). How many of these could one find? Hopefully not too many, since these are certainly all passages to be looked at by an editor, rather than simply by a script.
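The search described above could be sketched roughly as follows, purely to produce a list of candidates for an editor to review. This is a hedged sketch, not an existing Muscat script: the function names `parse_key_signature` and `suspect_accidentals` are hypothetical, the key signature is assumed to arrive as a separate string such as `bBE` or `xFC`, and inline key-signature changes inside the note data are not handled.

```python
import re

# A single accidental (x = sharp, b = flat) directly before a note name.
# The lookbehind skips the second character of a double accidental (xx, bb).
SINGLE_ACCIDENTAL = re.compile(r"(?<![xb])([xb])([A-G])")

def parse_key_signature(keysig):
    """Split a key signature like 'bBE' into its sign and the affected pitches."""
    keysig = keysig.lstrip("$")
    if not keysig or keysig[0] not in "xb":
        return None, set()
    return keysig[0], {c for c in keysig[1:] if c in "ABCDEFG"}

def suspect_accidentals(keysig, notes):
    """List accidentals that may stand in for a natural sign.

    Flags a sharp on a pitch flatted by the key signature, or a flat on a
    pitch sharpened by it. Assumption: no inline key changes in `notes`.
    """
    sign, affected = parse_key_signature(keysig)
    if sign is None:
        return []
    opposite = "x" if sign == "b" else "b"
    return [
        m.group(0)
        for m in SINGLE_ACCIDENTAL.finditer(notes)
        if m.group(1) == opposite and m.group(2) in affected
    ]

# Hypothetical example: one flat (B) in the signature, a sharp before B later on.
print(suspect_accidentals("bB", "'4C8DxB4A"))
```

Every hit is only a candidate: a sharp on a flatted pitch can be a genuine sharp, so the output is a worklist for an editor rather than something to rewrite automatically.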

But the bottom line for me is that there is quite surely some need to reconsider the "exactly reflect the source" policy, but any amendment must be made not by altering the guidelines ad hoc here and there, but rather by formulating a comprehensive rule of thumb to determine in which cases literal transcription is valuable and encouraged, and where standardization should prevail.

ahankinson commented 2 years ago

I have updated the title to reflect the current state of the discussion. I think beyond the "historical use of x in place of n" that the discussion here has identified a larger issue, which is to describe the purpose and capacity of PAE to represent music notation.

This should probably be included in the preamble to "Version 2" of the specification.

ahankinson commented 2 years ago

I'm going to move this to the "Encoding Guidelines" tag so we can mine it for content later.

rism-digital / pae-code-spec

Clarify methodology when encoding incipits #30