w3c / mnx

Music Notation CG next-generation music markup proposal.

immutable and augmentable UIDs #65

Open jsutdolph opened 6 years ago

jsutdolph commented 6 years ago

Sorry to hammer away at this one, but it seems to me to be a valid requirement. An app may need to create a separate file which refers to elements in the score in such a way that the score may be edited in any way and the unchanged separate file still refers to the same elements. We might at first think to identify something as, eg, 'the 1st notation in the 2nd note of the 3rd bar of the 4th part'. However, if a part, bar, note or notation is added or removed before this, the identifier no longer refers to the same item. I propose that to do this we require immutable unique identifiers, for which we also need a rule for creation (a newly created identifier must be unique). The obvious way is to define a special format for immutable identifiers. A simple numeric value would seem to be the only sane way to define this: on reading the file, the editor would set a counter to the highest id number found, and to create an id for a new element it would add 1 to this counter. This immutable-id type would be a special attribute(?) of any(?) element, in addition to the normal UID, which can use any format.

jsutdolph commented 6 years ago

It occurs to me we will need to store the next id value in the score. If the highest-numbered id is used and the element with that id is then deleted, the method described above will reuse that id. If another file refers to it, there will be an error, because that file expects the id to refer to the old, deleted element. We must ensure that an id is never reassigned.
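A minimal Python sketch of the two counter schemes above, assuming integer ids. `IdAllocator` and `naive_next` are hypothetical names, not part of any MNX proposal. It shows how the "scan for the highest id" rule reuses an id once the newest element is deleted, and how a next-id value persisted in the score avoids that.

```python
class IdAllocator:
    """Counter-based id allocation with a next-id value stored in the score."""

    def __init__(self, existing_ids, stored_next=None):
        highest = max(existing_ids, default=0)
        # Trust the stored counter, but never go below highest + 1 in case
        # another application added ids without updating it.
        self.next_id = max(stored_next or 0, highest + 1)

    def allocate(self):
        new_id = self.next_id
        self.next_id += 1  # the caller persists next_id when saving the score
        return new_id


def naive_next(existing_ids):
    """First proposal: scan for the highest id found and add 1."""
    return max(existing_ids, default=0) + 1


ids = {1, 2, 3}
alloc = IdAllocator(ids)
ids.add(alloc.allocate())  # a new element gets id 4; next_id is now 5
ids.discard(4)             # ...and that element is later deleted

print(naive_next(ids))                                          # 4: reused
print(IdAllocator(ids, stored_next=alloc.next_id).allocate())   # 5: never reused
```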

joeberkovitz commented 6 years ago

@jsutdolph It's fine to come back around on this, and I did ask you to file an issue for MNX so it's good to have this on record.

Immutable IDs, if they are included in the standard, will increase the implementation burden on MNX developers. So we do need clear use cases that show that immutable IDs (on top of regular XML IDs) confer clear advantages on producers and consumers of MNX, in order to have a reasonable review and discussion of this issue.

Can you please provide a real-world scenario that describes a sequence of concrete actions by one or more users, making use of various hypothetical applications, and which shows exactly how some set of goals are uniquely supported by immutable IDs. Likewise with the creation rule -- it is not clear what this makes possible and why it's necessary, given some kind of immutability. We need a compelling play-by-play example, since not everyone's going to agree up front that we should add this ingredient to MNX (and may not agree afterwards either).

I think it's also necessary to clarify "[such that] the score may be edited in any way and [a set of external IDs] refer to the same elements". Is it possible to state this need in terms of what happens when arbitrary applications produce and consume MNX, rather than in terms of the score "being edited"? There will be many applications that read and write MNX and only some of them will be editors. Imagine an application that reads an MNX piano score, and algorithmically creates a choral arrangement. Which elements were "edited", and which were not?

Note for others that a preceding discussion went over some (but not all) of this territory for MusicXML: https://lists.w3.org/Archives/Public/public-music-notation-contrib/2017Sep/0008.html

jsutdolph commented 6 years ago

Transforms: The case for Invariant Unique Identifiers in MNX

I propose a mechanism for describing generic transformations or transforms to an MNX score. I will then show that this powerful mechanism requires a special type of unique identifier to be optionally applied to elements, and the requirements for this special identifier cannot be fulfilled by the existing XML ID.

I will start with an example application of transforms that we are all familiar with.

Annotation of music for performance

Any musician will say that it is essential to be able to freely annotate music for performance, whether to add dynamics, articulations and text or to correct errors in the score.

The current situation

In the current paperful world each musician makes their own annotations in pencil. The director/conductor reads out all his/her annotations and every member laboriously adds these in their copy. Also the section leader will usually have some annotations to add which need to be copied into all copies for that part. There might be other levels of sharing too. If somebody is absent for a rehearsal then they need to borrow somebody's copy to add the missing annotations.

Problems.

Annotation in the paperless world - a 'dream' scenario

Transforms - A way to achieve the 'dream' scenario

Imagine that you could describe a small change to a score in an unambiguous way. This description is called a transform.

Now you wrap a set of transforms and store them separately from the score together with any convenient metadata such as author and date/time (for instance in an XML or JSON file).

Now imagine a viewer application which takes the unmodified source document and a set of transforms and allows you to visualise the effect of applying these transforms, perhaps colouring them by author, or showing the transforms in a time-line.

By storing a score and separate transforms we see that the original document is always available for comparison. We can see who made what changes, and we can reverse (undo/redo) any changes.

Let us assume that many changes, such as insertion, deletion or modification of a single element, can be simply described by transforms, but that not all possible changes to a score can be conveniently or economically described this way, such as the addition of a new part or the insertion of many bars. For this reason it is important to allow the source document to be modified while still making a best effort to ensure that any existing transforms remain valid; if any fail, we must ensure that the failure can be correctly detected, described and ideally located to a particular measure.

[Aside. This seems like a description of a scriptable score editor, but we are going further by allowing packaged sets of transforms which can be independently applied. We are defining the transforms in a way that will allow them to be standardised, and we are storing metadata in the transforms]

Now it should be clear that transforms can be used for the annotations described in the 'dream' scenario.

Transform implementation

The transform will at least need to describe:

  1. How to insert, remove or modify elements.

  2. How to move certain elements, such as wedge ends.

  3. The unambiguous identity of an existing element of the MNX score.

  4. An unambiguous position in the score at which to insert a new element. In all cases this will be relative to existing elements, eg 'on note x' or 'at the start of bar y'.

The transform format could be eg XML or JSON, and it could perhaps become an addendum to the MNX standard.

Element identification

An essential part of this implementation is a reliable means of linking the transform descriptions unambiguously with elements in the MNX score, while allowing the original score to be modified directly by editing the MNX file, or indirectly by application of previous transforms.

We could refer to the element by part index, measure index, and index of element type in the measure. eg

<insert at="part1-measure34-note23"><dynamic type="sfz"/></insert>

Problems.

* If the original score is modified and a measure or part or note is inserted or deleted before this then this transform will end up in the wrong place, and the error will not be detectable.
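To make this problem concrete, here is a small Python sketch using a hypothetical in-memory model (not MNX syntax): one measure is a list of notes, each with an optional id. After an insertion, the positional reference silently lands on the wrong note, while the id-based reference still finds the intended one, and a deleted target would raise an error instead of failing silently.

```python
def resolve_by_index(notes, position):
    """Positional addressing such as 'note2': the 2nd note in the list."""
    return notes[position - 1]


def resolve_by_id(notes, target_id):
    """Id addressing: find the note by id, or fail loudly if it is gone."""
    for note in notes:
        if note["id"] == target_id:
            return note
    raise KeyError(f"no element with id {target_id!r}")


measure = [
    {"id": "n1", "pitch": "C4"},
    {"id": "n2", "pitch": "D4"},
    {"id": "n3", "pitch": "E4"},
]
# A transform written against this measure targets the D4: position 2, or id "n2".
# The score is then edited: a new note is inserted at the start of the measure.
measure.insert(0, {"id": "n4", "pitch": "B3"})

print(resolve_by_index(measure, 2)["pitch"])  # C4: the wrong note, with no error
print(resolve_by_id(measure, "n2")["pitch"])  # D4: still the intended note
```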

All MNX elements can have an optional XML ID field. This is defined to be unique and so it could be used for identifying the element.

Problems.

* IMPORTANT If we have a score containing XML IDs on some elements created by some haphazard rule, and this score requires these references for internal or external consistency, then we have no way of using these IDs for our scheme, because there is no convenient way to add new unique IDs to this existing set, not knowing which 'haphazard rule' was used to create them. Unless we have a fixed rule for generating new unique IDs we cannot add any to a score which contains existing IDs which need to be preserved.

When the score is created or prepared for use with transforms, all addressable elements (TBD), including parts and measures, are given an Invariant Unique Identifier (IUID), and a Predetermined Generation Rule (PGR) is used for creating these IUIDs. If the score is subsequently modified or transformed, then a) any elements which are modified must retain the same IUID, and b) any new elements are given a new IUID using the PGR.

eg

<mnx-transforms source="iuid.be478726" by="jdoe" email="jdoe@doe.co">

<insert id="iuid.34560.23def" at="iuid.23B456.23def"><dynamic type="sfz"/></insert>

<insert id="iuid.34789.23def" from="iuid.C234.23def" to="iuid.C239.23def"><wedge type="crescendo"/></insert>

<modify at="iuid.12B.23def" from="iuid.122.23def" to="iuid.134.23def"><wedge/></modify>

<modify at="iuid.6573.23def"><note pitch="C4"/></modify>

<delete at="iuid.DA758.23def"><note/></delete>

</mnx-transforms>
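A sketch of how an application might check a transforms file like the one above against a score before applying it, using Python's standard `xml.etree.ElementTree`. The function `check_transforms` and its rules (an `insert` registers its new IUID for later operations, a `delete` removes its target, and an operation whose anchor is missing is reported and skipped) are assumptions for illustration, not part of any proposal. The example is trimmed to three operations.

```python
import xml.etree.ElementTree as ET

TRANSFORMS = """<mnx-transforms source="iuid.be478726" by="jdoe">
  <insert id="iuid.34560.23def" at="iuid.23B456.23def"><dynamic type="sfz"/></insert>
  <modify at="iuid.6573.23def"><note pitch="C4"/></modify>
  <delete at="iuid.DA758.23def"><note/></delete>
</mnx-transforms>"""

REF_ATTRS = ("at", "from", "to")  # attributes that point at existing elements


def check_transforms(xml_text, score_uid, score_iuids):
    """Return (operation, attribute, missing IUID) for every unanchored reference."""
    root = ET.fromstring(xml_text)
    if root.get("source") != score_uid:
        raise ValueError("transforms were written for a different score")
    known = set(score_iuids)
    failures = []
    for op in root:
        refs = [(a, op.get(a)) for a in REF_ATTRS if op.get(a) is not None]
        missing = [(op.tag, a, r) for a, r in refs if r not in known]
        failures.extend(missing)
        if missing:
            continue  # cannot anchor this operation, so skip its effect
        if op.tag == "insert" and op.get("id"):
            known.add(op.get("id"))  # later transforms may refer to it
        elif op.tag == "delete":
            known.discard(op.get("at"))
    return failures


score = {"iuid.23B456.23def", "iuid.6573.23def"}
print(check_transforms(TRANSFORMS, "iuid.be478726", score))
# [('delete', 'at', 'iuid.DA758.23def')]
```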

Notes

* The uid is of the form `iuid.<a>.<b>` where a is a unique identifier of the element and b is the identifier of the part/measure containing the element. This allows specific error reporting including the part and measure number where the transform fails (unless these have been deleted from the score).

* The type of the element could be included in the IUID eg "iuid.note.2345.abcd" for improved readability

* The score itself should have a uid to ensure that the transforms specify unambiguously which score they refer to

* An IUID must be permitted *in addition to* the existing XML ID on each element, as explained above, and the standard must somehow allow it to be much more narrowly defined than the normal XML ID. Since it seems probable that others will come forward with other incompatible requirements for IDs, perhaps it is possible to allow *unlimited* XML IDs on any element, each with a special prefix (eg "iuid.") and a separate registry of prefixes and their defined usage.
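The notes above imply that an application can split an IUID into its parts to report where a transform failed. This Python sketch assumes exactly the two forms given (`iuid.<a>.<b>` and the typed `iuid.note.2345.abcd`); `parse_iuid`, `describe_failure` and the label mapping are hypothetical helpers.

```python
from typing import NamedTuple, Optional


class Iuid(NamedTuple):
    element_type: Optional[str]  # optional readability prefix, eg "note"
    element: str                 # <a>: unique identifier of the element
    container: str               # <b>: identifier of the containing part/measure


def parse_iuid(value: str) -> Iuid:
    parts = value.split(".")
    if parts[0] != "iuid" or len(parts) not in (3, 4):
        raise ValueError(f"not an IUID: {value!r}")
    if len(parts) == 3:
        return Iuid(None, parts[1], parts[2])
    return Iuid(parts[1], parts[2], parts[3])


def describe_failure(value, container_labels):
    """Locate a failed reference using the container part of the IUID."""
    iuid = parse_iuid(value)
    where = container_labels.get(iuid.container, "a deleted part/measure")
    return f"element {iuid.element} not found in {where}"


print(describe_failure("iuid.DA758.23def", {"23def": "part 1, measure 34"}))
# element DA758 not found in part 1, measure 34
```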

Predetermined Generation Rule for IUIDs (PGR)

A couple of possibilities exist for generating IUIDs:

An incrementing numeric value.

Notes

* The next value needs to be stored in the MNX file. We cannot rely on automatic detection of this because an element with the latest value may be deleted, and this could result in reuse of that value.

* This cannot define an IUID for the whole score.

* This cannot be used for uids attached to new elements created by transforms: different sets of transforms can be created independently from the same read-only score, so two of them could pick the same next value and there would be no way to disambiguate them.

A timestamp in milliseconds (ms) or microseconds (µs).

Notes

* We assume that 2 identical timestamps cannot be created (but this possibility could be detected and avoided).

* This can be applied to the whole score and to uids attached to new elements created by transforms.

* A 32-bit ms timestamp wraps roughly every 49.7 days. The collision probability seems acceptably low.
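The wrap period quoted above follows from 2^32 ms being about 4.29 million seconds. This short Python check (`wrap_period_days` is a hypothetical helper) reproduces it and also shows why a 64-bit microsecond count effectively never wraps.

```python
SECONDS_PER_DAY = 86_400


def wrap_period_days(bits, ticks_per_second):
    """Days until an unsigned `bits`-bit counter ticking at the given rate wraps."""
    return 2 ** bits / ticks_per_second / SECONDS_PER_DAY


print(round(wrap_period_days(32, 1_000), 1))             # 49.7 days (32-bit ms)
print(round(wrap_period_days(64, 1_000_000) / 365.25))   # 584542 years (64-bit µs)
```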

Other examples of possible uses of transforms.

There are many other situations in which transforms as described above would be advantageous. Some examples are considered here, but I'm sure others will spring to mind.

Translation of lyrics

A set of lyrics for a score in any language can be stored as a set of transforms, so we can store the lyrics separately from the score and still be able to edit the original score without lyrics.

Extraction of parts

Parts could be extracted by application of transforms, including separation of voices on a stave into separate parts, while retaining score editability.

Transposition of parts

A set of transforms to transpose a part can be applied without modifying the original score.

Summary

I have described a mechanism called transforms for capturing changes to an MNX score without changing the score itself, with particular application to handling annotations with multiple authors in a group of music performers such as an orchestra or choir.

I have shown that these transforms require a type of unique identifier for elements of the score, known as an IUID, which is incompatible with the existing XML ID and so is an additional requirement for MNX.

Responses to questions

Implementation burden

@joeberkovitz IUIDs should be optional, and can be ignored by read-only or write-only applications which don't use them, and can be stripped by read-write applications which don't support them. So they are not necessarily a burden.

Producers and Consumers of MNX

@joeberkovitz Take the example of the automatic creation of a choral arrangement of a piano score. The application could generate a new MNX score with the choral arrangement, or it could generate a transforms file to be applied to the source score. In neither case would it necessarily be required to write IUIDs, unless the target needs to support transforms.

jsutdolph commented 6 years ago

NB The markdown implementation here does not handle wrapping in lists. Here is a pdf rendering: Transforms.pdf

joeberkovitz commented 6 years ago

Thanks for the more detailed scenarios, it's a great help to exploring this issue further.

First, there's some substantial W3C history on handling annotations independent of an original document. Please take a look at the work of the W3C Web Annotations Group. That was an attempt at a rather general solution which is no longer active, but I am sure there are some lessons worth mining (including understanding why this initiative has not progressed). That work did not find IUIDs necessary, for what that's worth.

A few preliminary questions come up after a quick read of your post:

bhamblok commented 6 years ago

I like all suggestions apart from the one prefixing all UIDs with a similar "iuid"-string which I think is an unnecessary overhead. We should keep UIDs as short as possible (if not only for saving storage and bandwidth).

jsutdolph commented 6 years ago

@joeberkovitz "employ collision detection on any newly generated ID, and then make a new one that's different." Perhaps you would like to amplify how to "make a new one that's different" in the general case? This is the stumbling block. You could find the string of the longest length and then make a longer string. Any other suggestions? Have I missed an obvious trick?

jsutdolph commented 6 years ago

@bhamblok Fair enough, but we need a reserved prefix. I think you will find that when the document is compressed for storage or transmission over a network the overhead of a fixed prefix will be negligible, but I agree in principle that we don't want dead wood.

jsutdolph commented 6 years ago

@joeberkovitz Thank you for the link to the Web Annotations. I was interested to notice the reference to robust anchoring being a problem; see https://www.w3.org/2014/04/annotation/submissions/ColeHabing-AnnotationWkshp-PositionStatement-Final.pdf (search for "anchor"). Also note that in this case the source document for annotation is immutable.

joeberkovitz commented 6 years ago

@jsutdolph Any producer-consumer implementation will have its own scheme for generating new element IDs that incorporates some uniquifying mechanism: a sequence number, a PRNG, a crypto hash function, etc. This scheme must function to generate new IDs even if the document is never modified by any other app: the producer has to be able to consume its own documents and make further changes.

Let's call our imaginary implementation AliceNotes, and say it's modifying a document generated by another application, BobNotes. The fact that BobNotes uses a different ID generation scheme makes no difference. AliceNotes just uses its regular ID generator, and when a collision is detected, AliceNotes bumps up the sequence number, or invokes the PRNG to generate a new string, or whatever it usually does, until no collision is found. Unless the document has an infinite number of elements, this should work fine :-)
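A sketch of the sequence-number variant of the AliceNotes strategy in Python. The function name and the `e<n>` id format are hypothetical; a PRNG variant would loop the same way, drawing a fresh random string on each collision. The attempt count is returned to show how the cost grows when the file was written with the same scheme.

```python
def next_free_sequential(existing_ids, prefix, counter):
    """Return (new_id, next_counter, attempts), bumping the counter past collisions."""
    attempts = 0
    while f"{prefix}{counter}" in existing_ids:
        counter += 1  # collision: bump the sequence number and try again
        attempts += 1
    return f"{prefix}{counter}", counter + 1, attempts


bob_ids = {"b-7f3a", "b-19c0"}   # BobNotes uses a different scheme: no collisions
print(next_free_sequential(bob_ids, "e", 1))    # ('e1', 2, 0)

alice_ids = {"e1", "e2", "e3"}   # a file written with the same scheme
print(next_free_sequential(alice_ids, "e", 1))  # ('e4', 5, 3)
```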

I do think you have a good point about potentially requiring stability in IDs when documents are modified; I just think that requiring IUIDs and transforms is huge overkill.

joeberkovitz commented 6 years ago

@mdgood and I were talking today, and he pointed out a more fundamental problem with this issue. I expect he'll chime in, but I wanted to get it on the radar.

The requirement for applications to maintain stable MNX element IDs alone (never mind creating new ones) is a hardship in itself, because

  1. applications cannot be expected to employ only data structures that have a one-to-one mapping with MNX elements. Sure, there is probably a data structure in most notation apps corresponding to <measure>, but there are lots of other elements that are not likely to have exact analogues in the application's data model (e.g. <accidental>).

  2. existing applications may not even have a slot in which to maintain naming information of this type.

I would point out that if MNX winds up not supporting the "ensemble dream scenario" above (and it's a good scenario), that doesn't mean that the scenario can't happen. It just means that the full annotation scenario has to take place inside the context of an application that was designed to support this kind of thing, rather than across multiple applications from different vendors that are cooperating via MNX.

It would also still be valuable for MNX to be able to encode the final results of the scenario, i.e. to export a file that contained different members' final annotations and was able to distinguish them as such. That seems doable even without stable IDs; for instance, MEI has an alternate-reading scheme for different glosses on a document.

jsutdolph commented 6 years ago

@joeberkovitz Thank you. Excellent points. Very helpful.

I am persuaded that we can use the existing ID field for the IUID.

A sufficiently long (64-bit?) timestamp can be assumed to ensure uniqueness whatever the other IDs are. An application would need to check uniqueness of the IDs on loading, which is not necessarily checked by all validating parsers.
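Checking id uniqueness on load can be done in a few lines. This Python sketch uses the standard, non-validating `xml.etree.ElementTree`, which does not enforce ID uniqueness itself. It assumes the id lives in a plain `id` attribute; the element names in the example document are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(xml_text):
    """Return the id values that occur on more than one element, sorted."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    return sorted(value for value, n in counts.items() if n > 1)


doc = """<mnx>
  <part id="p1"><measure id="m1"><event id="e1"/><event id="e1"/></measure></part>
</mnx>"""
print(duplicate_ids(doc))  # ['e1']
```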

Some comments. The application that edits MNX via a non-mapping internal format will find it difficult to participate in the 'dream scenario'. I'm not sure what the 'slot' refers to.

Only transform-able (ie conforming) applications will guarantee the invariance of IDs of elements when editing MNX files.

Certainly the 'final result' could be encoded in MNX as it is a score. However if you store that for each user you are back to what we are trying to get away from. Much better for all users to have a single shared MNX score and shared and personalised transforms, and a means of rendering the result directly including the metadata.

Would an MNX document need an extension to the MNX spec to be 'able to distinguish [annotations] as such'?

@joeberkovitz "Aren't transforms a very general concept that can be applied to any XML document?" That seems a bit too general to get my head around. I cannot see how music annotations and edits can be served by a generic textual annotation framework since they are governed by musical rules and (mostly) not textual rules. As you point out the effort to standardise annotations purely for text documents foundered. I think we can succeed for our requirements.

@joeberkovitz "the full annotation scenario has to take place inside the context of an application.." A surprising anti-standardisation statement! What is this forum about? They could have said that about PDF, HTML, TCP/IP, MusicXML... I don't see the problem with defining a framework for inter-operation for those applications that want to collaborate.

Are there any other points I've missed?

joeberkovitz commented 6 years ago

@jsutdolph There is no inherent computer-science problem with defining a transform framework, which has already been done for many kinds of documents. Many multi-user editors attest to this. It is just that it is going to be a quantum leap in terms of the complexity of MNX, if the standards group must work to define it, and then all implementors must work to implement it.

This scenario is definitely worth attention. It is a question of priority: forums such as this cannot take on all problems at once, regardless of merit. So I am in fact very proudly anti-standardization in the short term, when the scope of work begins to transcend what is practical for a real-world collection of humans and block out other more urgent questions. It is very common for standards groups to decide not to decide something, in the interests of progress. This does not mean anyone dislikes the goal.

I would like to keep a long-term discussion going regarding scenarios that maintain a multi-party edit stream with transforms and timestamps -- that is really what this issue seems to be about, not IUIDs per se (although they may be an ingredient in doing so, if we ever do it). But my honest feeling at the moment is that if this belongs in the spec, it is a V.Next feature, not a V1 feature. Transforms and ID schemes can be grafted on, and I doubt they can be mandated for all conforming MNX applications (think of the poor command-line transposition tool).

I encourage others to chime in on their sense of priority and feasibility here.

notator commented 6 years ago

I rather agree with @joeberkovitz that

  1. this is something that should be revisited after V1 of the MNX standard.
  2. this issue is not really about IUIDs per se; it's about "multi-party edit stream with transforms and timestamps".

That being the case, maybe it would be a good idea to change its name to something like "Annotation Tracking" or "Multi-Party Score Versioning".

Apropos annotations: We seem to have agreed that MNX is going to be based on SVG, so there's a simple way to add annotations to an existing score. One just defines (one or more) <g> element(s) at the end of the file, probably with a class="annotation". Like this:

<svg>
    <!-- existing score code -->
    <g class="annotation">
        <!-- any annotations (graphics) -->
    </g>
</svg>

Such an annotation could, of course, have an id of some kind (maybe the author's name plus date stamp). Further annotation layers could be added at will, whereby the annotations lower down the file appear in front of the earlier annotation layers (z-order).

This would be where I'd start thinking about MNX annotations. However, I think there are going to be issues with this approach, that should be thoroughly discussed in a separate thread (bearing in mind that annotation tracking should, if possible, be enabled). In contrast to the full "multi-party edit stream with transforms and timestamps" scenario, this approach is not a score-versioning system. I currently think of that as a separate problem, probably best left to score editing software.

joeberkovitz commented 6 years ago

@notator The CG did not agree that MNX was going to be based on SVG in general. In particular, CWMNX/MNX-Common has no SVG component -- note the closure of issue #25. At this time, only GMNX/MNX-Generic is based on SVG.

mdgood commented 6 years ago

@notator, we really do need you to stop proposing SVG solutions for issues that involve CWMNX. They are off topic and derail productive discussion.

notator commented 6 years ago

@joeberkovitz and @mdgood Understood. However, I'm very curious to see how you propose to instantiate any scores, including CWMN scores, on the web without using SVG. Sorry if I'm thinking too far outside the current box.

mdgood commented 6 years ago

Then I suggest you look at Noteflight, Soundslice, SmartMusic, and Flat which are just four of the current applications that instantiate MusicXML scores on the web. MNX is an effort to improve what already works.

notator commented 6 years ago

I'm afraid that "what already works" only works within unacceptable limits for a W3C standard. I'm not alone in thinking that we can't just patch MusicXML.

joeberkovitz commented 6 years ago

@notator This discussion is off topic for #65. It is on topic for #25, which was closed 9 days ago by the chairs after a thorough, open discussion by the community in which many others contributed their views. Please read the notes for closing #25; if you have further comments you are welcome to append them to the closed issue.

jsutdolph commented 6 years ago

Ignoring the temporary off-track... @joeberkovitz I agree about the specification of transforms being something for the future. I just wanted to make sure that there was nothing we have missed which would make MNX inimical to transforms as was MusicXML before 3.1. I think with the optional presence of IDs on all elements we have what we need.

jsutdolph commented 6 years ago

The anti-collision algorithm (AliceNotes and BobNotes) above is potentially expensive:

"AliceNotes just uses its regular ID generator, and when a collision is detected, AliceNotes bumps up the sequence number, or invokes the PRNG to generate a new string, or whatever it usually does, until no collision is found"

If it uses a known sequence to generate ids (incrementing counter or PRNG), it might need many (many) cycles of collision test and failure, especially if the file has been written by the same app or by an app using the same strategy. It would seem sensible to reserve a prefix for the ID so we can recognise it as one of our own; then we know how to generate non-colliding ids. However, the usual reverse-domain scheme (com.abc.def) would create ridiculous amounts of repetitive dross in the file.
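With a reserved prefix, an application can ignore foreign ids and continue its own sequence in a single pass, without repeated collision tests. In this Python sketch the prefix `iid` followed by a decimal counter is a hypothetical format. Note that, as pointed out earlier in the thread, deriving the counter from the highest existing value can reuse the number of a deleted newest element unless the counter is also stored in the file.

```python
import re

OWN_ID = re.compile(r"^iid(\d+)$")  # hypothetical reserved prefix + counter


def next_own_counter(existing_ids):
    """Next counter value, considering only ids that carry our reserved prefix."""
    highest = -1
    for value in existing_ids:
        match = OWN_ID.match(value)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest + 1


print(next_own_counter({"iid3", "iid10", "note7", "iidx"}))  # 11
```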

My current thinking is that using "iid" followed by the 64-bit microsecond system time is so good a probabilistic guarantee against clash that it won't need any collision detection at all.
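A minimal Python sketch of the "iid" + microsecond-time idea. The class name is hypothetical, and the guard against repeated or backwards-stepping clock values only protects ids from a single generator; uniqueness across different machines or processes remains the probabilistic guarantee described above.

```python
import time


class MicrosecondIdGenerator:
    """Ids of the form "iid" + microseconds since the Unix epoch."""

    def __init__(self, prefix="iid"):
        self.prefix = prefix
        self.last = 0

    def new_id(self):
        now = time.time_ns() // 1_000  # microseconds; fits comfortably in 64 bits
        # Two calls within the same microsecond, or a clock stepping backwards,
        # would repeat a value, so never return a number <= the previous one.
        if now <= self.last:
            now = self.last + 1
        self.last = now
        return f"{self.prefix}{now}"


gen = MicrosecondIdGenerator()
ids = [gen.new_id() for _ in range(1_000)]
print(ids[0])  # eg "iid" followed by a 16-digit number
```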

samuelbradshaw commented 6 years ago

I'm new to the project and still trying to get up to speed (I'd never heard of MNX until a few days ago), but I like the idea of being able to switch different lyrics in and out for the same score. Switching languages, mixing and matching texts and tunes, showing only one or two verses at a time, or even hiding the lyrics entirely, are a few use cases. I don't know if a hard-coded transform is the best way to handle switching out lyrics, though. Has a lot of thought already been put into how MNX will handle lyrics, beyond how MusicXML handles them?

joeberkovitz commented 6 years ago

@samuelbradshaw I would suggest you open a new issue (this one is unrelated) for the question of encoding lyrics in multiple languages. For more open-ended mixing and matching of lyrics and melody, my opinion is that this goes beyond the remit of MNX, but feel free to open other new issues that describe these cases with more specifics.

bhamblok commented 4 years ago

I would like to reopen this issue (tag it "active review") soon. I still think the current spec is too concise in its definition of an "XML ID".

I suggest adding the following (in better wording; excuse my English):

  1. that an "XML ID" should be an immutable identifier, which should be unique in the document.
  2. that the id attribute is optional, and only needs to be defined when another element references this element (eg via a target attribute).

We are starting to use "target" attributes on some elements (like <beam> or <slur>), and there has been discussion about what an ID is supposed to be. There are plenty of use cases mentioned above for why it should be immutable and should not represent a sequential number or path in any particular order, like "event1", "event2", "note1", "note2" ... Those kinds of IDs are perfectly valid, but to a user (or application) reading the plain XML they also carry a semantic meaning. Over the lifetime of a document, this semantic meaning may no longer reflect the actual position of the element.
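The point about sequential-looking ids can be shown with Python's `xml.etree.ElementTree`. The markup is a hypothetical simplification (`<event>` and `<slur target="...">` stand in for MNX elements): after an editor moves the event with id "note3" to the front, the slur still resolves to the same element, even though its id no longer describes its position.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: only the id/target mechanism matters here.
DOC = """<measure>
  <event id="note1"/><event id="note2"/><event id="note3"/>
  <slur target="note3"/>
</measure>"""


def resolve_targets(root):
    """Pair each element that has a target attribute with the element it names."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    return [(el, by_id.get(el.get("target")))
            for el in root.iter() if el.get("target") is not None]


root = ET.fromstring(DOC)
events = root.findall("event")
# An editor moves the last event to the front of the measure.
root.remove(events[2])
root.insert(0, events[2])

slur, target = resolve_targets(root)[0]
print(target.get("id"), list(root).index(target))  # note3 0
```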

notator commented 4 years ago

As I explain in https://github.com/w3c/mnx/issues/193#issuecomment-691027038, I think that MNX should not use IDs at all. It's unnecessary to create "immutable identifiers" when the MNX code consists of strictly ordered lists, and such identifiers would mean a lot of extra work for applications.

I agree that §3.2.3 Element locations needs to be revised. In particular, if the term "XML ID" continues to be used, it should be properly defined. (I don't think it needs to be a unique, immutable identifier.)

bhamblok commented 4 years ago

@notator Let's continue the discussion from #193 in this thread...

I'm still not convinced. There is a thing called XPath, which does exactly what you want to do. See the official spec on W3C or the documentation on MDN. Of course, implementing something like this in this spec as well would take us too far...

Although I could agree to using XPath-like values for target attributes (you made some good arguments in your last comment, https://github.com/w3c/mnx/issues/193#issuecomment-691027038), I definitely feel the need for unique IDs, especially when we come to the point where we elaborate on "presentation"/"styling" and "performance". I still hope we can separate style from semantics (and maybe performance) some day ... #161

Maybe we should postpone this issue indeed for a V.Next Feature as mentioned above...

What do others think?

notator commented 3 years ago

I've never knowingly used XPath, and don't really know anything about it. The MDN documentation says

XPath stands for XML Path Language. It uses a non-XML syntax to provide a flexible way of addressing (pointing to) different parts of an XML document. It can also be used to test addressed nodes within a document to determine whether they match a pattern or not.

That's not exactly what I want to do. Judging by the above quote, XPath is not an XML document component. It's a way of designing query expressions that can be used by applications that are working on XML documents. So XPath does not belong inside the MNX Spec, but MNX target attributes could well end up in XPath path expressions.
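As a concrete illustration of XPath-style querying living in the application rather than in the file, Python's `xml.etree.ElementTree` supports a limited XPath subset that can address an element either by position or by an attribute value. The score markup here is a hypothetical simplification, not MNX syntax.

```python
import xml.etree.ElementTree as ET

SCORE = """<score>
  <part id="p1">
    <measure><event id="e1"/><event id="e2"/></measure>
    <measure><event id="e3"/><event id="e4"/></measure>
  </part>
</score>"""

root = ET.fromstring(SCORE)
# Positional path: 1st event of the 2nd measure of the 1st part.
by_position = root.find("./part[1]/measure[2]/event[1]")
# Attribute predicate: the equivalent of looking the element up by its id.
by_id = root.find(".//event[@id='e3']")
print(by_position is by_id)  # True (until an edit shifts the positions)
```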

I definitely feel the need for unique ID's, especially when we come to the point where we are going to elaborate on "presentation"/"styling" and "performance".

I may be wrong, but I don't think IDs are ever going to be necessary in the MNX spec, simply because elements can be identified using their position in ordered lists. That's going to be true even if/when the elements contain presentation, style and/or performance information.

It's a bit off topic here, but you said:

I still hope we can separate style from semantics (and maybe performance) some day ... #161 Maybe we should postpone this issue indeed for a V.Next Feature as mentioned above...

I think the separation of concerns that you wanted to define in #161 is happening automatically. It's not something that needs to be postponed. In MNX by Example, @adrianholovaty is naturally tackling the simplest examples first. These are all about how to code the relations between the graphical objects in CWMN. This means that the examples can all be translated to the web's graphical standards (SVG or Canvas). (It's very important for MNX to have a connection to the web.) Once we have a solid foundation for the graphics, we can move on to deciding which other information (e.g. presentation, style, performance) we want to include and how. It's interesting to see, in the Co-chair Meeting Minutes: August 18, 2020, that the co-chair has not forgotten about interpretation content. That debate is going to be very interesting when it comes. :-)

bhamblok commented 3 years ago

@notator you are right to say that we should not include XPath in this spec, but we should not reinvent the wheel to define a target specifier. I'm sorry these threads are getting mixed up, but the new examples you provided in https://github.com/w3c/mnx/issues/193#issuecomment-691708030 are, in my view, complex rather than "simple"; there are way too many "if" statements. If we used ids, no if-statement at all would be necessary :-)

I don't think things are being made too simple, because we are only concerned here with the content of an MNX file (where the order of elements in their lists is always fixed). We are not concerned with applications' internal data structures, only that the pointers can be connected up correctly when reading the file, and saved correctly when writing it.

My application's internal data structure === MNX. So there is a state between reading the file and saving it, and I would like to be able to represent this "in-between state" in MNX as well... In other words, MNX should always reflect the correct state at any moment during my application's runtime.

When my application dynamically changes some elements (which can even happen during a "live performance on stage", ... even in a loop), I would like other elements that target the affected elements to keep pointing to the original elements. That's why static IDs are so useful.

A path to a "fixed list of ordered elements" is useless because elements in MNX should not be considered "fixed".

clnoel commented 3 years ago

@notator You are missing the point that, in an editable document, the events are not a "fixed list of ordered elements". As soon as someone enters an extra event, or changes the order of the voices/sequences in a measure, or adds a part, or reorders the parts, the targeting element (slur, etc.) will have its end-point at the wrong location, because what was "p1:m1:v1:e3" is suddenly some other event than was originally intended. In order to keep the targeted points correct, the editing program (or person, if it was just a person reordering parts in the XML via copy-paste) would have to recalculate all the target ids, potentially for the whole piece.

I'm pretty sure that giving targeted elements a unique id is a less complex solution. Then, the only time you need to worry about recalculating targets is if you delete an element with a unique id on it. It does mean that an editing program would need to be able to create a unique id, but there are many, many ways in which that can happen with known algorithms.
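To make the failure mode in the last two paragraphs concrete, here is a minimal sketch, assuming a voice is modelled as a plain Python list of event dicts and that only the slur's target event carries an id. The helper names `by_position` and `by_id` are hypothetical, not part of any MNX draft. After an event is inserted at the start of the voice, the positional reference silently moves to a different event, while the id reference still finds the original one.

```python
# Hypothetical model: a voice is a list of event dicts; only the slur target carries an id.
voice = [
    {"pitch": "C4"},
    {"pitch": "D4"},
    {"pitch": "E4", "id": "slur-end"},
]


def by_position(events, index):
    """Resolve a 1-based positional reference such as 'e3'."""
    return events[index - 1]


def by_id(events, target_id):
    """Resolve a reference by scanning for a matching id."""
    return next(e for e in events if e.get("id") == target_id)


before_pos = by_position(voice, 3)
before_id = by_id(voice, "slur-end")

# An edit inserts a new event at the start of the voice.
voice.insert(0, {"pitch": "B3"})

after_pos = by_position(voice, 3)   # now the D4 event: the reference has moved
after_id = by_id(voice, "slur-end")  # still the E4 event
```

The flip side, which is notator's point below, is that the positional form needs no stored ids at all when the file is only read, never edited.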

--Christina Noel

notator commented 3 years ago

@clnoel Welcome back! I think it's both you and @bhamblok who are missing the point.

MNX is a file format, not an application's internal data structure. The whole point of files is that they store a single state of the information in a way that can be easily stored and retrieved by different applications. Editing applications have to be designed to cope (at run time) with the kind of flexibility you describe, so they use volatile constructs like linked lists, pointers, element IDs etc. These are the data structures that allow copy/paste etc at run-time. But many different applications will be reading MNX files, so we can't assume anything about what they do with the data internally when they have read it. For example: The Draft Spec currently has (both <global> and <part>) <measure> elements defined in a simple, chronologically ordered list (without IDs). I would expect an editing application to load <measure> objects into a doubly linked list (or similar structure) so that they can be deleted and inserted easily. But applications like my MNXtoSVG don't need to do that, since they are not going to change the sequence of measures in any way. Applications that use volatile <measure> lists at run-time can still write the lists out as simple sequences in the MNX file, and read them back into doubly linked lists (or whatever) when they need to.
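The load/edit/save cycle described above can be sketched as follows, assuming each `<measure>` is reduced to a string label (the `MeasureNode`, `load`, `insert_after` and `save` names are hypothetical). The file side is just an ordered list; only the in-memory side uses a doubly linked list, so an editor can splice in a measure without renumbering anything, and positions are regenerated implicitly on save.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MeasureNode:
    label: str  # stands in for a parsed <measure>; hypothetical payload
    # prev/next are excluded from repr and eq so the cyclic links don't recurse
    prev: Optional["MeasureNode"] = field(default=None, repr=False, compare=False)
    next: Optional["MeasureNode"] = field(default=None, repr=False, compare=False)


def load(labels):
    """Read the file's ordered <measure> list into a doubly linked list; return the head."""
    head = tail = None
    for label in labels:
        node = MeasureNode(label, prev=tail)
        if tail is None:
            head = node
        else:
            tail.next = node
        tail = node
    return head


def insert_after(node, label):
    """Splice a new measure in after `node` without touching the rest of the list."""
    new = MeasureNode(label, prev=node, next=node.next)
    if node.next is not None:
        node.next.prev = new
    node.next = new
    return new


def save(head):
    """Write the list back out as a plain ordered sequence, as stored in the file."""
    labels = []
    node = head
    while node is not None:
        labels.append(node.label)
        node = node.next
    return labels
```

For example, `head = load(["m1", "m2", "m3"])`, then `insert_after(head, "new")`, then `save(head)` gives `["m1", "new", "m2", "m3"]`.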

I think the use of ordered lists and properly scoped identifiers in the MNX format would be much simpler than forcing all applications to have to cope with globally scoped (immutable and augmentable) UIDs.

samuelbradshaw commented 3 years ago

Because most of my background is in web development, I have worked a lot with element IDs, and I prefer that system for referring to an element, compared to relying on a sequence of elements that can change at any time. I am in favor of globally unique IDs as an optional attribute that can be used or ignored by the application.

The biggest win that I see for IDs is allowing elements to be referenced and targeted externally – by the application, or in stylesheets or documentation outside the MNX file – remaining constant even when shared with other users or potentially other applications. I don't have as strong a preference about what kind of reference is used internally – from one element to another for slurs, etc.

mmzmusicnotes commented 3 years ago

Colleague of @clnoel's here, and also a former web developer. I have been lurking and following this discussion for a while.

Here is where I am confused, @notator: Why does the presence of a universal ID obligate your application to use it, if you are just reading the data? As you say, you are not changing the data. If you're not changing the data, you don't need to worry about the UIDs' value - when you read it out, just apply the same UID that existed before.

The concept of providing an (optional) universal ID for an element exists in a wide variety of file formats, particularly those which are XML-derived. (X)HTML/CSS and SVG both do. That doesn't mean you have to care about it for your use case, if you don't. It doesn't even necessarily mean you have to set it, though generally that would be good practice. (And as was noted previously, there are lots of fairly straightforward, relatively low-cost algorithms for generating IDs.)

notator commented 3 years ago

@mmzmusicnotes Even though I'm not changing the data, I still need to know where the end-points of slurs are so that I can draw them correctly. Currently that means I have to

  1. read the file in, noting the IDs of slur end-points and the IDs of elements that have them.
  2. do a second pass, connecting each slur end-point to the element having the proper ID. The most efficient way to do that is to search outwards from the slur's local position in the data structure (i.e. first search the voice then the part, then the other parts). But it would be much easier/quicker if the slur's target attribute simply told me where the end point was "this voice, measure 5, event 3", rather than making me do a search for it.
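A minimal sketch of that two-pass approach with Python's standard `xml.etree.ElementTree`, assuming a simplified, hypothetical MNX-like fragment whose slur targets use the "#" + ID element-location form. Note one deliberate swap: instead of the outward search from the slur's position described above, pass 1 builds a dictionary from id to element, so each pass-2 lookup is a plain dictionary access.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified MNX-like fragment; element names are illustrative only.
SOURCE = """<measure>
  <sequence>
    <event id="e1"><slur target="#e3"/></event>
    <event id="e2"/>
    <event id="e3"/>
  </sequence>
</measure>"""


def resolve_slurs_two_pass(xml_text):
    """Return (slur, target_element) pairs; raises KeyError for a dangling target."""
    root = ET.fromstring(xml_text)
    # Pass 1: index every element that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    # Pass 2: connect each slur to the element named by its "#id" target.
    return [
        (slur, by_id[slur.get("target").removeprefix("#")])
        for slur in root.iter("slur")
    ]
```

`str.removeprefix` needs Python 3.9 or later.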

I have the feeling that we are talking rather at cross-purposes here. I suspect that the element ID currently being used to connect slurs is not the same thing as the element's UID. And that UIDs belong in HTML, SVG etc instantiations rather than in MNX. Does that help?

bhamblok commented 3 years ago

@notator, I can hardly believe that you would have to do a second pass to look up the target elements :-). The workflow you describe, having to search for and find a target element, sounds like a really complex, costly and error-prone operation. Doesn't your development environment have a function or API like "getElementById"? This is exactly why people have standardised these processes, coming up with something like XPath (which is widely used in development and testing environments). For example, HTML has this "built in", so to speak... It's not XPath and it's not part of HTML as a language, but all browsers (being consumers of HTML) provide a whole bunch of DOM APIs that you could compare to XPath. The CSS selector syntax is also very similar. And Node.js DOM/XML parser libraries all have these kinds of functions. We should not try to "invent" a new "selector"/"identifier" syntax to link one element to another.
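As one example of a standard-library "getElementById" analogue, Python's `xml.etree.ElementTree` supports a limited XPath subset that includes the `[@attr='value']` predicate. The sketch below assumes a hypothetical MNX-like fragment, and `get_element_by_id` is an illustrative helper name, not an ElementTree API. ElementTree knows nothing about DTD-declared ID types: it simply matches an attribute literally named `id`, and each call walks the tree, so a dictionary index is cheaper when there are many lookups.

```python
import xml.etree.ElementTree as ET

# Hypothetical MNX-like fragment; only the id attributes matter here.
root = ET.fromstring(
    "<mnx><part><measure>"
    "<event id='e1'/><event id='e2'/><event id='e3'/>"
    "</measure></part></mnx>"
)


def get_element_by_id(tree_root, element_id):
    """Rough analogue of DOM getElementById using ElementTree's XPath subset.

    Assumes element_id contains no quote characters, since it is
    interpolated directly into the path expression.
    """
    return tree_root.find(f".//*[@id='{element_id}']")
```

`get_element_by_id(root, "e3")` returns the third `<event>`, and an unknown id returns `None`.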

To answer your previous comment: the element ID currently being used to connect slurs is exactly the same thing as the element's UID. An XML ID (as in the wording of our spec) is the same thing as a UID. Any ID attribute in any XML always is, or should be, a UID, in the sense that a UID is an ID that is unique within the document. This holds not only for HTML and SVG, but for any XML-based file format.

notator commented 3 years ago

@bhamblok Of course I use standard XML scanning techniques to parse MNX, but that does not mean that MNX is either an HTML or SVG file. I don't use XPath because I don't just need to query the contents of the MNX file, I need to parse it completely in order to make an instantiation. There are lots of ways to make instantiations of MNX files. My application just makes one.

Ordinary IDs only have to be unique inside a particular document. UIDs also have to be unique outside the document (across many different documents). So there might be some point in adding UIDs to the SVG I produce, but I don't see a use-case for them inside (abstract) MNX.

An XML-ID (as in the wording of our spec) is the same thing as a UID.

Can you say where that is in the Draft Spec? I can't find either XML-ID or UID by searching for them.

bhamblok commented 3 years ago

@notator ok, here is the misunderstanding... In the developer environment where I feel "happy as a fish in water", we speak of a UUID (a Universally Unique Identifier)... whereas a UID is just a unique identifier within that particular document.

So for me, "event1" could be a UID, as long as it is not being used twice in the document. However(!): "event1" is the worst example ever, because it implies to be the first event in a sequence of a fixed list of ordered (event-)elements... which, again in my opinion, MNX is not (a fixed list of ordered elements).

You also mention:

MNX is a file format, not an application's internal data structure...

I disagree... Can you say where that is in the Draft Spec? :-) In my opinion, MNX (like XML) is a standardised data structure (which can of course be saved in a file later on). Otherwise we would be talking about an XML file (or MNX file), not just XML or MNX.

To answer your question in your previous comment: You can find the reference to an XML ID (sorry, without a dash) in our Draft Spec at §3.2.3 Element locations

notator commented 3 years ago

@bhamblok Great to be clearing up some misunderstandings! :-) §3.2.3 Element locations says

An element location constitutes a reference to a specific element in the document. It consists of the character #, immediately followed by the XML ID of the referenced element.

That just says that MNX uses the standard way to reference local IDs from within the same document. It doesn't say anything about UIDs.

According to Wikipedia, the term UID can refer to either a GUID or a UUID. It is never used for an ID whose scope is local to a particular document.

bhamblok commented 3 years ago

@notator and the xml spec from the W3C states:

[XML 1.0] and [XML 1.1] provide a mechanism for annotating elements with unique identifiers. This mechanism consists of declaring the type of an attribute as "ID", after which the parser will validate that

  • the ID value matches the allowed lexical form,
  • the value is unique within the XML document, and that
  • each element has at most one single unique identifier
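The first two checks in that quote can be sketched with Python's standard library as follows. The sketch assumes `xml:id` attributes (which ElementTree exposes under the XML namespace URI) and uses a deliberately simplified, ASCII-only approximation of the NCName production, so it is stricter than the real rule for non-ASCII names. The third check is not shown: well-formed XML already forbids repeating the same attribute on one element.

```python
import re
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Simplified, ASCII-only approximation of NCName (the real production allows many more letters).
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def check_xml_ids(xml_text):
    """Return a list of problems: values with a bad lexical form, or duplicate values."""
    problems = []
    seen = set()
    for el in ET.fromstring(xml_text).iter():
        value = el.get(XML_ID)
        if value is None:
            continue
        if not NCNAME_ASCII.fullmatch(value):
            problems.append(f"bad lexical form: {value!r}")
        if value in seen:
            problems.append(f"duplicate: {value!r}")
        seen.add(value)
    return problems
```

For `<a><b xml:id="n1"/><b xml:id="n1"/><b xml:id="1x"/></a>` it reports the repeated `n1` and the `1x` value, which starts with a digit.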

So, in my understanding, that's "unique" :-), which is why I interpreted the acronym "UID" as just a document-wide unique ID.

Or is xml:id something different than an id-attribute? This at least should be clarified in our spec...

Or can we come to a conclusion that target-attributes on <beam> or <slur> should point to an id which is unique in the document (like in my understanding "XML ID" is supposed to be used)?

samuelbradshaw commented 3 years ago

I think that IDs in the document, if used, should be UUIDs, or maybe UUIDs prefixed by a string. That provides a common way for various applications to generate them, and guarantees that they will always be unique, even in cases where two people are editing the document at the same time and need to merge changes, and it reduces the temptation to change IDs when moving things around in the document. But I don't think that the references used internally to refer from one note to another necessarily need to be UUIDs or even unique IDs – they could just be numbers that indicate a length, for example. I wouldn't be opposed to UUIDs for internal references myself, but maybe there's an alternative solution we haven't thought of...
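A minimal sketch of "UUIDs prefixed by a string", using Python's standard `uuid` module. The `event-` prefix and the `new_prefixed_id` name are illustrative assumptions. One practical reason for a prefix: a bare UUID string can start with a digit, which is not a legal XML ID value, while a prefix that starts with a letter keeps the whole value a valid name.

```python
import uuid


def new_prefixed_id(prefix):
    """Return an ID such as 'event-<random uuid4>'; the prefix convention is only an illustration."""
    return f"{prefix}-{uuid.uuid4()}"
```

Each call yields a 36-character UUID after the prefix, for example `new_prefixed_id("event")`. This also shows the readability cost raised in the reply below: the values are opaque to a human reading the XML.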

If I'm understanding what @notator is hoping for, it's to be able to use math to calculate which note to attach the slur to – for example, if I know I'm at note 6, and it says the slur ends at note 8 (or continues for 2 notes), I can just add 2 to my current location and I'm done. Is that correct?

notator commented 3 years ago

@bhamblok Neither your quote from the W3C xml:id spec nor our Draft Spec mentions UIDs. The W3C quote just says that the value of an ID attribute must be "unique within the document" and that elements can't have more than one ID. Currently, §6.7.2 of our Draft Spec says that <slur>s have a target attribute that is an (ordinary) ID. As we all know by now, I think that should be changed (as per the examples in https://github.com/w3c/mnx/issues/193#issuecomment-691708030).

@samuelbradshaw said:

I think that IDs in the document, if used, should be UUIDs, or maybe UUIDs prefixed by a string.

Okay. (My emphasis. Also, bear in mind that each element can only have one ID.)

That provides a common way for various applications to generate them, and guarantees that they will always be unique, even in cases where two people are editing the document at the same time and need to merge changes,

I think we should be very clear about which documents we are talking about. Are we talking about MNX documents, their instantiations (SVG etc.), or both? The case for having such UUIDs in the instantiations is clear, but I'm not so sure about having them in MNX. If slur.target values in MNX documents are all UUIDs of some kind, that would make it very difficult for humans to read the XML. Imagine trying to check/change/debug where slurs end, when the IDs all contain pseudo-random numbers...

and it reduces the temptation to change IDs when moving things around in the document.

If it's a UUID, then simply changing a slur's target would be a rather expensive operation. Giving every <event> and <note> a UUID would make the file huge and illegible...

But I don't think that the references used internally to refer from one note to another necessarily need to be UUIDs or even unique IDs

Agreed completely!

– they could just be numbers that indicate a length, for example.

Slurs don't always stay in the same voice, and grace notes prevent the use of metrical durations. So my proposal is a little more complicated than that. (See https://github.com/w3c/mnx/issues/193#issuecomment-691708030)

I wouldn't be opposed to UUIDs for internal references myself, but maybe there's an alternative solution we haven't thought of...

As I said above, using UUIDs for slur.target values would be problematic.

If I'm understanding what @notator is hoping for, it's to be able to use math to calculate which note to attach the slur to – for example, if I know I'm at note 6, and it says the slur ends at note 8 (or continues for 2 notes), I can just add 2 to my current location and I'm done. Is that correct?

Nearly. I don't even want to do any math. I just want to "dereference a pointer". If, after parsing the MNX file, I've got a list of 8 events in this measure, and the slur starts at event 6 and ends at event 8, then the slur's target event is events[slurTargetEventNumber -1].
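The "dereference a pointer" step can be sketched as follows, assuming the parsed events of the current measure are already in a Python list and that the slur's target is a 1-based event number (the `slur_target` name is hypothetical). The explicit range check matters in Python specifically: without it, a target of 0 would silently wrap round to the last event via negative indexing.

```python
def slur_target(events, slur_target_event_number):
    """Dereference a 1-based event number within the current measure's event list."""
    if not 1 <= slur_target_event_number <= len(events):
        raise IndexError(f"no event {slur_target_event_number} in this measure")
    return events[slur_target_event_number - 1]
```

With eight events and a slur ending at event 8, `slur_target(events, 8)` returns the last element of the list.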

My current feeling is that UUIDs don't belong in MNX files. I think of MNX files (like MusicXML files) as being imported into and exported from single-user applications. (Maybe someone can think of a counterexample.) And I think that IDs are an inefficient way to link elements inside XML documents.

clnoel commented 3 years ago

I think of MNX files (like MusicXML files) as being imported into and exported from single-user applications. (Maybe someone can think of a counterexample.) And I think that IDs are an inefficient way to link elements inside XML documents.

First, IDs are a standard way to link elements inside XML documents, inefficient or not. And second, while Musicnotes' most probable use case for MNX documents is also as an import/export format (like MusicXML), it is very obvious from this conversation that other people have use cases that make MNX their native file format.

Also, there is no point in having an id in an XML document that is not unique in the document, so I think we are having a terminology issue here, among the other issues we have. So, for the purposes of my discussion here:

"id" - a human-identifiable and parseable label that identifies an element, like "Violin Part" or "CMajorChord". (Includes UIDs and UUIDs, but does not need to be unique... it's more like a comment than a true identifier.)

"unique id/UID" - a label that is unique inside a given scope, in this case inside the document. This can be accomplished by simply keeping a list of ids to avoid duplication: "Violin Part 1", "CMajorChord 3"; by systematic labelling: "P1:M1:V2:E4:N2"; or by any method that generates a globally unique id (see below).

"universally unique id" or "globally unique id" (UUID/GUID) - a label that is randomly generated and so complex that it is almost certain not to be used elsewhere, not only in the context of the current document but also in a much wider scope, such as if the document were embedded in something else.

@notator, there is no need to label every single event and note in the document. You only need to label those elements that are targets. Having a slur use a UID as its target, and therefore having to label a select set of events with IDs so that they can be the targets of slurs, is not unreasonable, and is, in fact, the way XML, HTML, and SVG documents usually work in practice. You only label something when you are going to need the label somewhere else.
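The "keep a list of ids to avoid duplication" strategy from the definitions above can be sketched like this, assuming the editor holds the set of ids already in use (the `make_unique` name is hypothetical). It joins the label and counter with a hyphen rather than a space, because XML ID values cannot contain spaces, so a label like "Violin Part" would need to become something like "ViolinPart".

```python
def make_unique(label, taken):
    """Append -1, -2, ... to a human-readable label until it is unused, then reserve it in `taken`."""
    n = 1
    candidate = f"{label}-{n}"
    while candidate in taken:
        n += 1
        candidate = f"{label}-{n}"
    taken.add(candidate)
    return candidate
```

Calling `make_unique("CMajorChord", taken)` twice on the same set gives `"CMajorChord-1"` and then `"CMajorChord-2"`. Uniqueness only holds within whatever set is passed in, which is the document-scoped "UID" sense rather than the UUID sense.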

I think I may have made some of these points before, but:

If you are not editing in MNX, you shouldn't need to care about the targeting method at all. When you save, you know both ends of the targeting element, and generate ids, offsets, or whatever else we end up deciding is necessary. When you load, you have to store up things like slurs until you know both ends, no matter how we specify the target. There are just two cases: either the target element comes after the slur's start element, so we know what we're looking for (event location or target id, it doesn't matter, as long as we can look), or the target element is encountered before the start element, in which case we have to remember it (in an ID list) or go looking for it (calculated from the event location, or otherwise by scanning for the target id).
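The two load-time cases can be handled in a single pass, sketched here with Python's `xml.etree.ElementTree` and assuming a hypothetical MNX-like fragment with "#id" slur targets (`resolve_single_pass` is an illustrative name). Targets already seen are linked immediately; slurs whose target hasn't appeared yet wait in a `pending` table until it does. Anything still pending at the end is a dangling reference.

```python
import xml.etree.ElementTree as ET


def resolve_single_pass(xml_text):
    """Connect each <slur> to its target in document order, whichever comes first.

    Returns (links, pending): links is a list of (slur, target) pairs, and
    pending maps target ids that were never found to the slurs waiting for them.
    """
    seen = {}     # id -> element, for targets already encountered
    pending = {}  # target id -> slurs still waiting for that element
    links = []
    for el in ET.fromstring(xml_text).iter():  # document order
        el_id = el.get("id")
        if el_id is not None:
            seen[el_id] = el
            # Case 2 in reverse: this element is a target some earlier slur was waiting for.
            for slur in pending.pop(el_id, []):
                links.append((slur, el))
        if el.tag == "slur":
            target = el.get("target").removeprefix("#")
            if target in seen:
                links.append((el, seen[target]))  # target came before the slur
            else:
                pending.setdefault(target, []).append(el)  # target comes later
    return links, pending
```

On a measure containing `e1`, then `e2` holding slurs to `#e1` and `#e3`, then `e3`, both slurs are linked and `pending` ends up empty.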

If you are using MNX as a native file format, you should be able to make simple changes without worrying about having to reconnect notes. And we're fooling ourselves if we think people aren't going to want to go in and manually tweak these files. That's part of the point of having a human-understandable format to begin with. If we didn't want to accept that possibility, we could just define the whole thing as a binary data blob with a defined format (like the MIDI specification). So, given that probable use case, a slur connecting two notes should not need to be changed just because the half note between its ends was changed into two quarter notes (which changes the number of events in the measure). If you need to reorder the parts, or decide to reorder the sequences in a measure from bottom-to-top instead of top-to-bottom, you should not have to worry about changing the definition of the slur targets.

notator commented 3 years ago

We are indeed having a terminology issue here. @jsutdolph , who opened this thread, is clearly talking about Invariant Unique Identifiers (IUIDs), that are a form of UUID. See the first few postings to this issue, and especially his proposal in https://github.com/w3c/mnx/issues/65#issuecomment-368899649. So the discussion about using an ordinary, local ID as the target of a slur is actually off-topic here. Except that if element id attributes are going to be reserved for IUIDs, then they wouldn't be available for local slur targets (readability/efficiency issues, see above). (If necessary, that problem could be solved in MNX by storing @jsutdolph's IUID values in iuid attributes.)
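For reference, @jsutdolph's counter-based proposal from the start of this issue can be sketched as follows, storing values in `iuid` attributes as suggested just above. The `iuid-next` attribute on the root element and the `IuidAllocator` class are hypothetical. Persisting the next free value is what prevents the reuse problem he raised: if the highest-numbered element is deleted, recomputing "highest + 1" would hand its number out again.

```python
import xml.etree.ElementTree as ET


class IuidAllocator:
    """Counter-based IUID allocator; the next free value is persisted in the score."""

    def __init__(self, next_value=1):
        self.next_value = next_value

    @classmethod
    def from_score(cls, root):
        # Prefer the stored counter (hypothetical 'iuid-next' attribute on the root).
        stored = root.get("iuid-next")
        if stored is not None:
            return cls(int(stored))
        # Fallback from the original proposal: highest iuid found + 1.
        used = [int(el.get("iuid")) for el in root.iter() if el.get("iuid") is not None]
        return cls(max(used, default=0) + 1)

    def allocate(self):
        """Return a fresh IUID as a string; values are never handed out twice."""
        value = self.next_value
        self.next_value += 1
        return str(value)

    def save_to(self, root):
        root.set("iuid-next", str(self.next_value))
```

For a score whose highest `iuid` is 7, the first allocation is `"8"`. If that element is later deleted and the counter was saved, the next session still starts at `"9"` rather than reusing 8.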

@clnoel said:

You only need to label those elements that are targets.

The point I'm trying to make is that you don't need to label targets because they have a unique position in the file. The slur.target's value just has to tell you what position that is. That simplifies things because if target IDs don't exist, they don't have to be stored when parsing the file. Unlike the usual situation, in which the target can be anywhere in the file, we have targets that are located in a predictable data structure. So we have the opportunity to simplify our approach. IDs may be a standard way to link elements inside XML files, but I don't think they should be used if there is an easier, more efficient way to achieve the same thing.

bhamblok commented 3 years ago

@notator I agree with you that it would simplify things, but the point I'm trying to make is that MNX as a data structure is not "predictable". In some use cases it is dynamic and thus "unpredictable". It is not a fixed list of ordered elements.

So if you want to push your strategy, some very interesting use cases will be impossible to implement.

And like @clnoel said:

IDs are a standard way to link elements inside XML documents, inefficient or not.