Daniel63656 opened this issue 6 months ago
Interesting. You may be right, especially when assessing OMR results.
I will bring this up on the OMR Research Slack channel to see if there is consensus around this (omr-research.slack.com).
From what I read in your paper, you already use two different trees (beam and tuplet) to check various aspects of notation. These "voice-wise" trees will remain necessary to check beams, tuplets, and voice consistency (do all notes belong to the same voice?). I would suggest adding a third tree that is event-based and using it for the element comparison rather than the beam tree. This also has the advantage that notes could be tied across voices, which is possible in notation but impossible in your approach right now.
Currently I do something similar on my custom encoding tree that I use for OMR (which is event-based), but I would rather have this abstracted into an external repository like yours to make it a more widely accepted standard. I would happily contribute to your repository if this ends up being a direction you want to take your tool.
@fosfrancesco do you have any thoughts about this?
It's not just the beam and tuplet analysis that needs to happen at the voice level; it's also the gap-between-notes analysis (added later by me, not mentioned in the paper).
(Edit: having second thoughts about this; gap analysis after combining voices might be fine.)
There are difficulties here: musicdiff is similar to a text file diff, in that there is a very strong assumption that notes (like characters on a line) follow directly after each other (thus the need for gap annotation to determine when there is a space between notes). If we combine all the notes in a measure into a single event list for comparison, many of them will overlap each other, breaking this assumption. Perhaps, with a perfectly predictable event sort and a new sort of gap analysis that handles overlap ("negative gap"?), this difficulty can be overcome. @fosfrancesco what do you think?
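For example, a gap annotation that tolerates overlap might look something like this (just a sketch to illustrate the idea, not what musicdiff currently does; the data shapes and function name are made up):

```python
from fractions import Fraction

def annotate_gaps(events):
    """Annotate each (onset, duration) pair with the gap after the previous
    event.  A positive gap is silence between consecutive events, zero means
    they are contiguous, and a negative gap means overlap -- the case that
    appears once several voices are merged into one event list."""
    annotated = []
    prev_end = None
    for onset, duration in events:
        gap = Fraction(0) if prev_end is None else onset - prev_end
        annotated.append((onset, duration, gap))
        prev_end = onset + duration
    return annotated

# Two voices of a measure flattened into one onset-ordered list:
merged = [
    (Fraction(0), Fraction(2)),  # half note from voice 1
    (Fraction(0), Fraction(1)),  # simultaneous quarter note from voice 2 -> gap -2
    (Fraction(1), Fraction(1)),  # next quarter note in voice 2 -> gap 0
]
print(annotate_gaps(merged))
```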
I see. You can still do the same on an event-to-event basis by knowing which voice id the individual chords and rests have and checking their respective voice offsets. This is done in the Humdrum Kern format, for example, to check that different voices end up at the same time at a common event. In fact, I handle this exact problem in my event-based parse tree.
I had a version of musicdiff that added into the diff a comparison of note offsets, but that's just wrong for a diff (and the results were fairly confusing). Imagine a text file diff that did a normal text comparison, but also compared the file offsets of characters.
I found that including a comparison of gap size after previous note (which was almost always zero) worked best.
To be honest, I don't understand why this would be a problem. This concept doesn't really make sense for text, where it would be trivial (there is only one "voice" and each character is one character "long"), but it is intrinsic to the musical domain. Also keep in mind that you do this offset comparison anyway, just voice by voice. You would essentially do the same thing, just with a different ordering. Like I said, I do exactly this to parse my event-based tree into MusicXML. Gaps arise when voices that contain chords/rests in the same event, and should therefore have reached the same time before those chords/rests are added, do not. In that case I assume the event falls on the latest occurring offset, which creates gaps in some of the "shorter" voices. In theory, this could also be resolved by changing the durations of a voice's elements. One should probably check the cost of all available resolution strategies and take the minimal one.
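As a rough sketch of the resolution strategy I mean (not my actual code, and the names are just illustrative):

```python
def align_event(voice_offsets):
    """Place a shared event at the latest current offset among the voices and
    report, per shorter voice, the gap that would have to be inserted (one of
    several possible resolution strategies)."""
    event_onset = max(voice_offsets.values())
    gaps = {voice: event_onset - offset
            for voice, offset in voice_offsets.items()
            if offset < event_onset}
    return event_onset, gaps

# Voice 2 is an eighth note short of voice 1 before the next shared event:
print(align_event({"voice1": 2.0, "voice2": 1.5}))  # (2.0, {'voice2': 0.5})
```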
I'm working on a prototype implementation to see how it might work. If it seems to work I'll point you at it to give it a go and see what you think. Thanks for all the discussion, it is very helpful.
Ok. Feel free to ask me. We can also have a short meeting if you want to. I also need the described functionality on my parse tree, so maybe we can avoid developing/conceptualizing everything twice.
Hello! I think a switch to a completely left-to-right comparison (without independent voice handling) would not be trivial, for the reason Greg wrote above: notes that partially overlap would force a different comparison paradigm. The idea until now has been to consider homophonic streams, which are completely defined by sequences of consecutive notes (or chords) and rests (or shifts) whose types define their durations.
Switching to a note+shift representation would require some reasoning. For example, we would not want removing a voice to cause differences in the shifts of the other voices.
However, I do agree that a left-to-right comparison would make sense for certain applications. I would be available for discussion, but only after the 20th of May.
I agree that it would be non-trivial. But I think the problem I described doesn't apply here. In my case the model makes predictions in terms of events; only afterwards do I need to check the "offset consistency" between voices and add gaps where necessary. In your case you work with MusicXML or something comparable, so the chords/rests are given voice by voice. Because voices are by definition "continuous" in this case, time can be integrated over the durations within each voice. Using the onset times recovered this way, an event-major tree can be constructed by grouping chords/rests/shifts by common onset. Because this event tree is just a secondary "view" of the voice-wise music, voice offsets agree from event to event by definition. If this procedure then produces events with different chords/rests/shifts than in the other sequence, this must have been caused by wrong durations or by missing chords/rests/shifts, i.e. changes, deletions, or insertions.
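Roughly, as a sketch in Python (illustrative names and data shapes only, not my actual implementation or musicdiff's):

```python
from collections import defaultdict
from fractions import Fraction

def build_event_view(voices):
    """Derive an event-major "view" from voice-wise music: onsets are
    recovered by summing durations within each voice, then the elements of
    all voices are grouped by common onset."""
    events = defaultdict(list)
    for voice_id, elements in voices.items():
        onset = Fraction(0)
        for duration, payload in elements:
            events[onset].append((voice_id, payload, duration))
            onset += duration
    return [(onset, events[onset]) for onset in sorted(events)]

voices = {
    1: [(Fraction(2), "C5"), (Fraction(2), "E5")],                     # two half notes
    2: [(Fraction(1), "rest"), (Fraction(1), "G3"), (Fraction(2), "A3")],
}
for onset, event in build_event_view(voices):
    print(onset, event)
```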
I mixed this up with my own case, my bad. I think this means voices do not influence each other then.
I would also suggest differentiating these "duration-related" errors from "pitch-related" errors. Because these are independent attribute dimensions, it would be good to know how much of the edit distance is caused by each of the two categories (with the rest being other errors, e.g. wrong/missing beaming, articulation, etc.). An approach might be good at predicting the correct pitch but totally mess up duration and simultaneity, or vice versa.
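For example, if every reported difference carried a category label and a cost (the record layout here is hypothetical, not musicdiff's actual output format), the split could be computed like this:

```python
from collections import Counter

# Hypothetical difference records with a category and a cost each.
diffs = [
    {"kind": "pitch", "cost": 1.0},
    {"kind": "duration", "cost": 1.0},
    {"kind": "beam", "cost": 1.0},
    {"kind": "pitch", "cost": 1.0},
]

per_category = Counter()
for diff in diffs:
    per_category[diff["kind"]] += diff["cost"]

total = sum(per_category.values())
for kind, cost in per_category.items():
    print(f"{kind}: {cost} ({cost / total:.0%} of the total edit cost)")
```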
Unfortunately, I'm unsure if I understand your proposal with the event tree. How do you define the time between a "common onset" and the next one?
Also, you should be aware that this version of the code no longer uses the tree edit distance described in the paper, as it was very slow for practical purposes. Instead we "flatten" the tuplet and beaming trees into a list of attributes for every note (along with many more attributes Greg added) and compare them with a sequence edit distance.
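For reference, a generic textbook version of such a sequence edit distance over flattened note-attribute tuples looks like this (a sketch only, not the routine actually used in musicdiff):

```python
def sequence_edit_distance(a, b, sub_cost):
    """Dynamic-programming edit distance over two sequences of note-attribute
    tuples; sub_cost(x, y) is the cost of turning one annotated note into
    another (0 if they match), insertions and deletions cost 1."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i
    for j in range(1, n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1,                                 # deletion
                dp[i][j - 1] + 1,                                 # insertion
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitution
            )
    return dp[m][n]

# Each note flattened into (pitch, duration, beam depth):
ref  = [("C4", 1.0, 0), ("D4", 0.5, 1), ("E4", 0.5, 1)]
pred = [("C4", 1.0, 0), ("D4", 1.0, 1), ("E4", 0.5, 1)]
print(sequence_edit_distance(ref, pred, lambda x, y: 0 if x == y else 1))  # 1
```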
Regarding the differentiation between pitch and duration errors, this should be easy to implement by separating the list of differences which is returned to be displayed into multiple categories. Instead, if you want this to influence the diff computation (for example by weighting pitch errors more heavily), you need to modify the costs that are added in this function https://github.com/gregchapman-dev/musicdiff/blob/bf30f6212c979e1fe1f9c1b8f58e321dc7d1af41/musicdiff/comparison.py#L846C9-L846C29
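As an illustration of what such a weighting could look like (the weights and the flattened (pitch, duration, other) layout are made up for the example, not musicdiff's actual cost values):

```python
# Illustrative weights only; musicdiff's real costs live in comparison.py.
PITCH_WEIGHT = 2.0
DURATION_WEIGHT = 1.0
OTHER_WEIGHT = 0.5

def weighted_sub_cost(a, b):
    """Substitution cost between two flattened notes (pitch, duration, other),
    penalising pitch disagreements more heavily than duration disagreements."""
    pitch_a, dur_a, other_a = a
    pitch_b, dur_b, other_b = b
    cost = 0.0
    if pitch_a != pitch_b:
        cost += PITCH_WEIGHT
    if dur_a != dur_b:
        cost += DURATION_WEIGHT
    if other_a != other_b:
        cost += OTHER_WEIGHT
    return cost
```

A function like this could then be passed as the substitution cost of a sequence edit distance such as the sketch above.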
Sorry for not getting back to you sooner. I think it would be best to describe this in a meeting rather than chat. Are you all available this/next week?
You should try out the branch gregc/ignoreVoicesAndChords. Francesco came up with a nice algorithm for comparing all the notes in a measure, without reference to voices or chords that might contain those notes, and I implemented it.
Not exactly what you were requesting, I think, but it might be helpful.
Sounds interesting, I will take a look. But can we still meet up? I assume it won't be easy to understand this just based on the code. I could also explain my ideas in more detail then.
musicdiff v4 ignores voices (released today).
Nice. I am still hoping we could have a meeting to help me better understand your approach and maybe exchange some ideas.
Hi Daniel, sorry for the delayed reply. I will be happy to meet, but after the 26th of August.
Write me an email so we can arrange the meeting.
Hi,
no problem. What about August 27?
Hello, good for me, but write me a private email (not a public GitHub message) so we can arrange it. I have no way to find your address. You can see mine on my papers, like this one https://arxiv.org/pdf/2407.21030.
Best,
Francesco
I think it would make a lot more sense for a musicdiff tool to compare the correctness of events (all chords and rests that happen at the same time) rather than voices. If you have multiple voices and, in one voice, a chord/rest has a wrong duration, this will cause all subsequent chords/rests to be "out of sync" with the other voices. Right now, musicdiff counts this as one error, which does not accurately reflect the amount of work needed to correct it. Humans read scores from left to right, and simultaneity of chords/rests is visually much more dominant. Going voice by voice and then backing up is really an artifact of MusicXML's score representation. It would make a lot more sense to analyze the score on an event-by-event basis. This way, a duration error within a voice would cause a lot more errors, as it should.
I would like to use this tool for my OMR research.