notator / Moritz

Contains two related Windows desktop programs, written in C#, that I use for creating scores.
MIT License

Save MIDI messages as numeric data in SVG files #2

Closed notator closed 7 years ago

notator commented 7 years ago

Following discussions with the programmers of other music notation applications, I've decided that it would be better to save low-level MIDI messages in SVG scores, rather than using the current, high-level MIDI-XML. This has the following advantages:

I'd welcome any further discussion, should anyone have constructive suggestions that would improve the model described below. Ideally, my Assistant Performer should be able to play scores created by any third-party software.

At https://github.com/rism-ch/verovio/issues/379, I said:

[...] the MIDI messages could be included in numeric form in the eventSymbols rather than being the result of parsing XML. Each eventSymbol would simply be given a sequence of midiMoments like this:

<g class="eventSymbol" score:xalignment="1234.5678" ...>
<score:midiMoments>
<midiMoment msDuration="500">
<midiMessage status="0x90" data1="0x3C" data2="0x40" />
<midiMessage status="0x90" data1="0x40" data2="0x40" />
<midiMessage status="0x90" data1="0x43" data2="0x40" />
<!-- more midiMessages can go here -->
</midiMoment>
<!-- more midiMoments can go here -->
</score:midiMoments>
<!-- the eventSymbol's graphics go here -->
</g> <!-- end of eventSymbol -->

Where msDuration is the number of milliseconds that should elapse until the following midiMoment. The midiMessage attributes can be read into a Uint8Array and sent directly to the output device. It would, of course, be possible to express this XML more succinctly, and in a form that would be quicker to parse. This example is just to get the idea across.
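
As a rough illustration of how a client might read these attributes, here is a minimal sketch in Python (the Assistant Performer itself would use a Uint8Array; Python's bytes plays that role here). The FRAGMENT string and the read_moments function are hypothetical names introduced only for this example, and the namespace prefix is omitted to keep the fragment parseable on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the spirit of the example above
# (namespace prefix omitted so that it parses standalone).
FRAGMENT = """
<midiMoments>
  <midiMoment msDuration="500">
    <midiMessage status="0x90" data1="0x3C" data2="0x40" />
    <midiMessage status="0x90" data1="0x40" data2="0x40" />
    <midiMessage status="0x90" data1="0x43" data2="0x40" />
  </midiMoment>
</midiMoments>
"""

ATTRIBUTES = ("status", "data1", "data2")


def read_moments(xml_text):
    """Return a list of (msDuration, [bytes, ...]) tuples, one per midiMoment."""
    root = ET.fromstring(xml_text)
    moments = []
    for moment in root.iter("midiMoment"):
        messages = []
        for message in moment.iter("midiMessage"):
            # int(text, 16) accepts the "0x" prefix used in the attributes.
            values = [int(message.get(name), 16)
                      for name in ATTRIBUTES
                      if message.get(name) is not None]
            messages.append(bytes(values))
        moments.append((int(moment.get("msDuration")), messages))
    return moments
```

Each bytes object can then be handed to whatever MIDI output API the client uses, and msDuration tells the client how long to wait before sending the next moment.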

notator commented 7 years ago

Last Change: 13.02.2017

Ideally, my Assistant Performer should be able to play scores created by any third-party software.

This problem has two parts that should be kept conceptually separate:

  1. define a format for storing MIDI data inside SVG duration symbols.
  2. define other data required specifically by the Assistant Performer.

The solution to part 1 should be useful in SVG file formats that are intended for other purposes. The Assistant Performer is not the only scenario that requires graphic and temporal information to be synchronized. For example: synchronization is the key to creating an interface between score editors (like Dorico) and DAWs (like Cubase).

If such formats are properly documented, and sufficiently compatible, it may well be possible to write scripts for converting one format to another. Such scripts will be easier to write if the underlying MIDI format is identical.

Part 2 (the format required by the Assistant Performer) is defined below in the score namespace. My proposal for part 1 is to be found in the <score:midi> sub-definition. Constructive suggestions as to how to improve it would be very welcome.

The score namespace

This namespace defines the elements and attributes that have to be implemented in an SVG file for it to be playable in my Assistant Performer application. It includes not only the <score:midi> definition but also (for example) information about the objects that the Assistant Performer greys out when a staff is disabled.

Top level requirements:

Container content:

  - The systems element contains one or more system elements and possibly other elements. The system elements are in chronological order.
  - A system element contains either a <score:leftToRight> or a <score:topToBottom> element, one or more staff elements, and other elements. Every system in the score reads in the same direction, but the <score:leftToRight> and <score:topToBottom> elements have different attributes:
      - A <score:leftToRight> element has two attributes: systemTop and systemBottom.
      - A <score:topToBottom> element also has two attributes: systemLeft and systemRight.
    The systemTop, systemBottom, systemLeft and systemRight attributes are coordinates of the system's bounding box (including all the graphic objects it contains). The Assistant Performer uses these values to calculate the end points of the running cursor which appears during a performance of the system. The direction in which the score is read determines the meaning of the durationSymbol's score:alignment attribute. (Classical Asian notations read top-to-bottom.)
  - Each staff element contains a stafflines element and one or more voice elements. The stafflines element is a <g> element containing a sequence of standard SVG <line> elements (having coordinates x1, y1, x2, y2).
  - A voice element contains one or more durationSymbol elements and other elements. The durationSymbol elements are in chronological order.
  - Each durationSymbol has a score:alignment attribute. This will either be an x-coordinate or a y-coordinate, depending on whether the score reads left-to-right or top-to-bottom (see the system definition below). This is the alignment to which the running cursor is moved when the durationSymbol begins to play. Note: Currently, the Assistant Performer only plays scores that read from left to right, but this could change in future.
  - Each output durationSymbol contains one or more <score:midi> elements and other elements (graphics). Every such output durationSymbol in the score has the same number of <score:midi> elements. Each <score:midi> element contains a temporal definition of its containing durationSymbol, and each set of <score:midi> elements at the same level inside all the relevant durationSymbols describes a different interpretation of the score. (A full definition of the <score:midi> element is given below.) Note: The Assistant Performer currently just plays the first level of <score:midi> elements, but that could change.
  - The object classes that are greyed out when a voice is disabled are: 'outputChord', 'inputChord', 'cautionaryChord', 'inputRest', 'outputRest', 'clef', 'barline', 'staffName', 'beamBlock', 'clefChange', 'endBarlineLeft', 'endBarlineRight'.
  - TODO: describe the content of an input durationSymbol...
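
The requirement that every output durationSymbol has the same number of <score:midi> elements can be checked mechanically. The following is a minimal sketch in Python, assuming that output durationSymbols are the <g> elements whose class is "outputChord" or "outputRest"; the function names midi_counts and has_uniform_midi_levels are hypothetical and not part of Moritz or the Assistant Performer.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
SCORE_NS = "http://www.james-ingram-act-two.de/open-source/svgScoreNamespace.html"
OUTPUT_CLASSES = {"outputChord", "outputRest"}


def midi_counts(svg_text):
    """Return the set of distinct <score:midi> counts found in output durationSymbols."""
    root = ET.fromstring(svg_text)
    counts = set()
    for group in root.iter(f"{{{SVG_NS}}}g"):
        if group.get("class") in OUTPUT_CLASSES:
            # Only direct children count as this symbol's temporal definitions.
            counts.add(len(group.findall(f"{{{SCORE_NS}}}midi")))
    return counts


def has_uniform_midi_levels(svg_text):
    """True if all output durationSymbols have the same, non-zero, number of <score:midi> elements."""
    counts = midi_counts(svg_text)
    return len(counts) <= 1 and 0 not in counts
```

A score with no output durationSymbols at all passes this check trivially; a stricter client might want to reject it.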

A skeleton SVG document (without input elements) using the score namespace:

<?xml version="1.0" encoding="utf-8"?>
<!-- a fontsStyleSheet goes here, for example: 
<?xml-stylesheet href="../../fontsStyleSheet.css" type="text/css"?>
-->
<svg ...
    xmlns="http://www.w3.org/2000/svg"
    xmlns:score="http://www.james-ingram-act-two.de/open-source/svgScoreNamespace.html"
    ...>
    <!--
    Other elements can also exist inside this <svg> element (<title>, <metadata>, <defs> etc.),
    but there is only one "systems" container (a <g> element that has a score:hasMidi attribute).
    The value of the score:hasMidi attribute is usually "true", since the <g> (eventually)
    contains one or more <score:midi> elements.
    -->
    <g class="systems" ... > <!-- The unique "systems" container -->
        <!--
        Other elements can also exist inside this "systems" container, but there are
        one or more "system" elements.
        The "system" elements are in chronological order.
        -->
        <g class="system" ... >
            <!--
            Other elements can also exist inside this "system" container, but there
            is either one <score:leftToRight> or one <score:topToBottom> element, and
            one or more "staff" elements.
            -->
            <!--
            The <score:leftToRight> or <score:topToBottom> element determines the order
            in which the system will be performed (and the meaning of the durationSymbol's
            score:alignment attribute).
            The systemTop and systemBottom attributes are coordinates of the
            system's bounding box, including all the graphics it contains.
            -->
            <score:leftToRight systemTop="42.3456" systemBottom="1234.5678" ... />
            <!--
            There are two classes of "staff": "inputStaff" and "outputStaff".
            Output staves contain "outputVoice"s.
            Input staves contain "inputVoice"s.
            Input staves are optional. Here, for example is an outputStaff:
            -->
            <g class="outputStaff" ... >
                <!--
                Any number of components go here, including a stafflines group and one or more
                voices.
                Output voices contain "outputChord"s and "outputRest"s.
                Input voices contain "inputChord"s and "inputRest"s.
                -->
                <g class="stafflines">
                    <line ... />
                    <line ... />
                    <!-- etc.-->
                </g>
                <g class="outputVoice" ... >
                    <!--
                    Many elements can exist inside this voice container, including one or more
                    durationSymbols. The durationSymbols in an outputVoice are either "outputChord"s
                    or "outputRest"s.
                    Every "outputChord" and "outputRest" contains one or more <score:midi> elements.
                    The durationSymbols in an inputVoice are either "inputChord"s or "inputRest"s.
                    The durationSymbol elements are in chronological order inside each voice. 
                    -->
                    <g class="outputChord" score:alignment="1234.5678" ... >
                        <!--
                        Any number of elements go here, including one or more <score:midi>
                        elements (temporal definitions) and the durationSymbol's graphics
                        (its spatial definition). Moritz only writes one <score:midi> element
                        per durationSymbol, but other applications could write more.
                        -->
                        <score:midi>
                            <!--
                            A temporal (MIDI) interpretation of the durationSymbol goes here.
                            -->
                        </score:midi>
                    </g>           
                </g>        
            </g>
        </g>
    </g>
</svg>
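
To show how a client might walk this container hierarchy, here is a minimal Python sketch that collects the score:alignment of every output durationSymbol. It assumes the containers are nested as direct children (systems, system, outputStaff, outputVoice), which the skeleton suggests but does not strictly require; children_with_class and output_alignments are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"
SCORE = "{http://www.james-ingram-act-two.de/open-source/svgScoreNamespace.html}"


def children_with_class(parent, *classes):
    """Direct <g> children of parent whose class attribute is one of classes."""
    return [g for g in parent.findall(SVG + "g") if g.get("class") in classes]


def output_alignments(svg_text):
    """Return (system, staff, voice, alignment) tuples in document order."""
    root = ET.fromstring(svg_text)
    result = []
    for systems in children_with_class(root, "systems"):
        for s, system in enumerate(children_with_class(systems, "system")):
            for st, staff in enumerate(children_with_class(system, "outputStaff")):
                for v, voice in enumerate(children_with_class(staff, "outputVoice")):
                    for symbol in children_with_class(voice, "outputChord", "outputRest"):
                        # Namespaced attributes are stored as "{uri}name" by ElementTree.
                        alignment = float(symbol.get(SCORE + "alignment"))
                        result.append((s, st, v, alignment))
    return result
```

Because systems and durationSymbols are in chronological order, the returned list is already ordered by time within each voice.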

The <score:midi> element definition

<!--
All durationSymbols that contain one or more <score:midi> elements contain the same number of
<score:midi> elements, regardless of their class (unclassed, rest, note, chord etc).
-->    
<score:midi>
    <!--
    There must be exactly one <moments> element in a <score:midi> element.
    At run-time, messages having the same timestamp are always sent in the following order:
                moment/noteOffs, moment/switches, envs messages, moment/noteOns.
    Messages inside each of these categories will be sent in the order defined in the SVG.
    For easier debugging, Moritz writes status bytes in hexadecimal notation and both data1
    and data2 values as ordinary decimals.
    -->  
    <moments>
        <!--
        There must be one or more (sequential) <moment> elements here.
        Each <moment> has an msDuration attribute that is an integer greater than zero.
        This is the number of milliseconds between the beginning of this <moment> and the
        beginning of the next (possibly in the <score:midi> element in the following
        durationSymbol). The msDuration of this <score:midi> element (durationSymbol)
        is thus the sum of the msDurations of its <moment> elements.
        Each <msg> element defines a message as a string of space-separated (8-bit) bytes,
        that the client converts to a Uint8 array before sending it to the output device.
        Most <msg>s have three bytes, but some (e.g. patch change) only have two, and sysEx
        has an unlimited number.
        -->        
        <moment msDuration="500">
            <!--
            Each <moment> may have zero or one <noteOffs> element.
            -->
            <noteOffs>
                <!--
                One or more noteOff (or NoteOn+Velocity0) <msg> elements go here.
                -->
                <msg m="0x80 60 64" /> <!-- NoteOff+channel, note, velocity -->
                <msg m="0x80 64 64" /> <!-- NoteOff+channel, note, velocity -->
                <msg m="0x80 67 64" /> <!-- NoteOff+channel, note, velocity -->
            </noteOffs>

            <!--
            Each <moment> may have zero or one <switches> element.
            A <switches> element contains a sequence of one or more <msg> elements.
            The messages are sent in the order they appear here in the file.
            No plausibility checks are done by the client inside the <switches> element.
            Some commands and controllers may be sent from both here and the <envs>
            element (e.g. pan), but it makes no sense for this to happen inside the
            same <score:midi> element because messages sent with the same timestamp
            from the <envs> element are sent later than (i.e. override) any messages
            sent from the <switches> element.
            -->
            <switches>
                <!--
                Two simple message examples
                -->
                <msg m="0xB0 0 3" /> <!-- ControlChange+channel, bankControl, value -->
                <msg m="0xC0 14" /> <!-- PatchChange+channel, value -->

                <!-- pitchWheel deviation example
                Changing the pitchWheel means sending two (or four) messages that use the
                following MIDI constants:
                        const PDS = 0;   (PITCHWHEEL_DEVIATION_SELECT)
                        const RPC = 101; (REGISTERED_PARAMETER_COARSE)
                        const RPF = 100; (REGISTERED_PARAMETER_FINE)
                        const DEC = 6;   (DATA_ENTRY_COARSE)
                        const DEF = 38;  (DATA_ENTRY_FINE)
                Moritz currently just sets the coarse setting (semitones).
                -->
                <!-- set coarse pitchWheel deviation value to (e.g.) 12 semitones -->
                <msg m="0xB0 101 0" /> <!-- ControlChange+channel, RPC, PDS -->
                <msg m="0xB0 6 12" /> <!-- ControlChange+channel, DEC, semitones -->
                <!-- set fine pitchWheel deviation value to (e.g.) 5 cents -->
                <msg m="0xB0 100 0" /> <!-- ControlChange+channel, RPF, PDS -->
                <msg m="0xB0 38 5" /> <!-- ControlChange+channel, DEF, cents -->

                <!-- System Exclusive example
                The first byte in a sysEx message must be 0xF0, the last must be 0xF7.
                SysEx messages sent to a device having the wrong Manufacturer's ID are ignored.
                Moritz never writes system exclusive messages into a score.
                The Assistant Performer will send sysEx messages, but cannot respond to their
                return value (if any).
                All numbers in a SysEx msg must be in hexadecimal notation.
                -->
                <msg m="0xF0 0x41 0x10 0x42 0x12 0x40 0x00 0x7F 0x00 0x41 0xF7" />

            </switches>

            <!--
            Each <moment> may have zero or one <noteOns> element (rests have no <noteOns>).
            --> 
            <noteOns>
                <!--
                One or more noteOn <msg> elements must go here.
                -->                         
                <msg m="0x90 64 90" /> <!-- NoteOn+channel, note, velocity -->
                <msg m="0x90 68 100" /> <!-- NoteOn+channel, note, velocity -->
                <msg m="0x90 71 127" /> <!-- NoteOn+channel, note, velocity -->
            </noteOns>
        </moment>
    </moments>
    <!--
    There may be zero or one <envs> element in a <score:midi> element.
    (Moritz never writes an <envs> element into a rest, but other apps might. In Moritz,
    outputChords are never laissez vibrer...)
    Note that all controllers that can be set using an envelope can also be set using a simple
    <msg> in the above <switches> element.
    -->   
    <envs>
        <!--
        There must be one or more <env> (=envelope) elements here.
        In each <env> and nested <vt> (vertex) element, s is status, d1 is data1, d2 is data2.
        msDur is the duration in milliseconds to the following <vt> or <score:midi> element.
        The total duration of each <env> element must equal the total duration of the <moments>.
        Messages sent between those in the <vt> elements will be calculated and sent by the client
        application.
        All msDur values must be greater than zero, with one exception: If the <env> has
        more than one <vt> element, the final one may have a duration of zero milliseconds,
        so that the client can calculate the preceding intermediate messages. However, the
        client never actually sends the data values from a <vt> having msDur equal to zero.
        Note that <env> elements can only have status bytes with the following high nibbles:
            0xA (aftertouch),
            0xB (control change),
            0xD (channel pressure),
            0xE (pitch wheel).
        An <env> element will be ignored if its status byte has any of the following high nibbles:
            0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7 (MIDI status bytes never have these values),
            0x8 (noteOff),
            0x9 (noteOn),
            0xC (patch change),
            0xF (system exclusive, real time or undefined)
        -->
        <!--         
        There can be zero or more Aftertouch <env> elements (for different note numbers)
        -->
        <env s="0xA0" d1="100"> <!-- Aftertouch+channel, d1=note number -->
            <!--
            There must be one or more <vt> elements.
            d2 (the pressure amount) is in range 0..127.
            The msDur values add up to the total duration of the <moments> (here 500ms).
            -->
            <vt d2="99" msDur="250" />
            <vt d2="127" msDur="249" />
            <vt d2="0" msDur="1" />
        </env>
        <!--
        There can be zero or more CC controller env elements (for different continuous controllers).
        -->
        <env s="0xB0" d1="11"> <!-- ControlChange+channel, d1=control number -->  
            <!--
            There must be one or more <vt> elements.
            d2 is in range 0..127, and is the (d2) value of the controller.
            The msDur values add up to the total duration of the <moments> (here 500ms).
            -->
            <vt d2="0" msDur="100" />
            <vt d2="90" msDur="150" />
            <vt d2="10" msDur="249" />
            <vt d2="0" msDur="1" />
        </env>
        <!--
        There can be zero or one channel pressure env element
        -->
        <env s="0xD0"> <!-- ChannelPressure+channel --> 
            <!--
            There must be one or more <vt> elements.
            d1 is in range 0..127, and is the pressure amount.
            (d2 is undefined since not needed by this command.)
            The msDur values add up to the total duration of the <moments> (here 500ms).
            -->
            <vt d1="99" msDur="255" />
            <vt d1="127" msDur="45" />
            <vt d1="0" msDur="200" />
        </env>
        <!-- 
        There can be zero or one pitch wheel env element
        -->
        <env s="0xE0"> <!-- PitchWheel+channel -->
            <!--
            There must be one or more <vt> elements.
            d1 and d2 are in range 0..127, and combine to be the setting of the pitch wheel.
            The msDur values add up to the total duration of the <moments> (here 500ms).
            -->
            <vt d1="99" d2="99" msDur="120" />
            <vt d1="127" d2="127" msDur="379" />
            <vt d1="64" d2="64" msDur="1" /> <!-- 64 is default, centre value -->
        </env>
    </envs>
</score:midi>
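
The conversion of <msg> strings to bytes, and the run-time ordering of messages that share a timestamp, could be sketched as follows. This is an illustration in Python, not the Assistant Performer's code: parse_msg, ordered_messages and local_name are hypothetical names, and elements are matched by local name on the assumption that a real SVG file may place the unprefixed <moment> children in the default (SVG) namespace.

```python
import xml.etree.ElementTree as ET


def parse_msg(m):
    """Convert an m attribute such as "0x90 64 90" into bytes.

    int(token, 0) accepts both "0x.." hexadecimal and plain decimal tokens
    (decimal tokens must not have leading zeros).
    """
    values = [int(token, 0) for token in m.split()]
    if any(not 0 <= value <= 255 for value in values):
        raise ValueError(f"byte out of range in {m!r}")
    return bytes(values)


def local_name(element):
    """Element tag without any "{namespace}" prefix."""
    return element.tag.rsplit("}", 1)[-1]


def ordered_messages(moment, env_messages=()):
    """Messages for one <moment>, in the order noteOffs, switches, envs, noteOns.

    env_messages are envelope messages the client has calculated for the
    same timestamp (assumed to be supplied by the caller as bytes).
    """
    groups = {"noteOffs": [], "switches": [], "noteOns": []}
    for container in moment:
        name = local_name(container)
        if name in groups:
            groups[name] = [parse_msg(msg.get("m"))
                            for msg in container
                            if local_name(msg) == "msg"]
    return (groups["noteOffs"] + groups["switches"]
            + list(env_messages) + groups["noteOns"])


EXAMPLE_MOMENT = ET.fromstring(
    '<moment msDuration="500">'
    '<noteOns><msg m="0x90 64 90" /></noteOns>'
    '<switches><msg m="0xC0 14" /></switches>'
    '<noteOffs><msg m="0x80 60 64" /></noteOffs>'
    '</moment>'
)
```

Calling ordered_messages(EXAMPLE_MOMENT) returns the noteOff first and the noteOn last, regardless of the order in which the containers appear in the file.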

Moritz specifics

Implementation detail: If the msDuration of the last <moment> in a Moritz MidiChordDef is zero, then Moritz writes that <moment>'s messages into the first <moment> in the following durationSymbol -- unless this is the final <moment> in the channel in the score, in which case the <moment> is omitted. It is the client app's responsibility to reset the output device at the end of a performance (see below).
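
The carrying-over of a zero-duration final <moment> can be sketched as below. This is not Moritz's C# code: it is a Python illustration under the assumptions that each durationSymbol is modelled as a list of moment dictionaries, and that carried messages are placed before the existing ones in each category (the text above does not say how they are merged). carry_zero_duration_moments is a hypothetical name.

```python
CATEGORIES = ("noteOffs", "switches", "noteOns")


def carry_zero_duration_moments(symbols):
    """symbols: the durationSymbols of one channel, each a list of moment dicts
    such as {"msDuration": 500, "noteOns": [...]}.

    Returns a new structure in which a final moment with msDuration == 0 has
    been merged into the first moment of the following durationSymbol, or
    omitted if there is no following durationSymbol.
    """
    result = [[dict(moment) for moment in moments] for moments in symbols]
    for index, moments in enumerate(result):
        if moments and moments[-1]["msDuration"] == 0:
            carried = moments.pop()
            if index + 1 < len(result) and result[index + 1]:
                target = result[index + 1][0]
                for category in CATEGORIES:
                    # Assumption: carried messages precede the existing ones.
                    target[category] = carried.get(category, []) + target.get(category, [])
            # Otherwise this was the final moment in the channel: it is omitted.
    return result
```

After this step every remaining moment has an msDuration greater than zero, which matches the rule stated for the written score.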

Moments and rests: Every <moment> in the score has an msDuration greater than zero. (Output) rests always have a single <moment> element. If the rest is at the start of the score, then its <moment> will be empty. The <moment> will, of course, still have an msDuration attribute that determines the rest's duration. Rests should never contain noteOn messages. Moritz never writes an <envs> element (that describes control envelopes) into a rest, but other applications might.

Messages sent by the Assistant Performer: The Assistant Performer must send AllSoundOff and AllControllersOff messages to each channel at the end of each performance (partial or otherwise). If the sound is supposed to die away at the very end of the score, then a sufficiently long moment must be defined there. When beginning to play after the beginning of the score, do the following:

  1. Send AllSoundOff and AllControllersOff messages to each channel
  2. Set the non-default values of any controllers, patches etc. after finding the most recent value to which they were set before the start position.
  3. Start performing.
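
The first two steps above can be sketched as follows, in Python and under several assumptions: the score's control and patch messages are available as a chronological list of (position in ms, bytes) pairs, AllSoundOff and AllControllersOff are sent as the standard MIDI controllers 120 and 121, and every value found before the start position is re-sent (the sketch does not know which values are defaults). startup_messages is a hypothetical name.

```python
ALL_SOUND_OFF = 120        # standard MIDI controller number
ALL_CONTROLLERS_OFF = 121  # called "Reset All Controllers" in the MIDI specification


def startup_messages(events, start_ms, channels=range(16)):
    """Messages to send before starting to perform at start_ms.

    events: chronological (ms_position, bytes) pairs taken from the score.
    """
    controllers = {}  # (channel, controller number) -> most recent message
    patches = {}      # channel -> most recent patch change message
    for position, msg in events:
        if position >= start_ms:
            break
        kind, channel = msg[0] & 0xF0, msg[0] & 0x0F
        if kind == 0xB0:
            controllers[(channel, msg[1])] = msg
        elif kind == 0xC0:
            patches[channel] = msg

    resets = []
    for channel in channels:
        resets.append(bytes([0xB0 | channel, ALL_SOUND_OFF, 0]))
        resets.append(bytes([0xB0 | channel, ALL_CONTROLLERS_OFF, 0]))
    # Controllers are sent before patch changes so that a bank select
    # (controller 0) takes effect before the patch change that follows it.
    return resets + list(controllers.values()) + list(patches.values())
```

One known limitation of this sketch: registered-parameter sequences (controllers 101, 100, 6 and 38) depend on the order in which they are sent, so a real client would need to track them as a group rather than as independent controllers.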

Moritz outputRest example

Here is how Moritz might write an outputRest:

<g class="outputRest" score:alignment="1234.5678"> 
    <!--
    The <score:midi> definition is exactly the same for outputChords and outputRests,
    but there should never be more than one <moment> in an outputRest.
    -->
    <score:midi>
        <!--
        There is always exactly one <moments> element in an outputRest.
        -->
        <moments>
            <!--
            There is always exactly one <moment> element in an outputRest
            -->
            <moment msDuration="500">
                <!--
                Zero or one <noteOffs> element goes here.
                -->
                <noteOffs>
                    <!--
                    One or more noteOff (or NoteOn+Velocity0) <msg> elements go here.
                    -->
                    <msg m="0x80 60 64" />
                    <msg m="0x80 64 64" />
                    <msg m="0x80 67 64" />
                </noteOffs>
            </moment>
        </moments>
        <!--
        Moritz does not write an <envs> element in an outputRest, but other applications might.
        -->
    </score:midi>
    <!--
    The outputRest's graphics go here.
    -->
</g> <!-- end of outputRest --> 
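
The rules for outputRests (a single <moment>, an msDuration greater than zero, no noteOn messages) lend themselves to a simple check. The following Python sketch is one possible way to express them; rest_problems and local_name are hypothetical names, and elements inside <score:midi> are matched by local name because their namespace depends on the enclosing document.

```python
import xml.etree.ElementTree as ET

SCORE_NS = "http://www.james-ingram-act-two.de/open-source/svgScoreNamespace.html"


def local_name(element):
    """Element tag without any "{namespace}" prefix."""
    return element.tag.rsplit("}", 1)[-1]


def rest_problems(rest):
    """Return a list of rule violations found in one outputRest element."""
    problems = []
    for midi in rest.findall(f"{{{SCORE_NS}}}midi"):
        moments = [e for e in midi.iter() if local_name(e) == "moment"]
        if len(moments) != 1:
            problems.append(f"expected exactly one <moment>, found {len(moments)}")
        for moment in moments:
            if int(moment.get("msDuration", "0")) <= 0:
                problems.append("msDuration must be greater than zero")
            if any(local_name(child) == "noteOns" for child in moment):
                problems.append("a rest must not contain <noteOns>")
    return problems


# A condensed version of the outputRest above, with the namespaces declared
# so that the fragment can be parsed on its own.
EXAMPLE_REST = f"""
<g xmlns="http://www.w3.org/2000/svg" xmlns:score="{SCORE_NS}"
   class="outputRest" score:alignment="1234.5678">
  <score:midi>
    <moments>
      <moment msDuration="500">
        <noteOffs>
          <msg m="0x80 60 64" />
        </noteOffs>
      </moment>
    </moments>
  </score:midi>
</g>
"""
```

An empty list from rest_problems(ET.fromstring(EXAMPLE_REST)) means the rest satisfies the three rules checked here; an <envs> element is deliberately not flagged, since other applications might write one.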

ji, www, December 2016, January 2017, February 2017

notator commented 7 years ago

Verovio now includes more class info in its SVG: see https://github.com/rism-ch/verovio/issues/194#issuecomment-286191845 and https://github.com/rism-ch/verovio/issues/520

I think all SVG authoring programs should do this. Not only is this useful for applying CSS styling, it makes it possible for client apps to identify the elements they want to use. For example, my Assistant Performer turns staves and their contents grey when they are disabled. If we had standard names for at least the more common objects, such apps could be written to take input coming from different sources... Hi Craig, Laurent! Bravo! :-)

notator commented 7 years ago

I am closing this issue temporarily so that the W3C Music Notation CG doesn't get confused. I want to discuss <score:midi> elements at the forthcoming meeting in Frankfurt (7th April) but not necessarily the way Moritz structures its SVG. Issue #3 has now been created to discuss <score:midi> elements.