JSON: Encoding heterogeneous subobjects

adrianholovaty commented 1 year ago

In MNX, there are a few places in which an object can contain heterogeneous child objects:

Part measure "content" can contain clefs or sequences.
Sequence "content" can contain events, grace notes, tuplets, octave shifts and more.
System layout "content" can contain stave groups or staves.
Stave group "content" can contain stave groups or staves.

In my initial migration from XML to JSON, I solved this by requiring a "type" property that identifies the kind of each child object. See this example document; here is an excerpt:

"measures": [
   {
      "content": [
         {
            "type": "clef",
            "line": 2,
            "sign": "G"
         },
         {
            "type": "sequence",
            "content": [
               {
                  "type": "event",
                  "notes": [
                     {
                        "pitch": "C4"
                     },
                     {
                        "pitch": "E4"
                     },
                     {
                        "pitch": "G4"
                     }
                  ],
                  "value": "/2"
               },
               {
                  "type": "event",
                  "rest": {},
                  "value": "/2"
               }
            ]
         }
      ]
   }
]
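A minimal Python sketch of how a consuming application might handle this pattern: it reads the "type" key and dispatches through a lookup table. The parser functions (parse_child, parse_clef, parse_sequence, parse_event) and their string output are hypothetical, for illustration only, and they only cover the types that appear in the excerpt above.

```python
import json

# Hypothetical per-type parsers for the types in the excerpt above.
def parse_clef(obj):
    return f"clef {obj['sign']} on line {obj['line']}"

def parse_event(obj):
    if "rest" in obj:
        return f"rest {obj['value']}"
    pitches = "+".join(note["pitch"] for note in obj["notes"])
    return f"chord {pitches} {obj['value']}"

def parse_sequence(obj):
    # A sequence's "content" is itself heterogeneous, so dispatch again.
    return [parse_child(child) for child in obj["content"]]

# Lookup table from each "type" value to the function that handles it.
PARSERS = {
    "clef": parse_clef,
    "sequence": parse_sequence,
    "event": parse_event,
}

def parse_child(obj):
    """Dispatch on the "type" key; unknown types raise ValueError."""
    kind = obj.get("type")
    if kind not in PARSERS:
        raise ValueError(f"unknown content type: {kind!r}")
    return PARSERS[kind](obj)

doc = json.loads("""
{"measures": [{"content": [
  {"type": "clef", "line": 2, "sign": "G"},
  {"type": "sequence", "content": [
    {"type": "event", "notes": [{"pitch": "C4"}, {"pitch": "E4"}, {"pitch": "G4"}], "value": "/2"},
    {"type": "event", "rest": {}, "value": "/2"}
  ]}
]}]}
""")

for measure in doc["measures"]:
    for child in measure["content"]:
        print(parse_child(child))
```

Raising on an unknown "type" is just one choice; a real consumer might prefer to skip unrecognized children so that older readers tolerate newer files.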

This was a pretty fundamental decision, and the pattern is used in various places in MNX, so I wanted to bring this up for discussion. Are there better ways of doing it, or is this OK?

samuelbradshaw commented 1 year ago

I think this makes sense, but I would recommend being consistent with type and content all the way down:

"measures": [
   {
      "type": "measure",
      "content": [
         {
            "type": "clef",
            "line": 2,
            "sign": "G"
         },
         {
            "type": "sequence",
            "content": [
               {
                  "type": "chord",
                  "content": [
                     {
                        "type": "note",
                        "pitch": "C4"
                     },
                     {
                        "type": "note",
                        "pitch": "E4"
                     },
                     {
                        "type": "note",
                        "pitch": "G4"
                     }
                  ],
                  "value": "/2"
               },
               {
                  "type": "rest",
                  "value": "/2"
               }
            ]
         }
      ]
   }
]
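One practical upside of putting "type" and "content" on every object is that generic tooling can walk the whole tree without knowing anything about individual types. A minimal Python sketch, assuming every child list lives under "content" as in the example above (walk is a hypothetical helper):

```python
import json
from collections import Counter

def walk(node, depth=0):
    """Yield (depth, type) for node and, recursively, every object in its "content".

    Relies only on the convention that each object has a "type" and that any
    children are listed under "content"; no per-type knowledge is needed.
    """
    yield depth, node["type"]
    for child in node.get("content", []):
        yield from walk(child, depth + 1)

measure = json.loads("""
{"type": "measure", "content": [
  {"type": "clef", "line": 2, "sign": "G"},
  {"type": "sequence", "content": [
    {"type": "chord", "content": [
      {"type": "note", "pitch": "C4"},
      {"type": "note", "pitch": "E4"},
      {"type": "note", "pitch": "G4"}
    ], "value": "/2"},
    {"type": "rest", "value": "/2"}
  ]}
]}
""")

# Print an indented outline of the tree.
for depth, kind in walk(measure):
    print("  " * depth + kind)

# Count how many objects of each type the measure contains.
print(Counter(kind for _, kind in walk(measure)))
```

With the original encoding, the same walker would stop at each event, because a chord's notes live under "notes" rather than "content".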

Any thoughts on content vs. contents? I don't feel strongly, but I wonder if contents would make it more intuitive that it's a list, as in "table of contents" or "the contents of a box."

mscuthbert commented 1 year ago

My preferred approach would be for anything that can contain multiple types to hold a list of two-element lists (tuples), where the first element is always the type and the second is the data:

"measures": [
   {
      "content": [
         ["clef", {
            "line": 2,
            "sign": "G"
         }],
         ["sequence", {
            "content": [
               ["event", {
                  "notes": [
                     {
                        "pitch": "C4"
                     },
                     {
                        "pitch": "E4"
                     },
                     {
                        "pitch": "G4"
                     }
                  ],
                  "value": "/2"
               }],
               ["event", {
                  "rest": {},
                  "value": "/2"
               }]
            ]
         }]
      ]
   }
]
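Because this encoding and the "type"-keyed one carry the same information, converting between them is mechanical. A minimal Python sketch (to_pairs is a hypothetical helper) that turns the "type"-keyed measure from the first comment into [type, data] pairs, recursing into any "content" list:

```python
import json

def to_pairs(node: dict) -> list:
    """Convert a {"type": ..., ...} object into a [type, data] pair.

    Every key except "type" is copied into data; a nested "content" list is
    converted recursively. The input is not modified.
    """
    data = {key: value for key, value in node.items() if key != "type"}
    if "content" in data:
        data["content"] = [to_pairs(child) for child in data["content"]]
    return [node["type"], data]

# The measure from the first example, in the "type"-keyed encoding.
measure = json.loads("""
{"content": [
  {"type": "clef", "line": 2, "sign": "G"},
  {"type": "sequence", "content": [
    {"type": "event", "notes": [{"pitch": "C4"}, {"pitch": "E4"}, {"pitch": "G4"}], "value": "/2"},
    {"type": "event", "rest": {}, "value": "/2"}
  ]}
]}
""")

# Measures carry no "type" in either example, so convert only their content.
converted = {"content": [to_pairs(child) for child in measure["content"]]}
print(json.dumps(converted, indent=3))
```

Applied to the first example, this produces the same structure as the pair-encoded measure above.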

The reasoning here is that we've found in MusicXML that elements end up reappearing in many places within the system (accidentals appear on notes/pitches, but also on ornaments such as turns, on figured bass, on chord symbols, etc.), and we won't be able to anticipate everywhere an element will appear in the future. So either each element always carries a "type" that says what sort of object it is (which leads to redundancy like "pitch": {"type": "pitch", ...}), or the parent element that can have heterogeneous contents is responsible for specifying what content it holds, so that a parser can match each item to the right handler.

Or, if that is harder to express in the schema, there could be an intermediary object that holds only "type" and "content", with all other attributes in a sub-object:

"measures": [
   {
      "type": "measure",
      "content": [
         {
            "type": "clef",
            "content": {
               "line": 2,
               "sign": "G"
            }
         },
         {
            "type": "sequence",
            "content": [
               {
                  "type": "event",
                  "content": {
                     "notes": [
                       {
                          # perhaps, but not sure...
                          "type": "pitch",
                          "content": "C4"
                       },
...
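A minimal Python sketch of unwrapping that intermediary layout, assuming (as in the excerpt above) that a wrapper's "content" is either a dict of attributes or a list of further wrappers. unwrap is a hypothetical helper, and the sample measure is invented for illustration since the excerpt is truncated:

```python
def unwrap(node: dict):
    """Turn a {"type": ..., "content": ...} wrapper into a (type, payload) tuple.

    List payloads are unwrapped recursively; dict payloads (attributes) are
    returned unchanged.
    """
    if set(node) != {"type", "content"}:
        raise ValueError(f"expected only 'type' and 'content', got {sorted(node)}")
    payload = node["content"]
    if isinstance(payload, list):
        payload = [unwrap(child) for child in payload]
    return node["type"], payload


# Hypothetical sample in the wrapper layout (not taken from the MNX spec).
measure = {
    "type": "measure",
    "content": [
        {"type": "clef", "content": {"line": 2, "sign": "G"}},
        {"type": "sequence", "content": [
            {"type": "event", "content": {"notes": [{"pitch": "C4"}], "value": "/2"}},
        ]},
    ],
}

print(unwrap(measure))
```

The strict key check is one way a reader could enforce "type and content only"; in JSON Schema the same rule could likely be expressed with additionalProperties set to false.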

Either of these formats has trade-offs. I don't think we'll know for sure which is best until we have a toy parser in both an unstructured-object language (Python/JavaScript) and a struct-typed environment (C++, etc.).
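As a rough illustration of the struct-typed side, here is a toy parser sketched in Python, with dataclasses standing in for C++ structs, that reads the two-element-list encoding. The class and function names are hypothetical and only cover clefs, sequences and events:

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical typed model: one struct-like class per content "type".
@dataclass
class Clef:
    sign: str
    line: int

@dataclass
class Event:
    value: str
    pitches: List[str]  # empty when the event has no "notes" (e.g. a rest)

@dataclass
class Sequence:
    events: List[Event]

# A heterogeneous list becomes a union of the types it may contain.
MeasureContent = Union[Clef, Sequence]

def parse_event(pair: list) -> Event:
    kind, data = pair
    if kind != "event":
        raise ValueError(f"unsupported sequence content type: {kind!r}")
    return Event(value=data["value"], pitches=[n["pitch"] for n in data.get("notes", [])])

def parse_measure_content(pair: list) -> MeasureContent:
    """Map one [type, data] pair to a typed object; the tag picks the class."""
    kind, data = pair
    if kind == "clef":
        return Clef(sign=data["sign"], line=data["line"])
    if kind == "sequence":
        return Sequence(events=[parse_event(p) for p in data["content"]])
    raise ValueError(f"unsupported measure content type: {kind!r}")

content = [
    ["clef", {"line": 2, "sign": "G"}],
    ["sequence", {"content": [
        ["event", {"notes": [{"pitch": "C4"}, {"pitch": "E4"}, {"pitch": "G4"}], "value": "/2"}],
        ["event", {"rest": {}, "value": "/2"}],
    ]}],
]

for item in map(parse_measure_content, content):
    print(item)
```

In this style each heterogeneous list maps to a union type and the tag decides which constructor runs; a C++ consumer would probably reach for std::variant or a class hierarchy in the same role.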

mscuthbert commented 1 year ago

For "content" vs. "contents" I don't have a strong feeling, except that we should always think of MNX from the point of view of the consuming application, not the file structure (that's one of the key changes from MusicXML that I'm excited about). So which is more intuitive: part.measures[1].contents[0].line or part.measures[1].content.clef.line? I think anything with a plural name should indicate that it needs an [i] after it, and anything without a plural should be accessed as an attribute.
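A small Python sketch of the two access paths, using json.loads with an object_hook so that decoded JSON objects support attribute access roughly the way they would in JavaScript. Both document shapes and the load helper are hypothetical:

```python
import json
from types import SimpleNamespace

def load(text):
    # object_hook turns every decoded JSON object into an attribute-access namespace.
    return json.loads(text, object_hook=lambda d: SimpleNamespace(**d))

# Plural name holding a list: consumers index it and check each item's "type".
part_list = load('{"measures": [{"contents": []},'
                 ' {"contents": [{"type": "clef", "line": 2, "sign": "G"}]}]}')

# Singular name holding an object: consumers read children by name.
part_keyed = load('{"measures": [{"content": {}},'
                  ' {"content": {"clef": {"line": 2, "sign": "G"}}}]}')

print(part_list.measures[1].contents[0].line)   # 2
print(part_keyed.measures[1].content.clef.line)  # 2

# A repeated key in the keyed form: Python's json module keeps only the last one.
dup = load('{"content": {"clef": {"line": 2}, "clef": {"line": 4}}}')
print(dup.content.clef.line)  # 4
```

One JSON-level trade-off of the keyed form: RFC 8259 says member names within an object should be unique and treats objects as unordered, so content.clef can hold at most one clef and carries no ordering relative to its siblings, whereas a list preserves both.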

w3c / mnx

JSON: Encoding heterogeneous subobjects #295