mauritslamers opened this issue 8 years ago
I've found the paper you mention here: http://www.mcg.uva.nl/papers/lisp2nd/oo1.pdf
I will read it. Thanks for the suggestion. And Presto looks very promising. I'll check it out soon for sure.
The link you give is a different paper (Object Oriented Style I). While it is useful, it is better to read the functional one first: http://cf.hum.uva.nl/mmm/papers/lisp2nd/functional.html
Great, thanks!
Oh man, there's some neat ideas in there. Thanks for this. I'm going to take a few readings to understand it. Some recursion is possible already, as the "sequence" event takes a sequence reference, but not to the level suggested in this paper. I like the idea of being able to declare transforms, as in something like:
['transpose', 2, 'invert-diatonic', 'Bb', 'sequence', ... ]
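A list like that could then be interpreted as a pipeline over the sequence. A rough sketch of what I mean, where the transform names, the registry and the [beat, "note", value] event shape are all just assumptions, not part of any spec here:

// Sketch only: a registry of named transforms, each taking an argument and
// returning a function over an array of [beat, "note", value] events.
const transforms = {
  // transpose every note event by n semitones (assumes numeric note values)
  transpose: (n) => (events) =>
    events.map(([beat, type, value]) =>
      type === "note" ? [beat, type, value + n] : [beat, type, value]
    )
  // 'invert-diatonic' and friends would be registered here in the same way
};

// Walk the list right to left: the trailing 'sequence' keyword and its events
// array provide the input, and each preceding (name, argument) pair wraps the
// result in a transform.
function interpret(list) {
  let result = list[list.length - 1];
  for (let i = list.length - 4; i >= 0; i -= 2) {
    const make = transforms[list[i]];
    if (make) result = make(list[i + 1])(result);
  }
  return result;
}

// interpret(['transpose', 2, 'sequence', [[0, "note", 60], [1, "note", 62]]])
// → [[0, "note", 62], [1, "note", 64]]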
I agree that there is room for more notation hints in this spec, as you do in the examples at http://mauritslamers.github.io/presto/.
Is your JSON protocol githubbed anywhere?
This is what I've tried to do with ScoreJS: https://github.com/danigb/scorejs#builder-plugin
For sure I will steal some ideas from there for the next version.
Another paper I like a lot, and which I think could be relevant to this repository: The Haskell School of Music http://www.cs.yale.edu/homes/hudak/Papers/HSoM.pdf
@stephband The JSON format we are developing is not on github yet, not in the least because I feel it is too much in flux still. As soon as there is something more stable, I would be happy to share. We are approaching this from a music notation angle, as we need a protocol which enables us to transfer musical information between different types of music notation. The music notation system however follows the Lilypond approach: describe the music itself and the notation application should do the rest (as I partly have implemented in Presto). Including performance information however is very much part of our thinking.
My current view on this is that it will most likely lead to two different types of protocol: self-contained and streaming. The self-contained version would contain both the music itself and any performance-related information in a Desain and Honing type parallel / sequential system in a single JSON document. The streaming version would consist of a series of JSON documents, of which the first would be a header containing most of the required structure, followed by a non-nested list of musical events in a MIDI kind of fashion. This allows a piece to be easily broken up into a series of requests.
What is your vision on this?
@danigb The advantage of coming from a music notation angle is that it becomes important to re-evaluate certain typical Western-music-based assumptions, such as the 12-tones-per-octave standard (which, by the way, is a very crude representation and far removed from what is actually happening).
Hi @mauritslamers,
Yes, I fully agree on re-evaluating Western-based assumptions (as an avid listener of non-Western music), but can you explain a little how coming from a music notation angle can help? Or what ideas do you have with regard to this topic?
Lately I've been playing with JavaScript generator functions to create endless streams of notes. Do you think they could fit inside the streaming version of your protocol?
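For illustration, this is roughly the sort of thing I mean (the [beat, "note", name] event shape is just a guess at what the streaming format would carry):

// An endless stream of note events, produced lazily by a generator function.
function* arpeggio(names, start = 0, step = 0.5) {
  let beat = start;
  let i = 0;
  while (true) {
    yield [beat, "note", names[i % names.length]];
    beat += step;
    i += 1;
  }
}

// A consumer only pulls as far ahead as it needs, e.g. one 4-beat bar:
const stream = arpeggio(["C4", "E4", "G4"]);
const nextBar = [];
for (const event of stream) {
  if (event[0] >= 4) break;
  nextBar.push(event);
}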
There are many ways of representing music, of which music notation is one. Because music contains loads of information, and because its purpose is communication, music notation (of any kind) will always be a strong abstraction of the "reality" of music. Music notation also needs to create an overview of the structure of the music, the number of instruments, etc. In a way, music notation forms the bare minimum of musical information.
After having used Lilypond (http://lilypond.org) for years, I recently started looking at Desain and Honing's work again, and suddenly realized that Lilypond effectively implements their system of parallel and sequential events, and uses it to generate music notation. The input of Lilypond uses a LaTeX like syntax, but the way the music is described in this language is very much Desain and Honing. (As a sidenote: I only realized this after I wrote Presto, which was I think instrumental in this realization).
The advantage of coming from a Lilypond / notation angle is that you realize that a lot of music protocols and/or descriptions (such as MIDI) make fundamental mistakes either to the structure or to the nature of music. One of these mistakes is to assume that F sharp and G flat are the same note. This might be the same key on a keyboard, but there is a reason why they are not considered as the same note: their meaning is different. Therefore, in non-keyboard contexts (which is the majority of contexts!) these notes are interpreted differently.
What I like about the approach Lilypond took for the description of music is that the validity of the music or the musical expression is not doubted or reinterpreted. When you write something, that is what Lilypond will notate. For a protocol which claims to transport musical information, I think it is paramount to follow that approach as well.
For the performance or controller part of the musical data, I think this can be attached to events which describe the music, either by reference or by integration. For the streaming part: essentially that should be a kind of event list, where everything works by reference. Consequently, it shouldn't be too difficult to make that endless, as you can keep creating new packets with new data and "attach" them to an existing structure.
Thanks @mauritslamers for the detailed answer. Now I understand what you mean by "coming from a notation angle" (or at least better than before). I would like to dig deeper into what you call "fundamental mistakes either to the structure or to the nature of music". Do you have more information? (Some paper, article, book... or maybe you want to develop the idea ;-)
My interests are focused on generating and manipulating music programmatically (always in the form of an abstract notation, not the sound itself), so choosing the right data structure to represent the music is crucial. What I did with ScoreJS was to define an array of events with position, duration and value (the most common value being a note), but I notice both papers ("Lisp as a Second Language" and "The Haskell School of Music") use the concepts of sequence and parallel to create structures, and both combine them in a hierarchical way. In "Lisp as a Second Language" we can read:
In general it can be said that recursion is the natural control structure for hierarchical data. And hierarchical structures are common in music.
But I found working with that kind of structure in javascript a bit awkward, and it makes it harder to build/code music manipulators... So I'm not sure if they are worth the effort.
With a quick look at your code, it seems that you use an array to represent sequences and a sub-array to represent parallels inside the sequences... so it seems you implement the first part of the idea (the two types of structure) but not the hierarchical data... do you have any advice?
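To show what I mean by the hierarchical part, here is a sketch with assumed node shapes (["seq", ...children] plays children one after another, ["par", ...children] plays them at the same time, and a leaf is { duration, name }) and a recursive walk, since recursion is the natural control structure for hierarchical data:

// Recursive flattening of a hierarchical seq/par structure into timed events.
// The node shapes here are assumptions for illustration, not ScoreJS or Presto.
function flatten(node, start = 0) {
  if (!Array.isArray(node)) {
    return { events: [[start, "note", node.name]], duration: node.duration };
  }
  const [type, ...children] = node;
  let events = [];
  let duration = 0;
  for (const child of children) {
    const offset = type === "seq" ? start + duration : start;
    const result = flatten(child, offset);
    events = events.concat(result.events);
    duration = type === "seq"
      ? duration + result.duration
      : Math.max(duration, result.duration);
  }
  return { events, duration };
}

// flatten(["seq", { duration: 1, name: "C4" },
//                 ["par", { duration: 2, name: "E4" },
//                         { duration: 2, name: "G4" }]])
// → notes at beats 0, 1 and 1, with a total duration of 3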
@stephband Creating music fragments with transformations in a declarative manner, like ['transpose', 2, 'invert-diatonic', 'Bb', 'sequence', ... ], is one of my priorities. In fact, some of this can be done with ScoreJS, but I realize that the current version has some big flaws (for example: it uses a hash map instead of an array, so the order of transformations is not guaranteed). Do you have any suggestion?
Thanks in advance for your time.
@danigb
(for example: it uses a hash map instead of an array, so the order of transformations is not guaranteed). Do you have any suggestion?
Errm, use an array? :)
Honestly, reading that paper, and a recent post on Medium (https://medium.com/@chetcorcos/functional-programming-for-javascript-people-1915d8775504#.7jua8f285), inspired me to look into functional ways of applying those transforms, so that they are lazily evaluated. I think that's probably the way to go. So each array of instructions would become a pipe() that gets called with events as they are needed. For live playback this is crucial; for printing out a score your mileage may vary.
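Something like this, off the top of my head (the names and the event shape are illustrative assumptions, not the spec): compose transforms as functions over iterables, so nothing is computed until events are pulled, which also works on endless streams.

const pipe = (...fns) => (events) =>
  fns.reduce((stream, fn) => fn(stream), events);

// Each transform takes and returns an iterable of [beat, type, value] events.
const transpose = (n) => function* (events) {
  for (const [beat, type, value] of events) {
    yield type === "note" ? [beat, type, value + n] : [beat, type, value];
  }
};

const takeBeats = (n) => function* (events) {
  for (const event of events) {
    if (event[0] >= n) return;
    yield event;
  }
};

// const firstBarUpTwo = pipe(transpose(2), takeBeats(4));
// for (const event of firstBarUpTwo(someEventStream)) { /* play it */ }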
@mauritslamers
I'm liking the idea of making sequences of events streamable. It seems to me that music is a stream of events, because the final proof is the performance, not the notation. I consider the notation to be a rendering of a sequence of musical data, where the rendering is the job of some kind of process that interprets the data. The performance may contain more nuance than an interpreter wants to print, so the resulting notation is a kind of reduction of the actual music. A good interpreter is able to tell the different between F# and Gb. eg – Scribe uses a Hidden Markov Chain to determine key centre for each measure and prints F#s or Gbs accordingly. It's a bit 'experimental', but you can see that in this rendering http://labs.cruncher.ch/scribe/, where there is nothing in the data telling it to display sharps or flats. I'm not sure I agree that MIDI makes a mistake about this - it uses a system of evenly-tempered integer notes, but in a modern spec floats could be used to represent non-evenly tempered notes.
@stephband
Yes! I've been in pursuit of a functional way to solve this problem for more than a year (https://github.com/danigb/tonal is an attempt to express some musical entities and transformations with pure composable functions; EDIT: and that's why I'm experimenting with generator functions to provide laziness). I will read the post on Medium, and Scribe looks really nice!
@mauritslamers
I've just finished "Lisp as a Second Language" and it's really amazing. Thanks!
I've also found a lot of other papers from the Music Cognition Group (which includes this one) here: http://www.mcg.uva.nl/publications/list.html
In fact, "Programming Language Design for Music" seems to answer some of my questions above.
Lots to learn!
Hi, I've created an incomplete JavaScript implementation of the paper here:
https://gist.github.com/danigb/e7dd8e1d4f1ba93f83df
One important part is missing: the timed transformations. I will add it shortly, and my idea is to add common music transformations and package it as a library (maybe scorejs v2). What do you think?
@mauritslamers After thinking about it, I agree with @stephband that using numbered pitches instead of names could be a good idea. The note names describe meaning and relations inside a music theory framework (usually Western music theory), but that is not a universal musical meaning (if such a thing exists). In the end, frequencies are closer to the real sound than any note name, so MIDI numbers (especially if used with float precision) are a more accurate way to abstract sound characteristics.
What surprised me about your affirmation of the "advantages of coming from a notation perspective" is that, in fact, our Western notation has deep roots in tonal (Western) music theory: the way it organizes pitches (in discrete elements separated by semitones), the way it organizes the notes on a staff (using a diatonic scale as reference), the way it organizes time into equal divisions... A lot of that makes no sense in other music traditions.
@danigb Ooo nice. Will look at that later.
My intention for music JSON was to allow both numbers and names in note definitions, simply so that authors could write in 'western' notation if they so chose. Implementations would convert to numbers before processing (with a function such as this one https://github.com/soundio/music/blob/master/js/music-harmony.js#L494). I wanted feedback on that though – there are advantages to keeping the spec strict.
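For illustration, a minimal sketch of that kind of conversion (not the soundio implementation, and of course exactly the step where "F#4" and "Gb4" collapse to the same number):

const LETTER_TO_SEMITONE = { C: 0, D: 2, E: 4, F: 5, G: 7, A: 9, B: 11 };

// "Bb4" → 70, "C4" → 60 (MIDI convention: C4 = 60)
function noteNameToNumber(name) {
  const match = /^([A-G])(#{1,2}|b{1,2})?(-?\d+)$/.exec(name);
  if (!match) return undefined;
  const [, letter, accidental = "", octave] = match;
  const alteration = accidental
    ? (accidental[0] === "#" ? accidental.length : -accidental.length)
    : 0;
  return (Number(octave) + 1) * 12 + LETTER_TO_SEMITONE[letter] + alteration;
}

// noteNameToNumber("F#4") → 66, noteNameToNumber("Gb4") → 66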
We are most likely going to publish our first draft of the basic JSON format soon. As the type of discussion starts exceeding the github issue format, I will send you an e-mail with a more thorough explanation of my thoughts.
Both your reactions make me despair a bit that some things will never be understood.
What I feel is that you both try to apply Western music theory to what you are doing, and sadly this is very dangerous, because standard music theory is not to be trusted if you want to learn something about music. At all.
Music theory started out as a teaching tool for composition, and at some point people who did not entirely understand what it is for took over (which happened mostly after 1750); as a result it lost the composer's approach over time. Consequently, it misrepresents almost everything it describes. It shows signs of the typical "enlightenment" way of collecting information. It is like an encyclopedia: loads of articles about many different things, with many different names, but how the articles should be linked together is vague at best. This is also why some people "get" music theory and are able to apply it freely (improvisation/composition), while most don't.
What you see in the Music Information Retrieval and Music Cognition fields is that people are trying to do things with music while not being a musician or composer (usually). This leads to all kinds of weird approaches, because it seems that most people get lost in the physics aspects of music, whereas that is the least interesting part of music. Also, people seem to misunderstand that there is essentially nothing "theoretical" about music theory: nothing can be proved in any kind of way.
The example of using a Markov chain to determine the key of a piece shows that the writer(s) do not understand what a musical key actually is. That it then uses the key of a piece to determine how a note should be spelled shows the underlying (Western) assumption that F# and Gb are the same. They are not. F# and Gb are two different entities, like the letters a and b are different entities, and both can occur in all keys. They are also likely to be a different pitch, as the intonation will differ.
It is rather difficult to explain exactly why this is in this format, as it is a very long story which can only really be understood by doing it. Suffice it to say that when you describe music, you should describe it from a semantic point of view, taking from existing representations whatever makes sense semantically. Numbered pitches instead of names do not make sense, because this assumes there are fixed pitches, which just happens to be an agreement. If a choir sings a piece without accompaniment and drops a bit (as they usually do), you will perhaps notice this. Semantically, however, the different pitch will not make any difference. The piece will not drastically alter.
You can have this text read aloud by a woman's voice, or a man's voice, and in both cases the meaning won't differ. The same for a song: when today a woman sings a tune and tomorrow you hear the same tune sung by a man, you will still recognize it as the same song. Consequently, pitch (especially absolute pitch) is not musical information as such. (It does play a role with all kinds of associations and emotional responses, but the basic semantics - the grammar if you like - do not change).
As soon as you are going to use any kind of statistics to determine what things are semantically (F# / Gb), you do not understand music and you assume that it cannot be modelled. It can be modelled; I have made models that work, though I have never published them.
And yes, MIDI is the most wrong way to describe music, because it renders the entire view of music through a piano, which leads to extremely poor results (just try to imagine what kinds of musical expression you do not have on a piano). One of these results is that people think that the Western music system has 12 tones. However much you would like to think this, the Western music system does not have 12 tones per octave. These 12 tones arose because of the limitation of the keyboard in effectively dealing with the number of options we do have, which is either much more, or fewer. The equal temperament system was known centuries before it became the standard, but people deliberately chose not to use it because it was (and still is) extremely flat and ugly. (And even the piano is not tuned in equal temperament, because then the discant (the top half) would sound too low.) At the same time, MIDI is a very nice format for describing a performance, and consequently contains a wealth of information on how a musical performance can be described, and it is important to use that.
As I have written above: a good interpreter doesn't make assumptions and takes the musical information it gets at face value. Therefore it is vital to take a semantic base and attach the performance information to it. This allows notation to be made by ignoring or stripping the performance information, and a performance to be created out of the same data by ignoring or stripping the notation information.
@mauritslamers
You raise interesting points. I agree with most of your post, but I think the aim of Music JSON is not quite what you are looking for.
I'm well aware of the shortcomings of equal temperament, the stretched tuning of the piano, and the unprovable nature of music 'theory'. Theory is definitely the wrong word. Any sufficiently accomplished musician holds their own 'theory' in their head – which, besides physical technique (if you can separate an improviser's technique from their understanding of 'theory', and I'm not sure you can), is what makes Scofield Scofield and Reinhardt Reinhardt.
That said, there is a base to all theory which is undeniably common to all cultures, as evidenced by the fact that nearly everyone in the world can sing a major pentatonic. Now that is physics. A few simple things can be proved.
I also agree that many technical 'solutions' appear to be concocted by non-musicians. However, it is uncommon that an accomplished musician is interested in a technical, computer-related problem, so perhaps this was inevitable historically. With the advent of MIDI and Audio APIs in the web browser, though, the whole field becomes a lot more open, documented and accessible to many more people, and I'm convinced we're going to see some changes. There will be many more mis-steps, but that is the democratic nature of the web. It is not a Bad Thing.
I disagree that MIDI was 'wrong'. It was the right way to convey performance data between machines in the 1970s and it's not far off the right way now. MIDI is not concerned with semantics, it's only concerned with data exchange.
And I disagree that using a Markov chain demonstrates a misunderstanding of what a musical key is. While there are surely aspects of the notion of 'key' that I'm not familiar with, that is not the issue. The problem here is how to extract readable Western notation from performance data, and a Markov chain proves to be a reasonably effective approach to guessing when a phrase should be written in Ab- or in G#- based on surrounding context. I'm not pretending it's any more than a guess. Surely you, as a musician, have had to read many charts poorly written in Sibelius or Logic or whatever, so you'll know that even decent musicians can't get this right. Any practical, automated help we could give them as hints can only be a bonus.
... it is vital to take a semantic base and attach the performance information to it.
Is it, though? I can relate this point to HTML, where HTML has semantics readable by machine and a page of plain text does not (or at least, not without processing). But HTML is best written by hand. There is not a single editor out there that gets semantic markup right. It requires a human.
We don't have that luxury. We have to be pragmatic. If you don't have hand-written semantic data to start with, only the performance information that has come from MIDI or OSC input, which is what we're dealing with in many cases here, then starting from a semantic base is a non-starter.
It is not necessary to make semantic sense of a performance if you are simply going to transfer it to and play it back on another machine. The cases where it is necessary to make some semantic sense out of it are the cases where the music is consumed non-aurally, such as in a page of notation.
So let's say Music JSON's primary concern is storing and transferring performance data. If I can find ways of integrating semantic information, I think it is important and I want to do so, for those who want to hand-write in their musical intentions. Note that Music JSON already defines a 'chord' event type, which is a very fuzzy piece of data that is open to a huge amount of interpretation, is nigh-on useless to a simplistic playback program, but which has the potential to carry a lot of semantic information about key and harmonic structure. I would actually be happier calling them 'mode' events, but I think that risks being less understandable to an average user.
I hope you do publish your ideas, and let us know when you do, I will be interested to read them. Thanks for making me think :)
@stephband Thanks for your detailed reply. I think you misunderstand me partly when you say that certain things can be proved. What I mean there is to say that whatever physics you can prove, the link with musical aspects or the musical understanding is at most a correlation, and as we both know that doesn't prove a thing.
About the Markov chain: you show exactly what I mean by "not understanding". The way you write your explanation shows that you think the only option a machine has is to guess whether it is a G# or an Ab, and that consequently a Markov chain is "required". What I mean is: if a human (who knows what he or she is doing) were to do it, there would not be a moment's doubt about what it should be. Consequently, the people writing the software who chose to use a Markov chain either do not understand how it works, or they are unable to transfer their understanding into software. In both cases I think it points to an insufficient understanding of how music works.
I follow your explanation about HTML, but I think it is the fault of the editors, not the fault of HTML. The issue with HTML editors (as with many other editors, think MS Word or Open/LibreOffice) is that they try to be WYSIWYG. While that is a nice idea, it causes huge issues with semantics as people will try to have things look a certain way and bypass the semantics in order to achieve that. If you look at editors like LyX (http://lyx.org) you see that it can be done in a different way, as long as you do not offer the user a way to alter the layout in a way that bypasses the semantics.
I think MusicJSON should be true to its name and transfer music. Both performance and notation are representations of music and it should be possible to include this information. Performance data and notation carry a shared resource, which is the semantical information of music. This should ideally be the backbone of the format. Especially when dealing with MIDI or with OSC like data it is possible to retrieve this semantical information. There are very interesting musical transformations that you can do with performance data which are impossible to do without the semantical information.
For a simplistic player it is much easier to deal with parallel and sequential structures. I agree that a chord is a very vague concept, but only if you look at it as a chord. Like so many things in music theory, a chord is a misrepresentation. Trying to describe music through chords is akin to describing a movie scene by a few stills taken from that scene. If you then try to reconstruct the scene from the description taken from the stills, the chances of getting it wrong are much bigger than getting it right. In our version of MusicJSON a chord is a parallel structure of one or more notes. Any other information should be either defined differently, or deduced through more effective means.
I don't know whether you thought of "mode" as something like "key", but essentially they are the same. Music theory wants us to believe there is a huge difference between modal and tonal music, while they are more like strongly related dialects which are easily mutually understandable.
Edit: grammar.
BTW, the publication is being prepared, but a few things need to be sorted out still. The remark about the average user is interesting, because I think the average user should not be writing MusicJSON by hand, or any other JSON file which is more than a few lines. I don't know anyone who would like to write a format like MusicXML by hand. Our format will be accompanied by a tool which translates Lilypond code to our JSON format, allowing people to write the music in Lilypond.
I follow your explanation about HTML, but I think it is the fault of the editors, not the fault of HTML.
Yes, I know all about WYSIWYM editors, and I agree. The metaphor breaks down, of course. What is the parallel with music editors? Is a notation editor a WYSIWYM, while a piano roll editor is a WYSIWYG? So let's consider a practical problem. You have a note, and you want it to push the beat; you want to nudge it a tiny fraction ahead of the rest of the arrangement to give it a sense of urgency and instability. What's the semantic solution in a notation editor? Writing an instruction "urgent and unstable" above the note?
The way you write your explanation shows that you think that the only option a machine has is to guess whether it is an G# or a Ab [...] the people writing the software and who choose to use a Markov chain either do not understand how it works, or they are unable to transfer their understanding into software.
"Guess" was a bad word to use. I mean of course that the machine can interpret the music following similar rules that we do. You may be right that a Markov chain is a bad model to use. Yes, it was an experimental answer to the problem I was having trouble modelling. But it works! Let's not get hung up on the Markov chain – the original point was that it's perfectly possible to give the machine the responsibility of sorting out how to display notes that are represented as numbers. You appear to agree, because you say
when dealing with MIDI or with OSC like data it is possible to retrieve this semantical information
So I'm not sure what your criticism is any longer.
There are very interesting musical transformations that you can do with performance data which are impossible to do without the semantical information.
I agree. But where the semantic information can be extracted from the performance data, what is the point of putting the semantic information in the data? That seems to me to be data crud.
I agree that a chord is a very vague concept, but only if you look at it as a chord. [...] I don't know whether you thought of "mode" as something like "key" [...]
Yes, 'mode' is something like 'key', that was my point about not really wanting to call it a 'chord' event. I don't think a chord is a "vague" concept, I think it is a non-deterministic concept – a good chord symbol can absolutely nail the intentions of a composer. My intention in Music JSON was to allow just a small set of well defined chord symbols that each describe distinct modes. Nonetheless, the performance of those modes, if a Music JSON consisted of only "chord" events, would be open to a lot of interpretation. Actually that's a good example of where some absolute semantics can lead to wildly divergent performances.
I disagree that 'mode' is synonymous with 'key', though, unless you are considering 'keys' that don't describe major scales or their relative modes to be 'keys'. I have never seen a piece of music notated with a key signature that does not belong to a major scale (or relative mode). Modes outside of these key signatures require accidentals, traditionally. For example, G∆b6 describes a harmonic major mode, but good luck putting a key signature of F# and Eb on your stave. You'll get laughed out of the orchestra.
the average user should not be writing MusicJSON by hand
You are right of course. I will likely rename 'chord' events to 'mode' to avoid future confusion. However, the average user I was referring to in this case would be a developer, the user of Music JSON. Most developers will be more familiar with the term 'chord' than the term 'mode'. Most musicians are too! So it will require some explanation.
In the case of the note that you want to be ahead, there are a few things to consider. Assuming that the note falls on the first beat of the bar, you would want the notation to put it on the first beat of the bar, and semantically it is the first beat of the bar. Performance-wise, though, you want it to be in front, and you indicate that in notation with something like "hastily". This gives us two layers: the semantic underlayer (the note being the first beat of the bar), and a layer with two representations, one as notation and one as performance. Continuing in the same way, you will probably also want to play the note a bit shorter. Semantically it should have the length of the notation (assuming a quarter note), notation-wise you'd add a staccato, and the performed length should be around 50-70%. This doesn't take much extra space, as you describe the note being played and add performance and notation elements.
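Purely as a hypothetical illustration (this is not our format or the MusicJSON spec, just one possible shape that makes the layers concrete):

// A semantic note with notation hints and performance overrides attached.
const event = {
  time: 0,                 // semantically: the first beat of the bar
  type: "note",
  name: "C4",
  duration: 1,             // semantically: a quarter note
  notation: {
    articulation: "staccato",
    text: "hastily"
  },
  performance: {
    timeOffset: -0.03,     // played a touch ahead of the beat
    durationScale: 0.6     // sounded at roughly 60% of the written length
  }
};

// A notation renderer ignores `performance`; a player ignores `notation`.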
Regarding your remark about music notation editors: did you ever take a look at Lilypond (http://lilypond.org) and its right-hand companion Frescobaldi (http://frescobaldi.org)? This is what a semantic music notation editor can be.
So I'm not sure what your criticism is any longer.
My criticism is that if you do not take the lowest semantic level possible (which you seem to confuse a bit with notation), there will be all kinds of possible confusion. This semantic level may look a lot like MIDI, but it differs in what is accepted as an identifier (names instead of numbers) and in having a free name-to-frequency conversion scheme. This avoids a lot of possible confusion, and avoids forcing non-Western music into the Western music framework.
The way you deal with chords is such a confusion. On the one hand you want to describe them as notation (chord symbols are notation), while on the other you are trying to use MusicJSON as a performance format. This is conflicting information which also doesn't heed the separation-of-concerns principle. In this way you also force an interpretation of what could have been intended, because a chord can be performed in many different ways, as you indicate, and most likely many more.
I do not agree with the chord being a semantic concept. The underlying semantic concept is a set of tones which sound simultaneously in one or more instruments and which can be named or symbolized in various ways. The chord you mentioned (G∆b6) can also be described as an Eb#5#9/G or even Eb#5b10/G. The underlying tones, however, are the same, and these tones are what should be described as the basis.
As I said before: don't trust music theory or anything that you learned in that respect. Forget scales in the static way music theory teaches them, as it creates loads of confusion: the scale of C, D Dorian, E Phrygian, F Lydian, G Mixolydian etc. are all the same scale, just with a different starting point. Also keep in mind that they are dynamic, and can and do change in certain ways without necessarily becoming a different scale. (You can wonder whether the scale as a concept is actually useful; I tend to think it creates too much confusion.) A key signature does not tell you anything about the key of a piece. It just tells you where the familiar system of C has been moved, and even that is not what it seems to be. Also related to the modes and keys: without going too deep into it, try to imagine that major originated on F and as such is and works like the Lydian mode (as in the real mode as it was used, not the static scale music theory made of it later), and minor originated on D and is and works like the Dorian mode. (This is why we have minor and major parallels.) That you then have to ask yourself where all the required accidentals went, or where the accidentals that are used come from (such as what is described as harmonic and melodic minor), shows how little of how it actually works is described in music theory.
([pedantic] in modern music you can easily have both an Eb and an F# in the key signature without being laughed out of the orchestra [/pedantic]).
Edit: make pedantic mode work ;-)
(Disclaimer: I have a technical background--limited music education/experience.) Wow--a lot of good discussion going on here! I have some ideas: please let me know if they're good or silly.
I am interested in a format powerful enough to represent a detailed score, but also to provide the information necessary for a machine to "play" a piece with relative ease. I also want arbitrary transitions in time signature. I think an approach (analogous to HTML/CSS) that uses positional entities alongside style descriptors could be powerful.
First let's handle ordering (timing, rhythm, etc): The way I see it, almost every element on a score is a positional entity--i.e. it happens at some unique time or in some unique order. Notes, rhythmic elements, rests, measure delineations, key changes, clef signs, etc. Given a defined order and/or "start times", a renderer need only have minimal additional context (pretty much just time signature) to be able to approximately place elements along a score.
The key phrase is "start time". How to represent? Using absolute time (in, say, milliseconds) is a poor choice for obvious reasons. Some songs speed up or slow down. In this sense, timing/ordering needs to be scaffolded by the ordering of measures. I propose a floating point format where the whole number represents the measure and the fractional component is the fractional position in that measure. Duration could also be specified as fraction of a measure, but I haven't pondered that as much yet. (What about floating point precision? As long as enough precision is there, just render in the context of some quantization interval.)
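To make the encoding concrete, a tiny encode/decode sketch (assuming measures are counted from zero):

// Whole part = measure index, fractional part = position within that measure.
const encodeTime = (measure, fraction) => measure + fraction;
const decodeTime = (t) => ({
  measure: Math.floor(t),
  fraction: t - Math.floor(t)
});

// decodeTime(12.75) → { measure: 12, fraction: 0.75 },
// i.e. three quarters of the way through the thirteenth measure.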
Another fundamental to tackle (for notes) is pitch. I think a good format will have a preference for absolute pitch (in some sense) over relatively ambiguous ideas like musical note name ("A", "Bb", "C", etc) and octave. In Western music, the note and octave can be derived from the context--i.e. the clef, key signature, and relative tuning. In other styles, I don't know--but at least you have the absolute pitch.
Again, the important part is how to represent pitch. Again, I suggest a floating point format, close to that of MIDI, that uses the 12-tone scale for the "whole-number anchor points" and the fractional component for fractional adjustments. @mauritslamers I understand and appreciate the need to avoid formats that favor Western music (which I think in this case just means the 12-tone-per-octave scale?). One alternative to what I just mentioned is pitch in the "frequency" sense (i.e. "A" as 440). My fear is that this becomes too much of a computational burden on the rendering side. Just like timing using measures, pitch will require some "scaffolding"--let me know if you have a better scaffolding solution.
@brianbreitsch
First let's handle ordering (timing, rhythm, etc)...
Agreed, although I would say that rests are an absence of events, and that measure delineators (bar lines) are a function of time signature, so I don't think either of those 'positional entities' should necessarily be included in the data.
I propose a floating point format where the whole number represents the measure and the fractional component is the fractional position in that measure...
Currently MusicJSON works with a very similar system: whole numbers represent 'beats' and any fractional component is a fractional position inside that beat. I am against basing timing information on any notion of measures, as your system would mean that if I took a phrase in 3/4 and pasted it into a piece written in 4/4 the beats would no longer match up - you would get a 3 over 4 cross-rhythm effect. I would prefer it that a beat is always a beat is always a beat, so that this phrase still aligns with the beats of the 4/4 piece.
However, the notion of 'beat' is arbitrary, and you could use it exactly the way you describe: just consider a 'beat' to be one of your 'measures'.
Another fundamental to tackle (for notes) is pitch...
I am persuaded that note names are a necessary semantic, and one that also opens up the possibility of using non-western note names. But if note names are allowed you have to have some kind of map to map them to pitches.
The trouble with using pure frequency values is that they expose nothing about harmonic relationships. Some kind of logarithmic representation is a must. Note numbers do a decent job, and as you say, if you allow floating-point note numbers you can represent any pitch.
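For reference, the usual equal-tempered relation between (floating-point) note numbers and frequency; the A4 = 440 Hz reference (note number 69) is just the common convention:

const numberToFrequency = (n, refHz = 440) => refHz * Math.pow(2, (n - 69) / 12);
const frequencyToNumber = (hz, refHz = 440) => 69 + 12 * Math.log2(hz / refHz);

// numberToFrequency(69)   → 440        (A4)
// numberToFrequency(60)   → ~261.63    (C4)
// numberToFrequency(69.5) → ~452.89    (a quarter-tone above A4)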
@brianbreitsch Yes, there is a better scaffolding solution, which is a semantic description of music annotated with enough information that a player can play it and a notation program can create notation out of it. If you want a mixed format which can do both notation and playback, and which at the same time is agnostic to the kind of music you are describing (which is what I mean by formats that avoid favoring Western music), it is impossible to avoid a semantic description. This semantic description should come with a 'lookup table' which explains how to translate it to sound.
However strange it might sound, pitch is not fundamental to music. The idea of pitch as a fundamental aspect of music is typical of Western music. What is important to music, and therefore needs to be represented somehow, is change of pitch and the semantic meaning it can have. To give an example: in traditional Vietnamese music, music is written down in Chinese characters. Every character stands for a certain point in the scale, but depending on the mode the music is played in, it is preceded by what in Western music would be called an ornament: an elaborate figure of fluent pitch changes which is a fixed figure for a specific point in the scale. Every point in the scale has its own figure, which depends on the mode the song is played in.
If pitch were the basis of the format, this music would be almost impossible to describe. In MIDI it is possible to describe it by including an enormous amount of controller data, but it becomes really difficult to create notation out of it. By taking a semantic starting point, it not only becomes possible to describe the notation, but by including a prescription of how certain characters should be played, you also have a performance format.
@stephband
However, the notion of 'beat' is arbitrary, and you could use it exactly the way you describe: just consider a 'beat' to be one of your 'measures'.
Good point about copy/pasting--although a conversion would be something that could happen easily at the time of the copy/paste. I think "scaffolding" timing on "beats" would work well too. Overall, I think the fact that the note describes its position absolutely is important. I will say, though, that I think there is a hefty advantage in using "measure" scaffolding for pieces (like those I'm interested in) that have many time signature changes. This becomes a computational burden when using beats, as the note's position in a measure must then be computed from a list of all previous measures. I think that having flexible copy/paste behavior to be able to create the cross-rhythm or line up quarter-notes/beats is less of a burden. Please let me know if you have a solution to the "many changing time signatures" problem, or if you think the advantages of beat-scaffolding outweigh it.
@mauritslamers @stephband I see what you're saying about the semantic note naming now. I think a good design would still allow for notes to be expressed using numeric pitches as well as semantics. From my perspective, the roadblock is obtaining documentation for such semantics, and for my own applications, numeric notes would be enough.
(ASIDE: In terms of Western music, I've always had a deep loathing for using letters to represent absolute pitch--probably because I can't hear absolute pitch, and I've always used numbers relative to some root to think about/describe my chord shapes and scales (I play guitar). But that's just a personal problem.)
In terms of file format, semantic naming should be allowed, but so should numeric naming, since that would be closer to the in-memory format for many music applications and would allow efficient transposition and key changes. Does what I'm saying sound correct?
@brianbreitsch Many time signature changes are fine in the format I am preparing together with a few others. I haven't been able to publish yet, but it might be better to do it as soon as possible, even if it is rough and not really in use yet. Our format also allows for meta information, so that you can attach any extra information, such as measure numbers and/or positions.
The point about how to obtain the documentation about the semantics: if the writer of a file wants to be interoperable, he will have to include the translation information: a translation of semantic information to sound information. For the Western music angle, I am writing a converter from MIDI to our format.
@brianbreitsch
I think there is a hefty advantage in using "measure" scaffolding for pieces.
I think there is a hefty disadvantage - here is a phrase in 4/4:
[0, "note", "C"]
[0.25, "note", "D"]
[0.5, "note", "E"]
and here is the same phrase – the same phrase – in 3/4:
[0, "note", "C"]
[0.33333333, "note", "D"]
[0.66666667, "note", "E"]
you see the point. You can't easily test for equality between musical phrases. You can't displace phrases from one bar to another without recalculating all the times, and that calculation has to have some knowledge of position within a song in order to know what the new numbers should be. So you can't use a single instance of a phrase in multiple places in a song, because phrases become position-within-a-song dependent.
In fact what you have done is to shift the burden of time signatures to the phrase level: bars no longer have time signatures, because that information is encoded into the times of events in your phrase. I might have a bar labelled 3/4, but if I stick a sequence with four even notes in that bar, it will play four notes in the space of that bar. The bar is not in 3/4 at all, so the 3/4 label becomes meaningless.
I'm not sure what the "many changing time signatures" problem is. Where you might do:
[0, "timesig", "4/4"]
[0, "note", "C"]
[0.25, "note", "D"]
[0.5, "note", "E"]
[0.75, "note", "F"]
[1.0, "timesig", "3/4"]
[1.0, "note", "G"]
[1.333333, "note", "A"]
I would do:
[0, "timesig", "4/4"]
[0, "note", "C"]
[1, "note", "D"]
[2, "note", "E"]
[3, "note", "F"]
[4, "timesig", "3/4"]
[4, "note", "G"]
[5, "note", "A"]
In your version, if I removed the [1.0, "timesig", "3/4"] event, the last note would suddenly sound out of place.
In my version I could remove the [4, "timesig", "3/4"] without any change to the sound of the phrase. "timesig" is simply a hint for a visual renderer. Surely that's a simpler system?
semantic naming should be allowed, but so should numeric naming
Ugly as it is, I suspect that is the pragmatic way forward here.
@brianbreitsch
I should point out that nothing about MusicJSON stops you from structuring your data into bars, it just doesn't require it. Here's that example again, but structured into bars using the "sequence" event:
[0, "sequence", [
[0, "timesig", "4/4"],
[0, "note", "C"],
[1, "note", "D"],
[2, "note", "E"],
[3, "note", "F"]
]],
[4, "sequence", [
[0, "timesig", "3/4"],
[0, "note", "G"],
[1, "note", "A"]
]]
Is that what you mean by "scaffold"?
@stephband
You can't easily test for equality between musical phrases. You can't displace phrases from one bar to another without recalculating all the times, and that calculation has to have some knowledge of position within a song in order to know what the new numbers should be. So you can't use a single instance of a phrase in multiple places in a song, because phrases become position-within-a-song dependent.
This is all true. And I see what you're saying. This example cleared it up:
[0, "timesig", "4/4"]
[0, "note", "C"]
[1, "note", "D"]
[2, "note", "E"]
[3, "note", "F"]
[4, "timesig", "3/4"]
[4, "note", "G"]
[5, "note", "A"]
I was thinking about being many measures into an arrangement and having to figure out which beat of a measure a note was on. In this case, you still have to figure out when the last time signature change was, but as long as the timesig event has the correct "time" attribute, it will just be a matter of finding the last timesig event that occurred.
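Something like this, I suppose: a rough sketch using the event shapes from the examples above, and assuming that the beat in the event times is the time signature's denominator unit (so the numerator gives beats per bar) and that timesig changes land on bar boundaries.

function barPosition(events, beat) {
  let sigBeat = 0;       // beat at which the current time signature started
  let beatsPerBar = 4;   // assume 4/4 until a timesig event says otherwise
  let barsBefore = 0;    // complete bars before the current time signature
  for (const [time, type, value] of events) {
    if (type !== "timesig" || time > beat) continue;
    barsBefore += Math.floor((time - sigBeat) / beatsPerBar);
    sigBeat = time;
    beatsPerBar = Number(value.split("/")[0]);   // "3/4" → 3 beats per bar
  }
  return {
    bar: barsBefore + Math.floor((beat - sigBeat) / beatsPerBar),
    beatInBar: (beat - sigBeat) % beatsPerBar
  };
}

// With the 4/4 → 3/4 example above, barPosition(events, 5)
// → { bar: 1, beatInBar: 1 }, i.e. the second beat of the second bar.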
@stephband Ah--I didn't see that a sequence could be an event the first time I read the spec. Thanks for the example! It makes sense now.
Hi,
thanks @brianbreitsch for continuing the discussion. And I agree with @stephband that time in beats seems more convenient to me.
You mention the HTML/CSS metaphor (which is engaging), but I think we tend to talk about the HTML part (time, positional entities, if I understood correctly) and not the CSS part. What do you mean exactly by the CSS part? Are you talking about the visual representation of the events, or maybe about possible transformations of them? Do you think this CSS part should be inside this specification? Those are my open questions, and maybe yours too...
@mauritslamers I think this is the first time I understand what you mean by "semantical description of music", but it's still too abstract for me. As far as I can tell, there are two sources of information: the semantic description and the lookup table that allows that semantic description to be translated into ... what? Also, I understand the semantic description is not limited to pitches, but maybe I'm wrong. Do you have any example (even if it's not a real one) of how to represent music with this method in a kind of JSON file?
@brianbreitsch
Ah--I didn't see that a sequence could be an event the first time I read the spec.
Yerrrssee... that wasn't clear at all in the spec. I just edited it to allow sequence objects in sequence events.
In fact I have been misleading. What I wrote was a sort of pseudo-example. The MusicJSON spec says a sequence should be an object. So that example in full should look like:
{
"name": "Major Impact",
"events": [
[0, "sequence", {
"name": "Bar 1",
"events": [
[0, "timesig", "4/4"],
[0, "note", "C"],
[1, "note", "D"],
[2, "note", "E"],
[3, "note", "F"]
]
}],
[4, "sequence", {
"name": "Bar 2",
"events": [
[0, "timesig", "3/4"],
[0, "note", "G"],
[1, "note", "A"]
]
}]
]
}
Note also that sequences can be referred to by name, and so they can be self-referential. Here's a sequence that triggers itself every 4 beats:
{
"name": "Major Impact",
"events": [
[0, "sequence", {
"name": "Bar 1",
"events": [
[0, "timesig", "4/4"],
[0, "note", "C"],
[1, "note", "D"],
[2, "note", "E"],
[3, "note", "F"]
]
}],
[4, "sequence", "Major Impact"]
]
}
I wouldn't bank on implementations to be able to play that forever...
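A sketch of how a player might cope: resolve "sequence" events lazily and only expand as far ahead as it needs to play. The registry and the horizon are my assumptions here, not part of the spec.

function* expand(sequence, registry, offset = 0, horizon = 32) {
  for (const [time, type, value] of sequence.events) {
    const start = offset + time;
    if (start >= horizon) return;          // stop expanding beyond the horizon
    if (type === "sequence") {
      // a string refers to another (possibly this very) sequence by name
      const child = typeof value === "string" ? registry[value] : value;
      yield* expand(child, registry, start, horizon);
    } else {
      yield [start, type, value];
    }
  }
}

// const registry = { "Major Impact": majorImpact };
// [...expand(majorImpact, registry)] unrolls the self-reference up to beat 32.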
[edit: added note about sequence references]
What you mean exactly with the CSS part?
Yeah, that caught me, too. One could imagine a selector syntax for styling notation.
sequence > note:beat(0) {
color: red,
margin-right: 0.125em
}
bar:beat(8) {
line: "double"
}
I'm wondering if this project is being pursued. I came across it because I've been using MusicXML and MIDI files for a music app, and I've always thought JSON would be better: more compact and making it much easier to parse scores. Please let me know if this project is still alive, because I'm definitely interested. I have a software engineering and musical background, just in case it matters. Kind regards
Yeah, it absolutely is. And it has evolved a bit as I built Soundstage (https://github.com/soundio/soundstage – which I'm in the process of documenting ready for a release). This project is overdue an update to reflect that.
I do use it, but I am a little puzzled that it hasn't seen much traction! I would love to have someone else run with it for a bit. Because I agree, it would be great to have a JSON format that could be used by many libraries. And it would push me to advance what I'm doing too.
I have not looked at scoring for a couple of years. When CSS grids came out, it occurred to me that they are near-ideal for this. My last prototype used named grid-rows (C1, C#1, D1, ...) and named grid-columns (beat-1, beat-2, beat-3, ...) to lay out the score and it worked really well. Are you working with web tech for your app?
I have a music, acoustics, and web dev background.
I just edited the proposal to bring it up to date. ("name" properties are now "label", there is no more "pitch" event, some things made a little clearer)
Hey, nice to see some activity here. I thought this was kind of outdated, but then I received a notification! First of all, thanks for this thread, as it sparked my inspiration while creating rhythmical. I have written a pretty long document that uses the MusicJSON proposal as a starting point and describes my idea-finding process. I did not intend to share it, but seeing some activity here, it might be of interest. Depending on where you live, you currently might have some time to read it... (I live in France, which is under curfew). Would love to get some feedback on this!
Ben, re-salut!
Nice work on the Rhythmical demo! It's interesting, what you have done. I will need to read the document again to fully understand, but I got the gist of it. I did struggle with human-readability myself when thinking this problem through. However I do find that this:
"///F4 F4",
"||Ab4/Bb4 |Bb4_2.5",
"|/Ab4 |Bb4/C5 |Eb4/F4",
"",
"///F4 F4",
"||Ab4/Bb4 |Bb4_2.5",
"|/Ab4 |Bb4/Cb5 |Eb4/F4",
is an odd mix of JSON and a domain-specific language that has to be parsed anyway. If you're inventing syntax, why not go all-in? Do you really need the JSON? Bad example, but you could find a way to write human-readable syntax into a custom 'player' element and...
<player> ///F4 F4 ||Ab4/Bb4 |Bb4_2.5 |/Ab4 |Bb4/C5 |Eb4/F4 </player>
Do you intend to handle performance data in this format? If I were to play something in on a MIDI keyboard, how would that end up being interpreted?
I'm in Lausanne. I thought we lived in a police state, but you guys really have it tough!
Hey, that was a quick reply!
I must admit the snippet you picked looks odd. That part was mainly an experiment with alternative tidalesque "footmark" symbols (search "marking out feet"). At the bottom of Cantaloupe Island, you can see the custom "symbols" config. This is still undocumented (WIP alert). With non-custom symbols, which I prefer, the string format looks more like this (pastable here):
{
"duration": 12,
"m": [
"A5 . F5*2 C5 . D5*2 F5 . F5",
"C5*2 F5 . F5*2 C6 . A5 . G5",
"A5 . F5*2 C5 . D5*2 F5 . F5",
"[C5*2 F5] [Bb5 A5 G5] F5*2",
"A5 . F5*2 C5 . D5*2 F5 . F5",
"C5*2 F5 . F5*2 C6 . A5 . G5"
]
}
But you make a valid point about the domain-specific language. The truth is, I am currently going "all in", writing a parser with nearley.js that will also be able to parse this (and much more):
A5 . F5*2 C5 . D5*2 F5 . F5 |
C5*2 F5 . F5*2 C6 . A5 . G5 |
A5 . F5*2 C5 . D5*2 F5 . F5 |
C5*2 F5 . Bb5 A5 G5 . F5*2 |
A5 . F5*2 C5 . D5*2 F5 . F5 |
C5*2 F5 . F5*2 C6 . A5 . G5 |
I'm writing the parser just for the sport, as someone else already wrote a tidal syntax parser with PEG.js. If I fail, I will just use that 😀
The current parsing uses just regexp + recursive magic, with which I still didn't manage to parse certain combinations of brackets, stars and dots/pipes.
But on the other hand, I find that JSON is still a good format for handling the more zoomed-out parts, like multiple voices or instrument attributes, which would be less readable in a text format (and would lack syntax highlighting). This is also the case for tidal, as it uses Haskell / SuperCollider around the strings. I also experimented with using YAML for writing, which works well and is less cluttered.
In the end, a domain-specific language needs to be converted to JavaScript objects anyway...
I also like the hierarchical JSON format for its embedded structural information, which is currently used to draw containers around notes. This structural information could also be used to render a score. But maybe I will just use Scribe with the flattened output 🤔 need to try that...
Currently, if you don't like the string/JSON mix at all, you can also use normal arrays (pastable here):
{
"duration": 12,
"m": [
["A5", ["F5*2", "C5"], ["D5*2", "F5"], "F5"],
[["C5*2", "F5"], ["F5*2", "C6"], "A5", "G5"],
["A5", ["F5*2", "C5"], ["D5*2", "F5"], "F5"],
[["C5*2", "F5"], ["Bb5", "A5", "G5"], "F5*2"],
["A5", ["F5*2", "C5"], ["D5*2", "F5"], "F5"],
[["C5*2", "F5"], ["F5*2", "C6"], "A5", "G5"]
]
}
The only thing that needs to be parsed here is the stars, but that is not part of any DSL magic (just a split).
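Roughly like this (assuming "*n" simply scales the note's relative duration; adjust if rhythmical defines it differently):

function parseValue(str) {
  const [name, factor] = str.split("*");
  return { name, factor: factor ? Number(factor) : 1 };
}

// parseValue("F5*2") → { name: "F5", factor: 2 }
// parseValue("A5")   → { name: "A5", factor: 1 }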
You could even opt out of the star syntax and write everything in plain JSON, but this would be a lot less readable, made worse by the auto formatter (pastable here): I made a gist to keep this post small.
Your idea with the custom element is also on my list of things to try, for example with React or web components. But I still want a flexible, hackable lib that is more than a black box. I also plan to migrate / rewrite features from this playalong lib into rhythmical, like voicings + voice leading, combined with grooves...
Using performance data is also on my list, e.g. using this lib to generate voicings below single pressed notes.
With rhythmical, I see performance input being handled either with "groove presets" that play the things you press in an editable groove, or with some way of recording into the format, but that seems rather tricky.
But now I have a lot of time on my hands and no FOMO when hacking like a madman 🤪
"But maybe I will just use scribe with the flattened output 🤔 need to try that..."
I should point out that Scribe has not seen any love in years. As I mentioned to @rodfer above, my latest prototype printed notation to a CSS grid. I was intending to rewrite Scribe to do that, which would remove the need for it to include a full layout engine, a significant challenge. I only solved it for monophonic lines in Scribe so far, but CSS grid makes polyphonic layout much easier.
If I were to tackle that perhaps a pluggable interpreter might be useful, such that it could be made to consume any format.
Hi guys, thanks for the quick replies.
I went through the documentation and through stephband's code. Pretty impressive. Here are my 2 cents. I was looking for a JSON-based standard to describe music: both the score and the sound. I'm familiar with MusicXML and MIDI, having developed an Android app that shows scores coming from a server. This app can parse, using Java, both MusicXML and MIDI files (that took a while, but it's done). I'm now busy developing the server side, and I needed a way to show the score in a browser and play it. And if I'm not mistaken, that's where stephband's work comes in.
I see three sides to such a standard: 1) A JSON-based open format for scores. This could be done by leveraging the MusicXML standard but using JSON, after getting rid of some redundancies and all the verbiage. Scores would be much smaller and parse much faster.
2) A way to represent scores on a browser
3) A way to create the sounds on the browser and generate the MIDI output.
I'm not sure if this approach makes sense and I'm not totally familiar with all the problems on the browser side. Please, correct me before I get carried away
BTW, I'm writing from Minneapolis and yes we are stuck at home. Mostly... Cheers!
If you want to render MusicXML in the browser, you could try opensheetmusicdisplay. I did a few experiments with it, and it seems pretty robust.
Hello.
Yes, our goals align to some degree - I'm aiming to record live MIDI and UI events in the browser, and I want to be able to exchange these recordings using a JSON format.
For the MIDI part I made the JS library 'MIDI' (https://github.com/stephband/midi). (I notice the documentation is borked; I know exactly why that is and I'll repair it as soon as I can and let you know.) The MIDI receive part of that library is well tested and works well. I haven't done much sending of MIDI yet, beyond some basic tests. The library also includes a bunch of helper functions for converting MIDI values to floats and so on.
I'm not too interested in a format that ONLY does scores so I may not be much help here. I find MusicXML an abomination! But I would be really interested in discussing how we can extend this MusicJSON to better support scoring. I'd like to know what the main problems are. What currently CAN'T be done with this format? There are some problems that are easy to solve with the addition of a property here or there, such as fixing a treble clef. What is difficult? Symbol insertion? Forced line breaks? I admit, these are things I'm not keen on polluting performance data with!
There are not many solutions to this problem, so I started to write Scribe, but didn't get it beyond a prototype. Have you looked at VexFlow? If I remember rightly, the reason I wanted to write my own was trivial – I didn't like their font – and perhaps there was a licensing problem, but it looks like it's MIT licensed now. I would LOVE to tackle this problem again, but honestly that's not gonna happen soon.
That's why I built Soundstage. I am currently documenting it with a view to a proper release soon. It works, it accepts MusicJSON, there's a sampler and a tone synth and a bunch of effects built in, and it would be relatively straightforward to make it play back MIDI through the MIDI library (which it depends on, currently).
Other than that, the established library for generating sound in the browser is ToneJS, you might want to give that a try. Or don't, and come and help out over here instead :)
All my documentation currently looks a bit borked; it's my stupid doc builder and dependencies. I'll fix them and let you know.
Hey guys, I was experimenting with staff notation today and thought you might be interested: https://felixroos.github.io/blog/rhythmical-staff/
I am using react + vexflow to render the staff and rhythmical to write the notes. It's still pretty basic but I will probably implement this further. The source code for the rendering can be found here.
I like the spec a lot. I have been trying to find a way to compose music without a DSL (e.g. Lilypond) for a while. I'm actually writing a small JS library to compose music, but I wasn't finding any good languages to port to; that was until I found this spec.
Also, I like how intuitive the event syntax is because, from a musical perspective, music is event-based; the spec is very analogous to Western sheet music, and the spec can even be used to tell many robotic arms when to strike piano keys and for how long to sustain them.
First of all, it is great to see that there are more people thinking about this. What is clear, however, is that a few fundamental aspects are missing from your proposal. To me this is because you seem to have based your work on OSC and MIDI, which are both rather poor representations of music because, in essence, these protocols are controller streams.
My suggestion would be to read up on the work of Peter Desain and Henkjan Honing, most specifically their work on representing music in "Lisp as a second Language: Functional Aspects", as well as to have a very thorough look at how Lilypond (http://lilypond.org) implements this system. Together with a partner I am also developing a JSON protocol for music, which I will use to replace the current model (which also doesn't take the work of Desain and Honing into account) in my canvas-based notation system Presto (http://mauritslamers.github.io/presto).