newlandsvalley / elm-comidi

MIDI parser in pure elm
BSD 3-Clause "New" or "Revised" License
9 stars 3 forks source link

SysEx messages handled incorrectly #7

Open rhofour opened 6 years ago

rhofour commented 6 years ago

Right now sysex messages are treated as MetaEvents which are prefixed by 0xFF, but it looks like SysEx events don't require this prefix either in realtime or in midi files.

I believe this can be fixed by just moving parseSysEx from metaEvent to midiEvent, but I'm not terribly familiar with this stuff so I figured I'd check before I submitted a PR.

newlandsvalley commented 6 years ago

It's quite possible - I've never had cause to use them. I'll have another read of the spec.

newlandsvalley commented 6 years ago

You're quite correct and your analysis is good. I've created a sysex branch and corrected it (I hope) by making the parser closer to the spec - the new generalEvent function is now a choice of MIDI, meta or sysex events. I've only tested in so far as my sample MIDI files (that don't use SysEx) still parse. Perhaps you can test by round tripping and then we can merge it back in to master?

rhofour commented 6 years ago

I was thinking it could just be moved over to parseMidiEvent. It also seemed there were a couple other issues with it if I read the spec right. It seemed you were reading an int then reading that many bytes, but I believe the spec just says to look for 0xF0 and read 7-bit bytes (with the top bit set to zero) until you see 0xF7.

I implemented this in generate-sysex and now it matches my generator. Does this look good to you?

newlandsvalley commented 6 years ago

Do you have a version of the spec which is both normative and clear? I have a copy of _complete_midi96-1-3.pdf which I find almost impossible to use. But I think I took my implementation from here - https://www.csie.ntu.edu.tw/~r92092/ref/midi/#sysex_event but very possibly it's mistaken. The SysEx section here says it uses a counted sequence of bytes.

I've found this is the pdf spec mentioned above - page 135:

(sysex event) is used to specify a MIDI system exclusive message, either as one unit or in packets, or as an "escape" to specify any arbitrary bytes to be transmitted. A normal complete system exclusive message is stored in a MIDI File in this way:

F0 (length) (bytes to be transmitted after F0)

The length is stored as a variable-length quantity. It specifies the number of bytes which follow it, not including the F0 or the length itself. For instance, the transmitted message F0 43 12 00 07 F7 would be stored in a MIDI file as F0 05 43 12 00 07 F7. It is required to include the F7 at the end so that the reader of the MIDI file knows that it has read the entire message.

Another form of sysex event is provided which does not imply that an F0 should be transmitted. This may be used as an "escape" to provide for the transmission of things which would not otherwise be legal, including system realtime messages, song pointer or select, MIDI Time Code, etc. This uses the F7 code:

F7 (length) (all bytes to be transmitted)

rhofour commented 6 years ago

So, looking through the spec it appears SysEx messages are handled differently in MIDI files and in a MIDI event stream. Page 135 describes specifically the file case ["... stored in a MIDI File in this way:"].

On page 61 SysEx messages for the MIDI event stream are described:

EOX

Used as a flag to indicate the end of a System Exclusive transmission. A System Exclusive message starts with F0H and can continue for any number of bytes. The receiver will continue to wait for data until an EOX message (F7H) or any other non-Real Time status byte is received.

To avoid hanging a system, a transmitter should send a status byte immediately after the end of an Exclusive transmission so the receiver can return to normal operation. Although any Status Byte (except Real-Time) will end an exclusive message, an EOX should always be sent at the end of a System Exclusive message. Real time messages may be inserted between data bytes of an Exclusive message in order to maintain synchronization, and can not be used to terminate an exclusive message.

It looks like some of the parser code is going to need to be separate for MIDI events and MIDI files which will also mean I'll need to similarly separate my generating code for MIDI events and MIDI files. I don't think the differences are too significant though.

While I'm moving stuff around I might leave comments pointing to the relevant pages in the spec for the code. That it's easy to verify we're implementing the spec properly.

newlandsvalley commented 6 years ago

Interesting. I wonder how widespread these sorts of differences are? If there are a lot of them it may make sense to separate into two parsers. I'm going to try to spend a few days reading the spec very carefully.

newlandsvalley commented 6 years ago

OK - p 135 is quite clear:

For instance, the transmitted message F0 43 12 00 07 F7 would be stored in a MIDI file as F0 05 43 12 00 07 F7.

So you're right - the format of a SysEx message in a file is different from that in a live stream. And also, both variants are required to end in 0xF7. Why the spec says

It is required to include the F7 at the end so that the reader of the MIDI file knows that it has read the entire message

is unclear to me because it can count the bytes. But if this is the case, we can satisfy both contexts by ignoring the length byte and simply parsing until we hit 0xF7 as you suggest.

And how best to represent SysEx in MidiTypes if we were to do this? At the moment, we use a String to hold the data (because Elm has no binary types) which is presumably mistaken. What should we use - an Int Array where the first Int is a length in file contexts but absent in streaming contexts?

rhofour commented 6 years ago

I don't think we'd gain any benefit from including the length in Midi.Types. I'd suggest an int list without the length bit so it's consistent across contexts. The length isn't part of the SysEx data being transmitted, it's just a detail of the file format. A user can always just check the length of the list if they need it.

However, reading further from page 135 I see a case where we'll need the length for parsing. The spec describes multi-packet SysEx messages where the first packet starts with 0xF0, then the length, and data bytes, but no 0xF7. The next packets each start with 0xF7, but only the last one ends with it. For parsing these we'll need both the length, and to keep some context in-between packets of a single SysEx message.

To make things more complicated there's another type of SysEx message that can be stored in files.

Another form of sysex event is provided which does not imply that an F0 should be transmitted. This may be used as an "escape" to provide for the transmission of things which would not otherwise be legal, including system realtime messages, song pointer or select, MIDI Time Code, etc. This uses the F7 code:

When reading a MIDI File, and an F7 sysex event is encountered without a preceding F0 sysex event to start a multi-packet system exclusive message sequence, it should be presumed that the F7 event is being used as an "escape". In this case, it is not necessary that it end with an F7, unless it is desired that the F7 be transmitted.

Here we'll need to look at the length again too.

Finally, for the event stream parsing until we read 0xF7 isn't good enough to be compliant with the spec. If you look back to the EOX bit I quoted above (from page 61) it specifically says that any non-realtime status will also end the SysEx message. So we should parse until we see a non-realtime status bit, then end our SysEx message and either ignore the last byte (if it's 0xF7) or parse it otherwise.

It looks like we'll probably have to split the event stream and file parsers a bit, but hopefully we can share most of the code.

newlandsvalley commented 6 years ago

I agree we should now split the parsers. What I, myself, would like to do is to concentrate first on correcting the file parser and sort out this issue and also https://github.com/newlandsvalley/elm-comidi/issues/8. We could work on the stream parser independently and then later on see what code we can share.

I see your point about parsing the end of SysEx in a stream parser. But this is a big complication because it implies backtracking. A SysEx parser can recognize the next status bit, but unfortunately it also consumes it!

As for the F0/F7 variants, I think that the current (file) parser is syntactically correct, irrespective of the payload. Both require a counted array of bytes. I think the difference between them is that you are allowed to embed an F0 packet inside an F7 packet (but not vice versa). But the current parser wouldn't attempt to parse and embedded F0 - it just treats it as opaque data.

rhofour commented 6 years ago

I don't think ending the SysEx message with a status byte would be that difficult. We could parser it as:

0xF0, many bytes from 0 to 127, then an optional 0xF7 byte

Under the covers I imagine the optional but requires backtracking, but I don't think it complicates our code much.

However, the real problem is going to be parsing real-time status bytes while we're also parsing a SysEx message (see the EOX quote). For that we'll need context like with #8. I guess we'd already need it for multi-packet SysEx messages too.

These all see like good cases for some unit tests. I can add them later today or tomorrow.

On Dec 6, 2017 03:29, "John Watson" notifications@github.com wrote:

I agree we should now split the parsers. What I, myself, would like to do is to concentrate first on correcting the file parser and sort out this issue and also #8 https://github.com/newlandsvalley/elm-comidi/issues/8. We could work on the stream parser independently and then later on see what code we can share.

I see your point about parsing the end of SysEx. But this is a big complication because it implies backtracking. A SysEx parser can recognize the next status bit, but unfortunately it also consumes it!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/newlandsvalley/elm-comidi/issues/7#issuecomment-349569459, or mute the thread https://github.com/notifications/unsubscribe-auth/AAyIwnboCquzIDqlfls3DLCIH5-hGjuDks5s9lBbgaJpZM4Qzc7r .

rhofour commented 6 years ago

Looks like real-time status bytes aren't actually a problem for parsing midi events from WebMidi.

From the spec:

It is specifically noted that MIDI System Real-Time Messages may actually occur in the middle of other messages in the input stream; in this case, the System Real-Time messages will be dispatched as they occur, while the normal messages will be buffered until they are complete (and then dispatched).

Unless you disagree I think we should make the simplifying assumption that the MIDI events we're parsing come from WebMidi. This doesn't change anything about MIDI files though.

newlandsvalley commented 6 years ago

Very good news.

newlandsvalley commented 6 years ago

I'm wondering if we can satisfy the spec and cater for the range of possible downstream uses in the file parser if we change the type definition to something like:

type SysExFlavour = 
     F0        -- normal
   | F7        -- escaped

type MidiEvent =
  ...
  | SysEx SysExFlavour (List Int)
rhofour commented 6 years ago

For the F0 variant it seems like it serves two purposes: 1) To transmit things "which would not otherwise be legal, including system realtime messages, song pointer or select, MIDI Time Code, etc" 2) To allow multi-part SysEx messages

For 1 I wonder if the right thing to do might be to actually parse what's inside there. That way it's broken up into a list of midi messages rather than a single escaped SysEx message with arbitrary data inside. However, this would probably make things more complex for little gain.

I think there might be room to improve this later, but for now this is certainly much better than what we currently have. Do you have this written already or should I add it?

newlandsvalley commented 6 years ago

I must admit I'm still very confused about SysEx, and now I'm even more confused. The section that you quote under 1 above, in my reading of the spec (p 135), applies to escape sequences under the F7 variant, not the F0 variant. Incidentally, I have also been trying to make sense of SydEx using this resource - https://www.csie.ntu.edu.tw/~r92092/ref/midi/#sysex_event.

rhofour commented 6 years ago

Woops, that was a typo on my part. That should have read F7 variant.

newlandsvalley commented 6 years ago

OK - thank goodness. I think, though, that I'm still unclear. I would very much like to see a set of examples of F7 variant messages that occur in the wild. I'm yet to be convinced that we can discriminate between the two uses of F7 in order to be able to parse them correctly. Will it be possible, I wonder, to provide realistic unit test cases? If it would, I'd be a good deal more enthusiastic.

rhofour commented 6 years ago

I think it might only be possible to differentiate the two uses after the fact (by looking for bytes that wouldn't usually be allowed in a SysEx message). I'm also not totally if these are different uses.

For now I think your SysExFlavour idea is the best path forward. Do you have it written or do you want me to work on it?

newlandsvalley commented 6 years ago

OK, No, I have not yet written it. Can I leave it to you? I think perhaps it might be possible to revisit it later on to explore your ideas of parsing the contents.

rhofour commented 6 years ago

Yeah, that was my thought. This is a clear improvement over where things stand now and it's always possible to do things a bit better later. I should be able to get a PR with these changes out today, then I'll merge it into my generate branch which I've been working on again.