justingardner / mgl

A suite of mex/m files for displaying psychophysics stimuli
http://justingardner.net/mgl

Allow mglMetal to negative-acknowledge command when there's an error. #41

Closed benjamin-heasly closed 2 years ago

benjamin-heasly commented 2 years ago

Currently the communication pattern between Matlab and the mglMetal app is pretty optimistic. It doesn't always recover from errors, and Matlab and mglMetal can get out of sync about what each side expects next.

I think we can make this more robust if mglMetal adds the ability to send a short NAK (negative-acknowledge) prefix before it sends each next piece of data to Matlab. My thinking is, when mglMetal encounters an error it can:

On the Matlab side, we could

benjamin-heasly commented 2 years ago

This is implemented in 6c8fab05e26e70a76de81b72fc0ce96f0752766c.

I'll put a bunch of related info here, maybe some of this will find its way to the wiki when it feels stable.

Aside: thinking about error handling made me think about debugging. I moved mglMetal calls to "print" over to "os_log". This way the messages show up in the system logging tools. Now we can view the messages two ways:

I identified several errors that seem recoverable and don't need to crash the mglMetal app. mglMetal reports these back to Matlab with a "negative timestamp" instead of a regular timestamp. Matlab can check for these when necessary and warn/error/retry to fit the situation. The expectation is that after a negative ack, mglMetal will log an error message, abandon the current render pass (if any) and return to its waiting state, ready to start a new render pass.

Most of our commands are just executed for side-effects, like rendering. For these, the only data Matlab needs to read are two ack timestamps: one when the command was received, and one when the command was finished processing.
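
To make the two-ack pattern concrete, here is a rough sketch in Python rather than the actual Matlab code. It assumes each ack timestamp arrives as a little-endian 8-byte double and that a negative value in either position means the command failed. The wire format and the names `read_ack` and `run_side_effect_command` are assumptions for illustration, not what mgl actually does.

```python
import io
import struct


def read_ack(stream):
    """Read one ack timestamp (assumed: little-endian float64)."""
    raw = stream.read(8)
    if len(raw) != 8:
        raise EOFError("stream ended before a full ack timestamp arrived")
    return struct.unpack("<d", raw)[0]


def run_side_effect_command(stream):
    """Read the two acks that follow a side-effect command.

    Returns (ok, received, processed). ok is False when either
    timestamp is negative, which this sketch treats as a negative ack.
    """
    received = read_ack(stream)    # command was received
    processed = read_ack(stream)   # command finished processing
    ok = received >= 0 and processed >= 0
    return ok, received, processed


# Stand-in for the socket: a successful command, then a failed one.
good = io.BytesIO(struct.pack("<dd", 100.0, 100.5))
bad = io.BytesIO(struct.pack("<dd", 100.0, -100.5))
print(run_side_effect_command(good))
print(run_side_effect_command(bad))
```

The caller decides what to do with `ok == False` (warn, error, or retry), which keeps the reading logic the same for every side-effect command.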

A few commands require Matlab to read additional data, like texture numbers, texture contents, window position, etc. For these return-value commands, there's an extra ack timestamp. Before trying to read the return value, Matlab checks the "data incoming" timestamp. If it's positive, the command was processed OK and the data should be readable. If it's negative, Matlab should not try to read any return value. Doing so would make Matlab get stuck, presumably waiting for data that mglMetal never sends. Instead, Matlab should warn/error/retry depending on what makes sense in a given situation.
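
The "check before you read" rule can be sketched like this, again in Python as a stand-in for the Matlab side. The sketch assumes the "data incoming" timestamp is a little-endian 8-byte double and that the return value is a single 4-byte unsigned integer (for example a texture number). Both the layout and the name `read_return_value` are hypothetical, and whatever acks come before or after this step are left out.

```python
import io
import struct


def read_return_value(stream):
    """Read the 'data incoming' timestamp, then the payload only if it is safe.

    Returns (value, data_incoming). value is None after a negative ack,
    and in that case nothing past the timestamp is read from the stream.
    """
    raw = stream.read(8)
    if len(raw) != 8:
        raise EOFError("stream ended before the data-incoming timestamp")
    (data_incoming,) = struct.unpack("<d", raw)

    if data_incoming < 0:
        # Negative ack: no payload follows, so do not attempt to read one.
        return None, data_incoming

    # Assumed payload: one little-endian uint32.
    payload = stream.read(4)
    if len(payload) != 4:
        raise EOFError("stream ended before the return value arrived")
    (value,) = struct.unpack("<I", payload)
    return value, data_incoming


# Stand-in for the socket: one success with payload 7, one negative ack.
ok_stream = io.BytesIO(struct.pack("<dI", 12.5, 7))
nak_stream = io.BytesIO(struct.pack("<d", -12.5))
print(read_return_value(ok_stream))
print(read_return_value(nak_stream))
```

Returning `None` instead of raising leaves the warn/error/retry decision to the caller, which matches the idea that the right response depends on the situation.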

Since I spent time thinking about this, I wanted to write down these notes somewhere. I think this was a good change, but probably not the last word on the subject as we gain experience with the new communication pattern!