ropensci / ozunconf18

repository for the rOpenSci ozunconference 2018

audio diagnostics for MCMC 📉 🔉 #21

Open goldingn opened 5 years ago

goldingn commented 5 years ago

background

MCMC is a method for estimating parameters in (Bayesian) statistical models. It results in a bunch of multivariate time series of parameter values, which are often plotted as trace plots: one line per chain for each parameter.

For your inferences to be correct, those chains need to have converged on the correct distribution of values. There are some summary statistics and visualisations you can use to assess this, but none of these alone is always sufficient, and they often tell you nothing about how the chains have failed to converge.

One of the reasons that's difficult is that there are usually several of these time series (chains), and each one describes movement around a high-dimensional space (4 chains of a 10-parameter model means that each time step gives 4 different points in a 10D space, and the trajectories of those points may well be correlated). So it's difficult to find a single optimal summary statistic, or to visualise them all at once.
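For anyone picking this up, here's a minimal base-R sketch of that data layout. It assumes the draws are stored as an iterations × chains × parameters array (the layout and variable names are my own assumptions, and the values are random placeholders rather than real MCMC output):

```r
# Assumed layout: draws[iteration, chain, parameter]
n_iter   <- 1000
n_chains <- 4
n_params <- 10

set.seed(1)
draws <- array(rnorm(n_iter * n_chains * n_params),
               dim = c(n_iter, n_chains, n_params))

# At a single time step we get one point per chain in 10D space:
# a 4 x 10 matrix (rows = chains, columns = parameters)
step_500 <- draws[500, , ]
dim(step_500)
#> [1]  4 10

# Showing everything at once means 4 x 10 = 40 separate traces
n_chains * n_params
#> [1] 40
```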

idea

This morning I saw this tweet, and the subsequent link to this web page on diagnosing failing hard drives by their sounds.

Since MCMC chains are time series and so are sounds, could we assess convergence of all of those parameters simultaneously by mapping each parameter to a different note, converting those notes into sound waves, and playing them to the user?
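A rough sketch of the parameter-to-note idea in base R. The helper name `trace_to_wave()`, the pitch mapping (±1 sd of the trace = ±6 semitones around a base note) and the sample rate are all my own assumptions, not anything settled in this thread:

```r
# Hypothetical helper: turn one parameter's trace into a sine wave whose
# pitch follows the (standardised) parameter value.
trace_to_wave <- function(trace, base_freq = 220, semitone_range = 12,
                          sample_rate = 8000, seconds_per_draw = 0.01) {
  # Standardise so every parameter uses the same pitch range
  z <- (trace - mean(trace)) / sd(trace)
  # +/- 1 sd shifts the pitch by semitone_range / 2 semitones
  freq <- base_freq * 2^((z * semitone_range / 2) / 12)
  # Hold each draw's frequency for seconds_per_draw seconds
  samples_per_draw <- round(sample_rate * seconds_per_draw)
  freq_per_sample <- rep(freq, each = samples_per_draw)
  # Accumulate phase from the frequency so pitch changes don't click
  phase <- 2 * pi * cumsum(freq_per_sample) / sample_rate
  sin(phase)
}

set.seed(42)
trace <- cumsum(rnorm(500))   # a wandering (random-walk) trace
wave <- trace_to_wave(trace)
length(wave)                  # 500 draws x 80 samples per draw
#> [1] 40000
```

Accumulating phase, rather than computing `sin(2 * pi * freq * t)` directly, avoids discontinuities (audible clicks) whenever the pitch changes between draws. To actually hear the result, the vector could be scaled to 16-bit integers and written to a WAV file, e.g. with the tuneR package.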

If they were all happily converged, I would expect the sound to be a constant hum (@hollylkirk had the idea of making the notes for the parameters harmonize, so it sounds nice), whereas a lack of convergence would produce a wavering/wobbling sound. Possibly something even weirder if there's a strange dependence between the parameters, like rotational invariance (pdf).
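One possible way to get harmonising notes is to give each parameter a base frequency from the harmonic series of a root note (my assumption; a chord or scale would work too). The sketch below also puts a rough number on the "hum vs wobble" intuition with a hypothetical `wobble()` score: how much the local average of a standardised trace drifts. Fast jitter around a fixed value averages out, while slow wandering doesn't:

```r
# Hypothetical: base frequencies for n parameters from the harmonic
# series of a root note, so the notes tend to sound consonant together
harmonic_freqs <- function(n_params, root = 110) {
  root * seq_len(n_params)
}

# Hypothetical "wobble" score: standard deviation of block means of the
# standardised trace. Near 0 = steady hum; near 1 = slow audible drift.
wobble <- function(trace, block_size = 50) {
  z <- (trace - mean(trace)) / sd(trace)
  blocks <- split(z, ceiling(seq_along(z) / block_size))
  sd(vapply(blocks, mean, numeric(1)))
}

set.seed(1)
converged   <- rnorm(1000)          # stationary: jitters around one value
unconverged <- cumsum(rnorm(1000))  # random walk: drifts over time

harmonic_freqs(4)
#> [1] 110 220 330 440
wobble(converged) < wobble(unconverged)
#> [1] TRUE
```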

To diagnose which parameters are troublesome, we could create a dashboard to turn the volume of each parameter up or down.
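Under the hood, that dashboard could be as simple as a weighted mix. A minimal sketch, assuming each parameter has already been rendered as a wave (one column per parameter) and each slider is a volume between 0 (muted) and 1 (full); `mix_waves()` is a hypothetical name:

```r
# Hypothetical mixer: scale each parameter's wave by its volume slider,
# sum them, and rescale so the peak amplitude is 1 (avoids clipping)
mix_waves <- function(waves, volumes) {
  stopifnot(ncol(waves) == length(volumes))
  mixed <- as.vector(waves %*% volumes)
  peak <- max(abs(mixed))
  if (peak > 0) mixed / peak else mixed
}

sample_rate <- 8000
secs <- seq(0, 1, length.out = sample_rate)
# Three parameters, each already rendered as a sine wave (one per column)
waves <- cbind(sin(2 * pi * 220 * secs),
               sin(2 * pi * 330 * secs),
               sin(2 * pi * 440 * secs))

# Mute the second parameter to check whether it is the wobbly one
mixed <- mix_waves(waves, volumes = c(1, 0, 0.5))
```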

tasks


Note: this idea may not be at all useful, but it'll be fun to find out!

goldingn commented 5 years ago

There's an experimental feature in greta that means we could even make this play whilst the model is sampling, which would be pretty cool!

njtierney commented 5 years ago

I love this idea!

I would love to hear about creating chains that are converged/not converged, as this is something I've struggled with in my work on whether people can visually identify convergence.
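As a starting point for generating test cases, here's a rough sketch of one way to simulate both kinds of chains. The `simulate_chains()` helper and the choice of "chains stuck around different offsets, drifting slowly as a strongly autocorrelated AR(1) process" as the unconverged case are my own assumptions; real failures can look quite different:

```r
# Hypothetical generator: returns an n_iter x n_chains matrix.
# Converged: every chain samples the same N(0, 1) target independently.
# Not converged: each chain sits around its own offset and drifts slowly.
simulate_chains <- function(n_iter = 1000, n_chains = 4, converged = TRUE) {
  if (converged) {
    return(matrix(rnorm(n_iter * n_chains), n_iter, n_chains))
  }
  offsets <- seq(-2, 2, length.out = n_chains)
  sapply(offsets, function(offset) {
    x <- numeric(n_iter)
    x[1] <- offset
    for (i in 2:n_iter) {
      # AR(1) around the chain's own offset with autocorrelation 0.99
      x[i] <- offset + 0.99 * (x[i - 1] - offset) + rnorm(1, sd = 0.1)
    }
    x
  })
}

set.seed(2)
good <- simulate_chains(converged = TRUE)
bad  <- simulate_chains(converged = FALSE)

# Spread of the chain means: small when chains agree, large when they don't
sd(colMeans(good))
sd(colMeans(bad))
```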

I've actually had a similar thought about audio convergence, but I really like how well fleshed out your idea is. It also reminds me of some work in Human Factors, where things like blood pressure and other biometric readings are played as tones that rise or fall in pitch as they change during surgery, so that the surgeon can keep operating without needing to read the machines.

Keen to talk more about this 💯

njtierney commented 5 years ago

This blog post here might be of interest: http://www.vikparuchuri.com/blog/making-instrumental-music-from-scratch/

Also these two packages might provide clues for how to read/write/play music in R:

goldingn commented 5 years ago

Aah sweet. I thought you'd have ideas on this topic! It looks like those two packages would be super useful.

Definitely happy to chat about unconverged chains, particularly the rotational invariance thing.

Another thought I just had: rather than having one audio channel per parameter (40 channels in the above example), it might make more sense to have one per chain (4 channels in the above) and map each chain from the high-dimensional parameter space to a lower-dimensional audio space (e.g. pitch x volume x timbre) via something like PCA. The influence of specific parameters could still be turned up or down, but it would probably be easier to hear the wobbliness like this.
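A sketch of the PCA version in base R, assuming draws are stored as an iterations × chains × parameters array filled here with random placeholder values. Mapping PC1 to pitch and PC2 to volume (via `plogis()`) is an arbitrary choice for illustration. One design point: the PCA is fitted once on all chains pooled together, so every chain is projected onto the same axes and differences between chains stay audible:

```r
# Assumed layout: draws[iteration, chain, parameter] (random placeholders)
set.seed(3)
n_iter <- 500; n_chains <- 4; n_params <- 10
draws <- array(rnorm(n_iter * n_chains * n_params),
               dim = c(n_iter, n_chains, n_params))

# Stack all chains into one (iterations * chains) x parameters matrix
# and fit a single PCA, so all chains share the same axes
pooled <- apply(draws, 3, c)
pca <- prcomp(pooled, center = TRUE, scale. = TRUE)

# Project each chain separately, keeping the first two components
chain_scores <- lapply(seq_len(n_chains), function(k) {
  predict(pca, newdata = draws[, k, ])[, 1:2]
})

# Hypothetical mapping: PC1 -> pitch (1 unit = 1 semitone), PC2 -> volume
to_audio <- function(scores, base_freq = 220) {
  pitch  <- base_freq * 2^(scores[, 1] / 12)
  volume <- plogis(scores[, 2])   # squash PC2 into (0, 1)
  data.frame(pitch = pitch, volume = volume)
}
audio <- lapply(chain_scores, to_audio)   # one audio track per chain
```

Turning a specific parameter up or down could then be done by rescaling its column before the PCA, though how well that works in practice is untested.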

certifiedwaif commented 5 years ago

I love this idea, and have some of the required skills. We'll see what's possible.

coolbutuseless commented 5 years ago

The following ridiculous idea has been kicking around my head for a while:

ggaudio(mtcars) + 
   geom_audio(aes(pitch = mpg, duration = cyl), volume = 10, side = 'L') + 
   geom_audio(aes(attack = am, sustain = am, decay = disp, release = mpg), volume = 3, side = 'R') + 
   facet_channel( ~ carname)

Don't ask me how it would work - I have absolutely no idea. :)
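For what it's worth, a very rough guess at what `aes(pitch = mpg, duration = cyl)` might do underneath: rescale each mapped column into an audible range and emit one note per row. Everything here (`rescale()`, the Hz and seconds ranges) is hypothetical. Note that `mtcars` stores car names as row names rather than in a `carname` column, so `facet_channel( ~ carname)` would need them pulled out first:

```r
# Hypothetical: linearly rescale x into the range given by `to`
rescale <- function(x, to) {
  to[1] + (x - min(x)) / (max(x) - min(x)) * diff(to)
}

# One note per car, with car names taken from the row names
notes <- data.frame(
  car      = rownames(mtcars),
  pitch    = rescale(mtcars$mpg, to = c(220, 880)),  # Hz
  duration = rescale(mtcars$cyl, to = c(0.1, 0.5))   # seconds
)
head(notes, 3)
```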

RPanczak commented 5 years ago

Oh yes ❗️ But you forgot to add + theme_daft_punk() 😉

njtierney commented 5 years ago

@coolbutuseless that is genius! Although I have to agree, I have no idea how that would work.

jesse-jesse commented 5 years ago

I think this is a pretty fun idea, and potentially actually pretty useful. Humans are pretty good at picking up differences in sound.

noamross commented 5 years ago

I just discovered this issue and it's amazing. Did anything come of this?

goldingn commented 5 years ago

Nope, we all did different stuff. Take it and run with it!

noamross commented 5 years ago

I'm going to write down my idea here because I am almost definitely not going to get around to it. Perhaps it will be picked up next year?

My thought is that, since an MCMC chain is a noisy signal of an underlying piece of information, we should use measures of its quality to transform a recognizable form of information (music), so that a good chain sounds like normal music and a bad chain sounds weird.
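One hypothetical way to wire this up, using R-hat as the quality measure (my choice; the idea doesn't name one) and random detuning as the "weirdness". This is the classic non-split Gelman-Rubin R-hat; newer recommendations use split and rank-normalised versions, so treat it as a sketch:

```r
# Classic (non-split) Gelman-Rubin R-hat for one parameter,
# given an iterations x chains matrix
rhat <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))     # mean within-chain variance
  B <- n * var(colMeans(chains))       # between-chain variance
  var_plus <- (n - 1) / n * W + B / n  # pooled variance estimate
  sqrt(var_plus / W)
}

# Hypothetical: detune each note by a random amount that grows with
# R-hat - 1, so well-mixed chains play in tune and poor ones sound off
detune_melody <- function(melody_hz, r_hat, max_semitones = 3) {
  badness <- min(max(r_hat - 1, 0), 1)  # clamp to [0, 1]
  offsets <- runif(length(melody_hz), -1, 1) * badness * max_semitones
  melody_hz * 2^(offsets / 12)
}

melody <- c(262, 294, 330, 349, 392)  # roughly C4, D4, E4, F4, G4

set.seed(4)
good <- matrix(rnorm(4000), ncol = 4)                 # chains agree
bad  <- sweep(matrix(rnorm(4000), ncol = 4), 2,
              c(0, 1, 2, 3), "+")                     # chain means differ

detune_melody(melody, rhat(good))  # close to the original melody
detune_melody(melody, rhat(bad))   # noticeably out of tune
```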

goldingn commented 5 years ago

Oh, that's a great idea! Should be quite achievable. Or at least it's much less nebulous than my previous thoughts :)