Gradient descent should only be touched upon; the reasoning should not be centered on it. It is useful, though, for explaining tied parameters.
At the moment the focus is on distributed versions; however, we could also make the point about robustness to noise even in a very simple non-parallel environment. This can just be quoted (serial noisy updates following the same kind of iterations), or maybe put in an appendix with all variants considered... Not sure this is hugely interesting, but a graph with a comparison to STAN would be useful.