Fix overhead in `VertexBuffer<B>::set`

This PR exists to enable the following diff in `VertexBuffer<B>::set`, which closes #75:

```diff
     pub fn set(&self, data: &[B::Gl]) {
-        // FIXME: Get rid of this conversion + allocation.
-        let data: Vec<_> = data.iter().map(AsStd140::as_std140).collect();
-
         self.raw.set(&data);
     }
```

Because `set` no longer converts every vertex and copies it into a temporary `Vec` before uploading, this can save milliseconds per frame when there are tens of thousands of vertices, depending on the system.
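To illustrate what the removed lines cost, here is a std-only sketch contrasting the old path (convert each element and collect into a fresh buffer on every call) with the new one (view the caller's slice as bytes in place). The `Pod` marker trait and `cast_bytes` are hypothetical stand-ins for `bytemuck::Pod` and `bytemuck::cast_slice`, and `Vertex` is made up for illustration. The sketch models only the per-call allocation and copy, not the std140 layout conversion that `as_std140` performed.

```rust
use std::mem::size_of_val;

/// Hypothetical stand-in for `bytemuck::Pod`: implementors promise to be
/// `Copy`, `#[repr(C)]`, free of padding, and valid for any bit pattern.
unsafe trait Pod: Copy + 'static {}

/// Hypothetical vertex type made only of `f32`s, so it has no padding.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vertex {
    pos: [f32; 3],
    uv: [f32; 2],
}

// SAFETY: `#[repr(C)]` over `f32` arrays: no padding, any bit pattern is valid.
unsafe impl Pod for Vertex {}

/// Old path (sketch): convert every element and collect into a new buffer.
/// This allocates and copies on every call.
fn to_bytes_with_copy(data: &[Vertex]) -> Vec<u8> {
    let mut out = Vec::with_capacity(size_of_val(data));
    for v in data {
        for x in v.pos.iter().chain(v.uv.iter()) {
            out.extend_from_slice(&x.to_ne_bytes());
        }
    }
    out
}

/// New path (sketch): reinterpret the slice as bytes without copying.
fn cast_bytes<T: Pod>(data: &[T]) -> &[u8] {
    // SAFETY: `T: Pod` guarantees no padding, so every byte is initialized;
    // `u8` has alignment 1, so any pointer is suitably aligned.
    unsafe { std::slice::from_raw_parts(data.as_ptr().cast::<u8>(), size_of_val(data)) }
}

fn main() {
    let vertices = vec![
        Vertex { pos: [0.0, 1.0, 2.0], uv: [0.5, 0.5] },
        Vertex { pos: [3.0, 4.0, 5.0], uv: [1.0, 0.0] },
    ];

    let copied = to_bytes_with_copy(&vertices);
    let viewed = cast_bytes(&vertices);

    // Same bytes, but `viewed` points into `vertices` instead of a new allocation.
    assert_eq!(copied.as_slice(), viewed);
    assert_eq!(viewed.as_ptr(), vertices.as_ptr().cast::<u8>());
    println!("{} bytes viewed without a copy", viewed.len());
}
```

The zero-copy view is only sound because of the `Pod` bound, which is why the fix hinges on making the block data `Pod`.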

The solution in this PR is to make `Block<Gl>` implement `bytemuck::Pod`. To do so, it introduces the new types `gl::{,I,U,B}Vec{2,3,4}` and `gl::Mat{2,3,4}`. These serve as field types for data that is put into `Block<Gl>`; they are not intended to be used directly. As a side effect, the `glam` dependency of posh can now be made optional.
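A std-only sketch of why dedicated `gl` types help: `Pod` requires a fixed `#[repr(C)]` layout with no padding bytes, so every field must itself be plain data. The `Pod` trait, the contents of the `gl` module, `Vertex`, and `byte_len` below are hypothetical illustrations, not posh's actual definitions; the matrix types would presumably follow the same pattern.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `bytemuck::Pod`: `Copy`, `#[repr(C)]`,
/// no padding, and valid for any bit pattern.
unsafe trait Pod: Copy + 'static {}

/// Hypothetical sketch of plain-data vector types in a `gl` module.
/// Each wraps a fixed-size array of scalars, so its layout is fixed.
mod gl {
    #[repr(C)]
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct Vec2(pub [f32; 2]);

    #[repr(C)]
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct Vec3(pub [f32; 3]);

    #[repr(C)]
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct UVec4(pub [u32; 4]);
}

// SAFETY: each type is `#[repr(C)]` over `f32`/`u32` arrays: no padding, any bits valid.
unsafe impl Pod for gl::Vec2 {}
unsafe impl Pod for gl::Vec3 {}
unsafe impl Pod for gl::UVec4 {}

/// Hypothetical vertex struct built only from `gl` types.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vertex {
    pos: gl::Vec3,
    uv: gl::Vec2,
    joints: gl::UVec4,
}

// Compile-time check that `Vertex` has no padding: its size equals the sum of its fields.
const _: () = assert!(
    size_of::<Vertex>() == size_of::<gl::Vec3>() + size_of::<gl::Vec2>() + size_of::<gl::UVec4>()
);

// SAFETY: all fields are `Pod` and the assertion above rules out padding.
unsafe impl Pod for Vertex {}

/// Counter-example: `u8` followed by `f32` forces 3 padding bytes, so this cannot be `Pod`.
#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct Padded {
    flag: u8,
    value: f32,
}

/// Accepts only `Pod` types, mirroring a bound like `T: bytemuck::Pod`.
fn byte_len<T: Pod>(data: &[T]) -> usize {
    std::mem::size_of_val(data)
}

fn main() {
    let vertices = [Vertex {
        pos: gl::Vec3([0.0, 1.0, 2.0]),
        uv: gl::Vec2([0.5, 0.5]),
        joints: gl::UVec4([0, 1, 2, 3]),
    }];
    assert_eq!(byte_len(&vertices), 36); // 12 + 8 + 16, no padding
    assert_eq!(size_of::<Padded>(), 8); // 1 + 3 padding + 4
    // `byte_len(&[Padded { flag: 1, value: 1.0 }])` would not compile: `Padded` is not `Pod`.
}
```

The `Padded` struct shows what `Pod` rules out: reading its padding bytes as `u8` would be undefined behavior, so no safe byte view of it exists.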

I'm okay with the solution in this PR: there is a nice symmetry between the types in `gl` and `sl`. The downside is that users now have to call `.into()` on data that they put into `Block<Gl>` structs.
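A std-only sketch of that ergonomic cost, assuming `gl::Vec3` is a plain `#[repr(C)]` wrapper with `From` conversions. `MathVec3` is a hypothetical stand-in for a math-library type such as `glam::Vec3`, and the `From` impls are illustrative, not posh's actual code.

```rust
/// Hypothetical stand-in for a user-facing math type such as `glam::Vec3`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct MathVec3 {
    x: f32,
    y: f32,
    z: f32,
}

mod gl {
    /// Hypothetical plain-data vector type meant only for block fields.
    #[repr(C)]
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct Vec3(pub [f32; 3]);

    impl From<[f32; 3]> for Vec3 {
        fn from(a: [f32; 3]) -> Self {
            Vec3(a)
        }
    }
}

// Assumption: in a real crate, an impl like this could sit behind an optional
// feature for the math library, which is what lets that dependency be optional.
impl From<MathVec3> for gl::Vec3 {
    fn from(v: MathVec3) -> Self {
        gl::Vec3([v.x, v.y, v.z])
    }
}

/// Hypothetical vertex struct whose fields use the `gl` types.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vertex {
    pos: gl::Vec3,
    normal: gl::Vec3,
}

fn main() {
    let pos = MathVec3 { x: 1.0, y: 2.0, z: 3.0 };
    // Each field needs an explicit `.into()`: this is the ergonomic cost.
    let vertex = Vertex {
        pos: pos.into(),
        normal: [0.0f32, 0.0, 1.0].into(),
    };
    assert_eq!(vertex.pos, gl::Vec3([1.0, 2.0, 3.0]));
    println!("{:?}", vertex);
}
```

Routing conversions through `From` keeps the `gl` types free of any particular math library while still accepting plain arrays.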

leod/posh #124