Originally I assumed that we would be keeping audio tensors on the GPU, so it made sense to pass tensors between audio transforms (consecutive transforms will be fused by Beam and run in the same process). However, since we've been doing most of our processing on the CPU, this has not been necessary.
Now that I better understand Beam concurrency, it's worth taking a moment to revisit GPU support and benchmark the cost of running some jobs on the GPU. I don't want to refactor away from torchaudio and then refactor back later. If that seems likely, I would sooner add torch and torchaudio as optional package dependencies, and make LoadWithTorchaudio et al. contingent on these optional deps.
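The optional-dependency approach could be sketched with a standard import guard. This is an illustrative assumption, not the project's actual code: the `TORCHAUDIO_AVAILABLE` flag and `require_torchaudio` helper are hypothetical names, and `LoadWithTorchaudio` is only referenced from the note above.

```python
# Hypothetical sketch of gating LoadWithTorchaudio on optional deps.
# The guard pattern below is an assumption about how this might look,
# not the project's real implementation.
try:
    import torch  # noqa: F401
    import torchaudio  # noqa: F401
    TORCHAUDIO_AVAILABLE = True
except ImportError:
    TORCHAUDIO_AVAILABLE = False


def require_torchaudio() -> None:
    """Raise a helpful error when the optional torch/torchaudio extras
    are missing, instead of failing with a bare ImportError later."""
    if not TORCHAUDIO_AVAILABLE:
        raise ImportError(
            "LoadWithTorchaudio requires the optional torch and "
            "torchaudio dependencies; install them to enable it."
        )
```

A transform like `LoadWithTorchaudio` would then call `require_torchaudio()` in its constructor, so pipelines that never touch torchaudio-backed transforms run fine without the extra packages installed.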