Collections of many improvements

I took the liberty of developing the package further. This is a lot of changes, and you might not want to merge all of them, but maybe there are some you like. Here is a brief list of what I did:

The OpenCL context and command queue can now be persisted, which allows data to be kept between calls. The context also remembers whether numeric vectors default to single or double precision.
Data can stay on the OpenCL device (GPU) between kernel calls. This is extremely valuable when working with discrete GPUs connected over a relatively slow PCIe connection.
A separate single-precision data type is no longer required. The conversion takes place when the data is transferred to the OpenCL device. On the R side, data remains in numeric vectors.
Kernels are executed asynchronously and possibly out of order, if the OpenCL implementation allows it. Synchronization need not be done manually and happens transparently to the user: the OpenCL event corresponding to a kernel execution is attached to its output buffer. Subsequent kernel executions that take the buffer as input wait for that event, and hence for the preceding kernel execution to finish. Likewise, reads from a buffer wait on its attached event.
OpenCL device information now also includes the maximum clock frequency. In addition, the list of extensions is split into individual entries to make it easier to search.
By default, we choose GPU devices; CPU devices usually don't make a lot of sense. Also, if multiple GPU devices are available (think of a notebook with an integrated and a discrete GPU), we try to choose the faster one.
There are now several tests covering most of the functionality.
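
To illustrate the implicit synchronization described above, here is a minimal, language-neutral sketch in Python. It is not the package's code: `Buffer`, `run_kernel` and the thread pool standing in for the command queue are hypothetical names, and a `Future` plays the role of an OpenCL event.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional


class Buffer:
    """Hypothetical stand-in for a device buffer with an attached event."""

    def __init__(self, data: Optional[List[float]] = None):
        self.data = data
        self.event: Optional[Future] = None  # set by the kernel writing this buffer

    def read(self) -> List[float]:
        # Reads wait on the attached event, so the caller never syncs manually.
        if self.event is not None:
            self.event.result()
        return self.data


# Stand-in for an out-of-order command queue (assumption: 4 worker threads).
_queue = ThreadPoolExecutor(max_workers=4)


def run_kernel(fn: Callable, inputs: List[Buffer], output: Buffer) -> Buffer:
    # Collect the events of all input buffers at enqueue time.
    deps = [b.event for b in inputs if b.event is not None]

    def task():
        for e in deps:  # wait for the kernels that produce our inputs
            e.result()
        output.data = fn(*[b.data for b in inputs])

    output.event = _queue.submit(task)  # attach the new event to the output
    return output


def slow_double(x):
    time.sleep(0.05)  # pretend the first kernel takes a while
    return [v * 2 for v in x]


a = Buffer([1.0, 2.0, 3.0])
b = run_kernel(slow_double, [a], Buffer())
c = run_kernel(lambda x: [v + 1 for v in x], [b], Buffer())
result = c.read()  # blocks until both kernels have finished
```

Both calls to `run_kernel` return immediately; the ordering between the two kernels comes only from the event attached to `b`.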
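
The default device choice can be sketched in a similar way. The scoring rule below (compute units times maximum clock frequency) and the fallback to non-GPU devices are assumptions made for illustration; the description above only says that GPUs are preferred and that the faster one is chosen. The `Device` fields loosely mirror OpenCL's `CL_DEVICE_TYPE`, `CL_DEVICE_MAX_COMPUTE_UNITS` and `CL_DEVICE_MAX_CLOCK_FREQUENCY`, and the device entries are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str
    type: str               # "GPU" or "CPU"
    max_compute_units: int
    max_clock_mhz: int


def pick_default_device(devices: List[Device]) -> Optional[Device]:
    # Prefer GPUs; fall back to whatever is available (assumption).
    gpus = [d for d in devices if d.type == "GPU"]
    candidates = gpus or devices
    if not candidates:
        return None
    # Hypothetical speed estimate: compute units x max clock frequency.
    return max(candidates, key=lambda d: d.max_compute_units * d.max_clock_mhz)


# Made-up example: a notebook with a CPU, an integrated and a discrete GPU.
devices = [
    Device("CPU", "CPU", 8, 3600),
    Device("integrated GPU", "GPU", 24, 1100),
    Device("discrete GPU", "GPU", 36, 1700),
]
chosen = pick_default_device(devices)
```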
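
Finally, a small sketch of what converting at transfer time means for precision. The helper names `to_device` and `from_device` are hypothetical; the host keeps double-precision values, and the narrowing to single precision happens only when the bytes for the device are produced.

```python
from array import array
from typing import List


def to_device(values: List[float], precision: str = "single") -> bytes:
    # "f" is a C float, "d" a C double; conversion happens here, not on the host.
    typecode = "f" if precision == "single" else "d"
    return array(typecode, values).tobytes()


def from_device(raw: bytes, precision: str = "single") -> List[float]:
    typecode = "f" if precision == "single" else "d"
    out = array(typecode)
    out.frombytes(raw)
    return list(out)  # back to ordinary (double-precision) host numbers


host = [0.1, 0.5, 2.0]
roundtrip = from_device(to_device(host))
# 0.5 and 2.0 survive exactly; 0.1 comes back rounded to single precision.
```

The host-side vector `host` is never modified, which is the point: only the copy sent to the device is narrowed.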
