Currently, when loading a model on cuda or metal, an additional copy takes place to build a CpuStorage from the original data slice before passing it to the device. This copy is not necessary, but it is hard to avoid with the current types, so this PR adds a new CpuStorageRef type to get around it. The cuda implementation now bypasses the additional copy.
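
To illustrate the idea, here is a minimal sketch of the difference between staging data in an owned storage and passing a borrowed view. Everything in it is a simplified assumption for illustration: the two-variant enums, `MockDevice`, `load_via_owned`, `load_via_ref`, and the `host_copies` counter are hypothetical and are not the definitions used in this PR. A real backend would perform a host-to-device transfer where the mock appends to a `Vec<u8>`.

```rust
/// Owned host-side buffer (hypothetical, simplified stand-in).
#[derive(Debug, Clone, PartialEq)]
pub enum CpuStorage {
    U8(Vec<u8>),
    F32(Vec<f32>),
}

/// Borrowed view over host-side data (hypothetical, simplified stand-in).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CpuStorageRef<'a> {
    U8(&'a [u8]),
    F32(&'a [f32]),
}

impl CpuStorage {
    /// Borrow the owned buffer as a view without copying it.
    pub fn as_storage_ref(&self) -> CpuStorageRef<'_> {
        match self {
            CpuStorage::U8(v) => CpuStorageRef::U8(v),
            CpuStorage::F32(v) => CpuStorageRef::F32(v),
        }
    }
}

/// Stand-in for device memory (assumption: not a real cuda or metal device).
#[derive(Debug, Default)]
pub struct MockDevice {
    pub memory: Vec<u8>,
    /// Number of extra host-side staging copies made before the upload.
    pub host_copies: usize,
}

impl MockDevice {
    /// Old path: materialize an owned CpuStorage first, then upload it.
    pub fn load_via_owned(&mut self, data: &[f32]) {
        let staged = CpuStorage::F32(data.to_vec()); // extra host-side copy
        self.host_copies += 1;
        self.upload(staged.as_storage_ref());
    }

    /// New path: wrap the original slice in a borrowed view and upload directly.
    pub fn load_via_ref(&mut self, data: &[f32]) {
        self.upload(CpuStorageRef::F32(data));
    }

    /// Mock of the single host-to-device transfer both paths share.
    fn upload(&mut self, src: CpuStorageRef<'_>) {
        self.memory.clear();
        match src {
            CpuStorageRef::U8(s) => self.memory.extend_from_slice(s),
            CpuStorageRef::F32(s) => {
                for x in s {
                    self.memory.extend_from_slice(&x.to_le_bytes());
                }
            }
        }
    }
}
```

Both paths leave the same bytes on the mock device; only the owned path pays for the intermediate `to_vec()`. The borrowed enum is `Copy`, so it can be passed by value without touching the underlying data.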