zdevito / ATen

ATen: A TENsor library for C++11

moving tensors back and forth between CPU and GPU? #207

Open sflc6 opened 6 years ago

sflc6 commented 6 years ago

Super sorry if this is obvious, but -- how do I copy a tensor from CPU -> GPU and vice versa? I've been looking through the documentation, and can't seem to find how to do this?

stites commented 6 years ago

There might be a better way to do this, but ATen compiles the following functions in THCTensorCopy: https://github.com/zdevito/ATen/blob/master/src/THC/generic/THCTensorCopy.h#L37

There might also be some convenience functions, like how PyTorch lets you call tensor.cuda() and tensor.cpu().

c-hofer commented 6 years ago

I'm also running into a wall here. The API is a little confusing to me. It looks like one should do something like

int32_t data[] = ...; // data contains not only zeros
at::Tensor t_cpu = CPU(kInt).tensorFromBlob(&data[0], {10, 10}); // here the content is fine
at::Tensor t_gpu = t_cpu.toType(t_cpu.type().toBackend(kCUDA).toScalarType(kInt)); // contains just zeros

but this seems wrong, as t_gpu contains only zeros after the operation.

@ezyang What am I missing? @ezyang @zdevito It would be really great if a how-to could be added to the README of ATen, since loading data and then moving it to the GPU is a very common workflow imho :)

many thanks chofer

ezyang commented 6 years ago

If you are running reasonably recent master, I think the following should work:

at::Tensor t_gpu = t_cpu.to(at::kCUDA);

We should make t_cpu.cuda() work though...

CC @goldsborough

c-hofer commented 6 years ago

Hi,

my HEAD is 372d1d67356f054db64bdfb4787871ecdbbcbe0b.

to is not yet implemented, it seems.

However, it looks like the problem is the creation with fromBlob(...). If I create a tensor differently, I can move it between CPU and GPU using the toBackend method of the Tensor class, e.g. my_cpu_tensor.toBackend(Backend::CUDA); .

My workaround to bring externally allocated CPU data onto the GPU in a tensor:

  1. create array data on cpu
  2. use cuda malloc, memcopy to bring it on the gpu
  3. create a tensor with fromBlob from the allocated data
  4. clone the tensor (in order not to mess with ATens memory management engine?)
  5. cudafree the allocated space.

So from my point of view, it seems there is an issue handing memory over from the wild into the ATen-controlled regime. But it's just a guess ;)

cheers c.hofer

soumith commented 6 years ago

@c-hofer look at https://github.com/zdevito/ATen/blob/31d00ab7fdf00c258b0fad5b1b05af77e92b64a9/aten/src/ATen/test/dlconvertor_test.cpp

You can use the DLPack format which is a cross-framework, well-specified and simple format that we support importing from: https://github.com/dmlc/dlpack/

c-hofer commented 6 years ago

Thx, that's a valuable hint :)

goldsborough commented 6 years ago

You can also clone on the CPU first and then move it to GPU, if that's feasible: CPU(kInt).tensorFromBlob(&data[0], {10, 10}).clone().toBackend(at::kCUDA). The to() functions landed 6 days ago and are on master here: https://github.com/zdevito/ATen/blob/master/aten/src/ATen/templates/Tensor.h#L90

c-hofer commented 6 years ago

thx, this is surely more elegant ... by the way, any plans for when the new ATen API will be more or less stable?