torch / cutorch

A CUDA backend for Torch7

cutorch

NOTE: API changes and versioning are covered in the "API changes and Versioning" section below.

Cutorch provides a CUDA backend for torch7.

Cutorch provides the following:

torch.CudaTensor

This new tensor type behaves exactly like a torch.FloatTensor, but adds a few CUDA-specific functions of note, such as :getDevice().
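A minimal sketch of this drop-in behavior (assumes cutorch is installed and a CUDA device is available):

```lua
require 'cutorch'

-- CudaTensor supports the familiar FloatTensor math API,
-- but the operations execute on the GPU.
local t = torch.CudaTensor(2, 3):fill(1)
t:add(2)            -- element-wise add, runs on the device
print(t:sum())      -- 18 (6 elements, each now 3)
print(t:getDevice())
```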

Other CUDA tensor types

Most other (besides float) CPU torch tensor types now have a cutorch equivalent with similar names: torch.CudaByteTensor, torch.CudaCharTensor, torch.CudaShortTensor, torch.CudaIntTensor, torch.CudaLongTensor and torch.CudaDoubleTensor.

Note: these are currently limited to copying/conversion, and several indexing and shaping operations (e.g. narrow, select, unfold, transpose).
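A sketch of converting between CPU and CUDA integer types, using the generic :type() conversion from core torch:

```lua
require 'cutorch'

-- Move a CPU IntTensor to the GPU as a CudaIntTensor.
local cpu = torch.IntTensor(4, 4):fill(7)
local gpu = cpu:type('torch.CudaIntTensor')

-- The supported shaping/indexing operations work as on the CPU:
local rows = gpu:narrow(1, 1, 2)   -- first two rows
local col  = gpu:select(2, 1)      -- first column

-- Convert back to a CPU type for anything not yet implemented on these types.
local back = rows:type('torch.IntTensor')
```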

CUDA memory allocation

Set the environment variable THC_CACHING_ALLOCATOR=1 to enable the caching CUDA memory allocator.

By default, cutorch calls cudaMalloc and cudaFree when CUDA tensors are allocated and freed. This is expensive because cudaFree synchronizes the CPU with the GPU. Setting THC_CACHING_ALLOCATOR=1 will cause cutorch to cache and re-use CUDA device and pinned memory allocations to avoid synchronizations.
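For example, the variable is typically set in the environment of the Torch process (the script name here is only illustrative):

```
$ THC_CACHING_ALLOCATOR=1 th train.lua
```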

With the caching memory allocator, device allocations and frees should logically be considered "usages" of the memory segment associated with streams, just like kernel launches. The programmer must insert the proper synchronization if memory segments are used from multiple streams.

cutorch.* API

Low-level stream functions (don't use these as a regular user; it is easy to shoot yourself in the foot):

Common Examples

Transferring a FloatTensor src to the GPU:

dest = src:cuda() -- dest is on the current GPU
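The reverse transfer works symmetrically (a sketch):

```lua
require 'cutorch'

local src  = torch.FloatTensor(10):uniform()
local dest = src:cuda()    -- CPU -> current GPU
local back = dest:float()  -- GPU -> CPU
print((src - back):abs():max())  -- 0: the round trip preserves values
```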

Allocating a tensor on a given GPU. For example, to allocate src on GPU 3:

cutorch.setDevice(3)
src = torch.CudaTensor(100)
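Note that cutorch.setDevice changes the current device for all subsequent allocations, so it is common to save and restore it (sketch; assumes at least three GPUs are present):

```lua
require 'cutorch'

local prev = cutorch.getDevice()
cutorch.setDevice(3)
local src = torch.CudaTensor(100)  -- lives on GPU 3
print(src:getDevice())             -- 3
cutorch.setDevice(prev)            -- restore the previous device
```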

Copying a CUDA tensor from one GPU to another: given a tensor src on GPU 1, you can create its clone on GPU 2 with:

cutorch.setDevice(2)
local dest = src:clone()

OR

local dest
cutorch.withDevice(2, function() dest = src:clone() end)

API changes and Versioning

Version 1.0 can be installed via: luarocks install cutorch 1.0-0

Compared to version 1.0, master has the following API changes:

operators                              1.0               master
-------------------------------------  ----------------  --------------------
lt, le, gt, ge, eq, ne (return type)   torch.CudaTensor  torch.CudaByteTensor
min, max (2nd return value)            torch.CudaTensor  torch.CudaLongTensor
maskedFill, maskedCopy (mask input)    torch.CudaTensor  torch.CudaByteTensor
topk, sort (2nd return value)          torch.CudaTensor  torch.CudaLongTensor
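For example, on master a comparison yields a byte mask and index returns are long tensors (sketch; on 1.0 both were plain torch.CudaTensor):

```lua
require 'cutorch'

local a = torch.CudaTensor(5):fill(1)
local b = torch.CudaTensor(5):fill(2)

local mask = a:lt(b)       -- torch.CudaByteTensor on master
print(torch.type(mask))

local vals, idx = b:max(1) -- idx is torch.CudaLongTensor on master
print(torch.type(idx))
```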

Inconsistencies with CPU API

operators CPU CUDA