microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Artistic Style Transfer example always gets "RuntimeError: CUDA failure 2: out of memory" #1204

Closed AllanYiin closed 7 years ago

AllanYiin commented 7 years ago

I am trying to use the great Artistic Style Transfer example, but every time execution reaches "z, intermediate_layers = model(y, layers)" the system throws an out-of-memory exception. Is this caused by loading the VGG-16 model? The VGG model should already have been loaded at the earlier "print(f.attrs['nb_layers'])" step, and that step works fine. Is there any way to prevent this situation?

RuntimeError Traceback (most recent call last)

in ()
     60 y = C.input_variable((3, SIZE, SIZE), needs_gradient=True)
     61
---> 62 z, intermediate_layers = model(y, layers)

in model(x, layers)
     23 for outer in range(1,6):
     24     for inner in range(num_convs[outer]):
---> 25         x = vggblock(x, conv[cnt], model_layers, 'conv%d_%d' % (outer, 1+inner))
     26         cnt += 1
     27     x = vggpool(x)

in vggblock(x, arrays, layer_map, name)
      3 f = arrays[0]
      4 b = arrays[1]
----> 5 k = C.constant(value=f)
      6 t = C.constant(value=np.reshape(b, (-1, 1, 1)))
      7 y = C.relu(C.convolution(k, x, auto_padding=[True, True, False]) + t)

C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py34\lib\site-packages\cntk\utils\swig_helper.py in wrapper(*args, **kwds)
     56 @wraps(f)
     57 def wrapper(*args, **kwds):
---> 58     result = f(*args, **kwds)
     59     map_if_possible(result)
     60     return result

C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py34\lib\site-packages\cntk\ops\__init__.py in constant(value, shape, device, name)
   2182 dtype = np.float32
   2183
-> 2184 return Constant(value, shape, dtype, device, name)
   2185
   2186 ##########################################################################

C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py34\lib\site-packages\cntk\ops\variables.py in __init__(self, value, shape, dtype, device, name)
    215     super(Constant, self).__init__(utils.sanitize_shape(shape), sanitize_dtype_cntk(dtype), value)
    216 else:
--> 217     ndav = sanitize_value(shape, value, dtype, device)
    218     super(Constant, self).__init__(ndav, name)
    219

C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py34\lib\site-packages\cntk\utils\__init__.py in sanitize_value(shape, value, dtype, device)
    289 value = np.asarray(value, dtype=np_dtype)
    290
--> 291 ndav = _create_NDArrayView_from_NumPy(value, device)
    292
    293 return ndav

C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py34\lib\site-packages\cntk\utils\__init__.py in _create_NDArrayView_from_NumPy(nd, device)
    464 device = use_default_device()
    465
--> 466 return cntk_py.NDArrayView(nd, device, False)
    467
    468 class Value(cntk_py.Value):

C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py34\lib\site-packages\cntk\cntk_py.py in __init__(self, *args)
    531
    532 def __init__(self, *args):
--> 533     this = _cntk_py.new_NDArrayView(*args)
    534     try:
    535         self.this.append(this)

RuntimeError: CUDA failure 2: out of memory ; GPU=0 ; hostname=ALLAN-SURFACE ; expr=cudaMalloc((void**) &deviceBufferPtr, sizeof(AllocatedElemType) * numElements)

cha-zhang commented 7 years ago

What's your GPU?

AllanYiin commented 7 years ago

I am using my Surface Book. Its GPU is equivalent to a GTX 950, with 1 GB of GDDR5.

n17s commented 7 years ago

The first load is from disk to RAM. The model() call loads from RAM to GPU. The VGG model is big, but we only load the convolutional part, which is about 50 MB. The problem comes from having to store a lot of the layers.
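To put rough numbers on that comparison, here is a back-of-the-envelope sketch in plain Python. It is not taken from the notebook: it assumes the standard VGG-16 convolutional layout, float32 values, "same" padding, 2x2 pooling that halves the resolution and rounds down, and a baseline of SIZE = 300 (a guess, not stated in this thread). The helper names are made up for illustration.

```python
# Rough float32 memory estimate for the convolutional part of VGG-16.
# Assumptions (not from the thread): standard VGG-16 layout, 3x3 kernels,
# "same" padding, 2x2 pooling that halves and rounds down, SIZE = 300.

# (number of conv layers, output channels) for each of the five blocks
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
BYTES_PER_FLOAT = 4


def conv_weight_count(blocks=VGG16_BLOCKS, in_channels=3, kernel=3):
    """Number of weights plus biases in all convolutional layers."""
    total = 0
    for num_convs, out_channels in blocks:
        for _ in range(num_convs):
            total += in_channels * out_channels * kernel * kernel + out_channels
            in_channels = out_channels
    return total


def conv_activation_count(size, blocks=VGG16_BLOCKS):
    """Number of values in one copy of every conv layer output for a size x size input."""
    total = 0
    for num_convs, out_channels in blocks:
        total += num_convs * out_channels * size * size
        size //= 2  # assumed 2x2 pooling after each block
    return total


weights_mb = conv_weight_count() * BYTES_PER_FLOAT / 1e6
activations_mb = conv_activation_count(300) * BYTES_PER_FLOAT / 1e6
print("conv weights:     %.1f MB" % weights_mb)
print("conv activations: %.1f MB (one copy, forward only)" % activations_mb)
```

Under these assumptions the weights come to roughly 59 MB, while a single copy of the convolution outputs is already close to 97 MB. Backpropagating to the input needs additional buffers on top of that, so the real footprint is likely several times larger.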

One possibility that might work for you is to reduce the SIZE variable. Try SIZE=200 (should use 56% less memory) or SIZE=100 (should use 89% less memory). Try to find the largest SIZE that fits on your card without failing, because if SIZE is too small the details of the input will be lost.
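The percentages follow from layer memory growing with the number of pixels, i.e. with SIZE squared. The sketch below reproduces them under the assumption that the baseline is SIZE = 300 (inferred from the quoted figures, not stated in this thread). It also shows one way to search for the largest SIZE that fits. The `fits` callback is hypothetical: in practice it would build the model at that size and return False if a RuntimeError is raised.

```python
def memory_saving(new_size, old_size=300):
    """Fraction of per-layer memory saved by shrinking a square input.

    Assumes memory is proportional to SIZE squared; old_size=300 is an
    assumed baseline, not a value taken from the thread.
    """
    return 1.0 - (new_size / old_size) ** 2


def largest_fitting_size(fits, lo, hi):
    """Binary search for the largest size in [lo, hi] where fits(size) is True.

    `fits` is a hypothetical callback; the search assumes that if a size
    fits, every smaller size fits too. Returns None if nothing fits.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):
            best = mid
            lo = mid + 1
        else:
            hi = mid - 1
    return best


for size in (200, 100):
    print("SIZE=%d saves about %d%%" % (size, round(100 * memory_saving(size))))

# Stand-in predicate for illustration only: pretend anything up to 137 fits.
print(largest_fitting_size(lambda s: s <= 137, 32, 300))
```

The binary search needs only about log2(hi - lo) trial runs, which matters when each trial means rebuilding the network on the GPU.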

Once you get it to work you can also modify the implementation so that it is less demanding on memory. I think removing all the intermediate layers from the loss function and from the outputs of the network should help.

There's also the possibility of running everything on the CPU, which you can do with

C.set_default_device(C.cpu())

as the first thing after importing cntk.