Open bmigette opened 6 years ago
CUDA reports OOM when trying to allocate a large buffer that exceeds the available memory. Have you tried the same model with other backends? If there is no OOM there, how much memory does it take?
I've tried with TensorFlow; it used approx. 9 GB out of 11. Maybe the model I use is too big for the GPU with CNTK, then?
There is some overhead in the CNTK Keras backend. Please try saving the model in CNTK format and evaluating it with the CNTK function directly.
I'm not sure how to do that... In any case, I am fine using TensorFlow for fitting, and either CNTK or TensorFlow for predicting later. Note: I tried to save the trained model from Keras using `C.combine(model.model.outputs).save(name + ".cntkmodel")`, but it gave me an error that it could not convert a list to CNTK::Variable, if I recall correctly.
Nope, I'm getting the same error... Should I switch to the CNTK backend before combining?
```python
>>> keras_model = load_model('60_p0_LSTM_256_128_128_128_128.h5')
>>> C.combine(keras_model.model.outputs).save('my_cntk_model')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\site-packages\cntk\internal\swig_helper.py", line 69, in wrapper
    result = f(*args, **kwds)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\site-packages\cntk\ops\__init__.py", line 82, in combine
    return combine(operands_unfold, name)
TypeError: cannot convert list element to CNTK::Variable
```
Yes. Under the TensorFlow backend, the model's outputs are TensorFlow tensors, which `C.combine` cannot convert to CNTK::Variable. Please load the Keras model with the CNTK backend first.
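A minimal sketch of the backend switch (assumptions: multi-backend Keras 2.x with CNTK installed; the file names below are illustrative, not from this thread):

```python
import os

# Keras reads KERAS_BACKEND once, at import time, so it must be set
# before `import keras` (assumption: multi-backend Keras 2.x).
os.environ["KERAS_BACKEND"] = "cntk"

# With the CNTK backend active, the model's outputs are CNTK variables,
# so C.combine can wrap them and the function can be saved natively.
# Hypothetical file names; requires keras and cntk to be installed:
#
#   import cntk as C
#   from keras.models import load_model
#   keras_model = load_model("my_model.h5")
#   C.combine(keras_model.model.outputs).save("my_model.cntkmodel")
#
# Later, the saved function can be loaded and evaluated without Keras:
#
#   z = C.Function.load("my_model.cntkmodel")
```

Setting the environment variable in the shell before launching Python works equally well; the key point is that the backend is fixed before Keras is first imported.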
Any update on this? I'm getting the same error with an LSTM model, while the TensorFlow backend works fine.
What I've found out is that TensorFlow also uses shared GPU memory, while CNTK only uses dedicated GPU memory.
I am trying to train a model using Keras and CNTK 2.4. Every time I call the fit function, I get a CUDA out-of-memory error.
My GPU has 11 GB of RAM, and when it crashes, not even 1.5 GB is in use.
My network is a simple single-layer LSTM with 5000 neurons:
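As a rough sanity check on the model size (a sketch; the input dimension `d = 100` is an assumption, not stated in the thread): a single LSTM layer has 4 gates, each with a `(d + n) × n` weight matrix and an `n`-dimensional bias, so the parameter count can be estimated directly:

```python
# Approximate parameter count of one LSTM layer with n units and
# input dimension d: 4 gates * ((d + n) * n weights + n biases).
def lstm_param_count(n, d):
    return 4 * ((d + n) * n + n)

n, d = 5000, 100           # 5000 neurons as above; d = 100 is illustrative
params = lstm_param_count(n, d)
print(params)              # ~102 million parameters
print(params * 4 / 1e9)    # ~0.4 GB of float32 weights
```

The weights alone are well under 1 GB, so an OOM at fit time on an 11 GB card likely comes from activations and workspace buffers rather than the parameters themselves.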
Full error: