keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Is there an expected difference between GPU and CPU training? #681

Closed elliottd closed 9 years ago

elliottd commented 9 years ago

Hi,

I ran the lstm_text_generation.py example on both the CPU and the GPU and got different outcomes. Is this expected behaviour? I set numpy.random.seed(1234) at the top of the file, before any Keras imports, and ran both processes with THEANO_FLAGS=floatX=float32.
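For reference, a minimal sketch of that setup (the seed value and the import ordering are exactly what is described above; the rest of lstm_text_generation.py is left unchanged):

```python
# Top of lstm_text_generation.py: seed NumPy before importing Keras/Theano,
# so weight initialization and shuffling start from the same state each run.
import numpy
numpy.random.seed(1234)

from keras.models import Sequential  # Keras is imported only after seeding
# ... the rest of the example script stays as shipped ...
```

Both runs were then launched with the same THEANO_FLAGS, differing only in device= (cpu vs. gpu0), as shown in the outputs below.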

CPU: Intel Xeon E5-2650
GPU: NVidia Tesla K20X

GPU-mode output

$ THEANO_FLAGS=floatX=float32,device=gpu0 python lstm_text_generation.py
Couldn't import dot_parser, loading of dot files will not be possible.
Using gpu device 0: Tesla K20c
corpus length: 600901
total chars: 59
nb sequences: 200294
Vectorization...
Build model...
/local/user/.local/lib/python2.7/site-packages/theano/scan_module/scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  from scan_perform.scan_perform import *

Iteration 1
Epoch 0
200294/200294 [==============================] - 385s - loss: 2.8876

----- diversity: 0.2 ----- Generating with seed: "choose for company t" choose for company the the te the the the the tore and and ante the an ante the ans an the ale and and anthe the the and ande the the te an the the the tor an on the the the in the the tote the as in the te the the the the the ante the the ante the the ans the the the the the the te the perere to the cererte the the the cite ante the the the and on ante the tis an on the the the the the the the the te ante the te the

----- diversity: 0.5 ----- Generating with seed: "choose for company t" choose for company tar sose onthe pirte bos ol siteparint in int elasinann on atisalt and onsesele sutise nocasgesn, belan ande th a the ols anpite tha te asd os aretee ther anf os ereracitte in ane site oora nith al enteite ant arte poithe to thee ander os an osimesecins ir ereing antitint aute te po alion of to se toem and iol therisan iod te as te as irufe son in eseseline fomerine fis tore the thin te wote ane an

----- diversity: 1.0 ----- Generating with seed: "choose for company t" choose for company tisko, ota rloyd at iret fowar at ilegiorseces;ts wicel thot erdrimhec, afliacceyd anse g"taet, th ofla piiy san anpethir.pins bfans eelp"gringttounth chepmanl atltaflpold f irtithocpertecl" ile taidos "taly antislues,d"!:sly ssroscete gode fis ter ardeyn pfotet oo gel.r irebpiphicaagrid anth ef seu=in "e)in lh aldacthosies""n,, wd annterce,rtoccwan istineds orde aanlleft"ichams th ared taslemasnip

----- diversity: 1.2 ----- Generating with seed: "choose for company t" choose for company tiuthuen,t hed ioctpiuon in mermbitev; fevwifile thtad cosady, -hfrennnpasathy] infsintolg,s asnpd osr rtacikg tomwunet-cy nouagee, poce vieol thy lovhon, sivd coxmey tocfat, thumhuroit" menm8espescot th sivwwomdin,, buultiag isiase"dercausit.nl: her elasdoon wavtefdiyi, phonce poratye, y "ssild teereyn. c anefsogl.f fh os hor oxsl amd ifoo "ety ans if wfo ass nrarfis[arorg. olksu,",a d ilyuto-agh

CPU-mode output

$ THEANO_FLAGS=floatX=float32,device=cpu python lstm_text_generation.py
Couldn't import dot_parser, loading of dot files will not be possible.
corpus length: 600901
total chars: 59
nb sequences: 200294
Vectorization...
Build model...
/local/user/.local/lib/python2.7/site-packages/theano/scan_module/scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  from scan_perform.scan_perform import *

Iteration 1
Epoch 0
200294/200294 [==============================] - 2579s - loss: 2.8999

----- diversity: 0.2 ----- Generating with seed: "choose for company t" choose for company the orecere the te the the te the the the an an the and an the the tor and the the and on the the on te the se the the the the ante the the the the an and and ans on te the ante the the the and and and the the and and and the the the the the the the the the the te the te the an in on the the ant and the te te the the the the te in the the the te the the the the the the the the an te the the the the

----- diversity: 0.5 ----- Generating with seed: "choose for company t" choose for company torin and there is an the an ood wand thes and ter ind alete an on thir feseran the ande site the areter ond artin and eot res ter on bes on and mon on andethe we mhe irete te toe, the he thee an om as inde temdend on the te andetisl the end teut th ale anm and tere tine panne there te af ol an he eres an an whint hees the and cinin te teve tering and tere the in te the shes bon mhme ther inlind th

----- diversity: 1.0 ----- Generating with seed: "choose for company t" choose for company thevii lhiatenu" toos nrvdatl oncherincuselysi-ien h isustshe ine hesos)a, -ond, anatu eedens inte .uthiwe,r the tirar sey howatin tis inn onethiid sedren tifd oncinavfso and thhade,,ed, insy an tivad aok cit hhe th tei gerk, tee avh ogase ,us tin hher abesy hf shs ba,soter bass bs gepfemeinhin be silconb tud te siunyu-atdutergictets of agyeen ure ongde tos hardetise elanels shessan-d ewdy asvtenr

----- diversity: 1.2 ----- Generating with seed: "choose for company t" choose for company thelmetes,-v:hipes if dosicancfdy-h,, es com(rmthe rte pthe tfoin cin d7a0 ucologer-an anufors ane, hhe ue ims srhmu. ac e4andicet inad ;, thobcevitam, bp mpoemss,s heewye:sd, wtoke ben"vood erasaco is es tembfcenthes unand cumte sua dtin gol talusidd udcevifl cocore thicsr cont merkimeed ir ur one mwereinn cutid eova olc aaf sot ueaelmtursior suiingrlenmarrruys,s--h andgbh egilddsee.ge(- fodilt hh

fchollet commented 9 years ago

Yes, there would be differences. The main one is that the CPU uses float64 data while the GPU uses float32. You can run float32 computations on your CPU too (set the Theano flag floatX=float32), but even then some differences can remain, since all operations are run by entirely different pieces of code on physical devices that work quite differently.

Keras code can only be made deterministic on CPU or on GPU, but won't be deterministic across the two.

However, your loss / accuracy metrics should be very close regardless of the device. If you see significant differences there, that indicates a problem.
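A minimal sketch of checking which precision Theano actually uses at runtime; theano.config.floatX and theano.config.device are standard Theano configuration attributes, nothing Keras-specific:

```python
# Launch with, e.g.:
#   THEANO_FLAGS=floatX=float32,device=cpu python lstm_text_generation.py
# then verify which settings Theano actually picked up.
import theano

print(theano.config.floatX)  # 'float64' by default on CPU, 'float32' if the flag is set
print(theano.config.device)  # 'cpu', 'gpu0', ...
```

The same settings can also be made permanent in ~/.theanorc under the [global] section, instead of passing THEANO_FLAGS on every run.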

elliottd commented 9 years ago

@fchollet So the differences are due to how the CPU / GPU code is designed and executed? Do you know of any discussions by the Theano or libgpuarray developers about what causes this?

wulabs commented 7 years ago

Across machines, even with the same architecture and the same environment, I get different results as well. But on a single machine, successive runs (either CPU or GPU) give me exactly the same results.

nouiz commented 7 years ago

Make sure you have the exact same version of all software. Just using a different BLAS (OpenBLAS vs. MKL, for example) can slightly change the results, because of small floating-point effects like a + (b + c) != (a + b) + c.
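A quick illustration of that non-associativity in plain NumPy (nothing here depends on any particular BLAS):

```python
import numpy as np

a = np.float32(1e8)
b = np.float32(-1e8)
c = np.float32(1.0)

# The two groupings are mathematically equal but round differently in float32.
print((a + b) + c)  # 1.0 -- the large terms cancel first, so c survives
print(a + (b + c))  # 0.0 -- c is absorbed into -1e8 before the cancellation

# Summation order alone can change a reduction over many values.
x = np.random.rand(100000).astype(np.float32)
print(x.sum() - x[::-1].sum())  # typically a small non-zero difference
```

Different BLAS implementations (and CPU vs. GPU kernels) effectively reorder operations like these, which is why bit-identical results across devices or machines are not expected.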
