microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

"an ComputationNodeBasePtr of mismatching precision was passed" #3174

Open xiezhq-hermann opened 6 years ago

xiezhq-hermann commented 6 years ago

I'm converting my model to train in FP16, but after resolving all the conflicts between float32 and float16, I get the following error:

Validating --> PackedIndex2711 = PackedIndex (GatherPacked2688, Where2708) : [200 x whereNodeDynamicAxis_conditionVar_ElementTimes2677_Output_0], [ x __noSequenceAxis5] -> [] FAILED
Traceback (most recent call last):
  File "script/train_pm.py", line 397, in <module>
    profiling = args['profile'])
  File "script/train_pm.py", line 199, in train
    trainer.train_minibatch(data)
  File "/home/xiezhq/anaconda3/lib/python3.6/site-packages/cntk/train/trainer.py", line 181, in train_minibatch
    arguments, device)
  File "/home/xiezhq/anaconda3/lib/python3.6/site-packages/cntk/cntk_py.py", line 3024, in train_minibatch_overload_for_minibatchdata
    return _cntk_py.Trainer_train_minibatch_overload_for_minibatchdata(self, *args)
ValueError: an ComputationNodeBasePtr of mismatching precision was passed

I mainly followed the standard ResNet example to build my model in FP16. I have referred to another related issue, but it seems quite different from my problem: I didn't perform any operations on the computation nodes.

ke1337 commented 6 years ago

Can you share a simplified repro? There might be ops not implemented for FP16 yet.

xiezhq-hermann commented 6 years ago

Thanks for your reply! The code I use is mainly based on the BIDAF example. It seems that some components of the network are not implemented for FP16, but I don't know how to find out which ones. Thanks a lot!

xiezhq-hermann commented 6 years ago

Could you please give me any advice on how to find the problem?

ke1337 commented 6 years ago

Can you use the crosstalk version and eval the watchpoints in that script to find out where the issue comes from? You can find some examples of how to use crosstalk here.
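
As a complement to inspecting watchpoints, the general idea of locating a precision mismatch can be sketched as a graph walk that lists every node whose dtype differs from the one you expect. This is a minimal illustration in plain Python, not CNTK code: `Node`, `find_dtype_mismatches`, and the toy graph are all hypothetical, and the dtypes in the toy graph are arbitrary. Applying this to a real model would mean adapting the traversal to whatever attributes the framework's graph objects actually expose.

```python
from collections import namedtuple

# Hypothetical stand-in for a graph node (NOT a CNTK class):
# a name, a dtype string, and a list of input nodes.
Node = namedtuple("Node", ["name", "dtype", "inputs"])


def find_dtype_mismatches(root, expected):
    """Return the sorted names of nodes reachable from `root`
    whose dtype differs from `expected`."""
    seen = set()      # ids of nodes already visited (graphs share inputs)
    stack = [root]    # iterative depth-first traversal
    mismatched = []
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if node.dtype != expected:
            mismatched.append(node.name)
        stack.extend(node.inputs)
    return sorted(mismatched)


# Toy graph with arbitrary dtypes, for illustration only.
x = Node("input", "float16", [])
w = Node("weights", "float32", [])
mask = Node("mask", "float32", [x])
gather = Node("gather", "float16", [x, w])
output = Node("output", "float16", [gather, mask])

print(find_dtype_mismatches(output, "float16"))
```

The nodes it reports are the candidates to cast or replace; in the toy graph these are the two nodes deliberately left in float32.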