paninski-lab / deepgraphpose

DeepGraphPose
GNU Lesser General Public License v3.0

"Tensor had NaN values" when encountering improbable label values #12

Open obarnstedt opened 3 years ago

obarnstedt commented 3 years ago

Hi, and thanks for all the great work! I just wanted to point out a potential problem we encountered while training DGP. With our dataset, we could successfully run the first 50k iterations of "DGP on labeled frames only", but the "Running DGP" step then failed with:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had NaN values
[[{{node VerifyFinite/CheckNumerics}}]]

This occurs in line 818 of fitdgp.py:

[loss_eval, _] = sess.run([loss, train_op], feed_dict)

After some debugging, I traced the error to labeled frames in which some labels had accidentally been placed far outside the normal range. DLC deletes markers set at x=0, y=0, but here they were accidentally at x=1, y=4, whereas our labels normally have x and y above 200. After removing these improbable labels, training continued normally.

It's great that this gave us a chance to clean our training dataset, but it would be better if DGP could simply ignore such labels and emit a precise warning to alert the user. Otherwise, it's quite hard for the user to figure out where the actual problem is.

Thanks,
Oliver
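Until something like this exists in DGP itself, a pre-training check along the following lines might catch such labels. This is only a minimal sketch, not part of DGP or DLC: the function name `flag_improbable_labels`, the `margin` and `frame_size` parameters, the `(n_frames, n_markers, 2)` array layout, and the use of NaN to mark unlabeled markers are all assumptions made for illustration.

```python
import warnings

import numpy as np


def flag_improbable_labels(labels, margin=10.0, frame_size=None):
    """Flag labels that sit suspiciously close to the image origin.

    Hypothetical helper, not part of DGP or DLC.

    labels:     array of shape (n_frames, n_markers, 2) holding x/y pixel
                coordinates; NaN is assumed to mean "not labeled".
    margin:     a label is flagged when both x and y are below this value.
    frame_size: optional (width, height); labels outside the frame are
                flagged as well.

    Returns (cleaned, bad): a copy of `labels` with flagged entries set to
    NaN, and a boolean mask of shape (n_frames, n_markers).
    """
    labels = np.asarray(labels, dtype=float)
    x, y = labels[..., 0], labels[..., 1]

    # Comparisons against NaN evaluate to False, so unlabeled markers
    # are never flagged.
    with np.errstate(invalid="ignore"):
        bad = (x < margin) & (y < margin)
        if frame_size is not None:
            width, height = frame_size
            bad = bad | (x < 0) | (y < 0) | (x >= width) | (y >= height)

    cleaned = labels.copy()
    cleaned[bad] = np.nan  # treat flagged labels as unlabeled

    # One warning per flagged label, naming the frame and marker.
    for frame, marker in zip(*np.nonzero(bad)):
        warnings.warn(
            f"Frame {frame}, marker {marker}: improbable label "
            f"(x={x[frame, marker]:g}, y={y[frame, marker]:g}); "
            "treating it as unlabeled."
        )
    return cleaned, bad
```

Requiring both coordinates to be below `margin` is a deliberate choice: a marker can legitimately lie near one image edge, but a label with both x and y near zero (such as x=1, y=4 above) is far more likely to be a stray click. The right threshold depends on the dataset, and whether NaN is an acceptable "unlabeled" value depends on how the labels are passed on to training.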

waq1129 commented 3 years ago

Thanks for sharing this finding! This is very helpful for improving the package.