spoonsso / dannce

MIT License

COM prediction values are NaN #128

Closed yuan0821 closed 1 year ago

yuan0821 commented 2 years ago

Hi! The predicted COM values in com3d0.mat are all NaN. The parameters were set as shown below. Is there a problem with the video data loading? With the dannce demo video data the result was fine, but with my own 30 s of data the result is NaN. Could anyone help me check what the problem is? Thank you so much!

`

(dannce113) F:\testdannce120\dannce\demo\new919>com-train .\com_config_919.yaml 2022-10-15 18:13:11.561307: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll downfac not found in io.yaml file, falling back to main config extension not found in io.yaml file, falling back to main config io_config not found in io.yaml file, falling back to main config crop_height not found in io.yaml file, falling back to main config crop_width not found in io.yaml file, falling back to main config n_channels_in not found in io.yaml file, falling back to main config camnames not found in io.yaml file, falling back to main config n_views not found in io.yaml file, falling back to main config n_channels_out not found in io.yaml file, falling back to main config batch_size not found in io.yaml file, falling back to main config sigma not found in io.yaml file, falling back to main config epochs not found in io.yaml file, falling back to main config verbose not found in io.yaml file, falling back to main config loss not found in io.yaml file, falling back to main config lr not found in io.yaml file, falling back to main config net not found in io.yaml file, falling back to main config vid_dir_flag not found in io.yaml file, falling back to main config metric not found in io.yaml file, falling back to main config num_validation_per_exp not found in io.yaml file, falling back to main config debug not found in io.yaml file, falling back to main config max_num_samples not found in io.yaml file, falling back to main config train_mode not found in io.yaml file, falling back to main config com_finetune_weights not found in io.yaml file, falling back to main config com_train_dir set to: .\COM\train_results\ com_predict_dir set to: .\COM\predict_results\ dannce_train_dir set to: .\DANNCE\train_results\AVG\ dannce_predict_dir set to: .\DANNCE\predict_results\ exp set to: [{'label3d_file': './20221015_173137_Label3D_dannce.mat'}] 
downfac set to: 4 extension set to: .avi io_config set to: io.yaml crop_height set to: [0, 1152] crop_width set to: [0, 1920] n_channels_in set to: 1 camnames set to: ['Camera1', 'Camera2', 'Camera3'] n_views set to: 3 n_channels_out set to: 1 batch_size set to: 2 sigma set to: 18 epochs set to: 10 verbose set to: 1 loss set to: mask_nan_keep_loss lr set to: 5e-5 net set to: unet2d_fullbn vid_dir_flag set to: False metric set to: mse num_validation_per_exp set to: 10 debug set to: False max_num_samples set to: 100 train_mode set to: finetune com_finetune_weights set to: ..\markerless_mouse_1\COM\weights\ base_config set to: .\com_config_919.yaml viddir set to: videos gpu_id set to: 0 immode set to: vid mono set to: False mirror set to: False num_train_per_exp set to: None augment_hue set to: False augment_brightness set to: False augment_hue_val set to: 0.05 augment_bright_val set to: 0.05 augment_rotation_val set to: 5 data_split_seed set to: None valid_exp set to: None dsmode set to: nn augment_shift set to: False augment_zoom set to: False augment_shear set to: False augment_rotation set to: False augment_shear_val set to: 5 augment_zoom_val set to: 0.05 augment_shift_val set to: 0.05 start_batch set to: 0 chunks set to: None lockfirst set to: None load_valid set to: None drop_landmark set to: None raw_im_h set to: None raw_im_w set to: None n_instances set to: 1 start_sample set to: 0 write_npy set to: None use_npy set to: False com_predict_weights set to: None com_debug set to: None com_exp set to: None Setting vid_dir_flag to True. Setting extension to .avi. Setting chunks to {'Camera1': array([0]), 'Camera2': array([0]), 'Camera3': array([0])}. Setting n_channels_in to 3. Setting raw_im_h to 2560. Setting raw_im_w to 2560. 
Experiment 0 using videos in .\videos Experiment 0 using camnames: ['Camera1', 'Camera2', 'Camera3'] {'0_Camera1': array([0]), '0_Camera2': array([0]), '0_Camera3': array([0])} ./20221015_173137_Label3D_dannce.mat Using nn downsampling TRAIN EXPTS: [0] Initializing Network... 2022-10-15 18:13:14.584720: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll 2022-10-15 18:13:14.646942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s 2022-10-15 18:13:14.654599: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll 2022-10-15 18:13:14.654802: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll 2022-10-15 18:13:14.655261: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll 2022-10-15 18:13:14.655673: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll 2022-10-15 18:13:14.718061: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll 2022-10-15 18:13:14.718307: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll 2022-10-15 18:13:14.719724: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll 2022-10-15 18:13:14.720154: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll 2022-10-15 18:13:14.720614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0 2022-10-15 
18:13:14.732399: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2022-10-15 18:13:14.736139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s 2022-10-15 18:13:14.736483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0 2022-10-15 18:13:17.018957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix: 2022-10-15 18:13:17.022789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0 2022-10-15 18:13:17.050138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N 2022-10-15 18:13:17.083234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7433 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:41:00.0, compute capability: 8.6) E:\anaconda\envs\dannce113\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:375: UserWarning: The lr argument is deprecated, use learning_rate instead. "The lr argument is deprecated, use learning_rate instead.") COMPLETE

2022-10-15 18:13:18.002074: I tensorflow/core/profiler/lib/profiler_session.cc:126] Profiler session initializing. 2022-10-15 18:13:18.002236: I tensorflow/core/profiler/lib/profiler_session.cc:141] Profiler session started. 2022-10-15 18:13:18.002834: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1611] Profiler found 1 GPUs 2022-10-15 18:13:18.004376: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cupti64_112.dll'; dlerror: cupti64_112.dll not found 2022-10-15 18:13:18.005366: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cupti.dll'; dlerror: cupti.dll not found 2022-10-15 18:13:18.005806: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1661] function cuptiinterface->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found. 2022-10-15 18:13:18.005906: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session tear down. 2022-10-15 18:13:18.005966: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1752] function cuptiinterface->Finalize()failed with error CUPTI could not be loaded or symbol could not be found. 
Loading data Loading new video: .\videos\Camera1\0.avi for 0_Camera1 Loading new video: .\videos\Camera2\0.avi for 0_Camera2 Loading new video: .\videos\Camera3\0.avi for 0_Camera3 f:\testdannce120\dannce\dannce\engine\generator_aux.py:261: RuntimeWarning: invalid value encountered in true_divide y /= np.max(np.max(y, axis=1), axis=1)[:, np.newaxis, np.newaxis, :] Loading new video: .\videos\Camera1\0.avi for 0_Camera1 Loading new video: .\videos\Camera2\0.avi for 0_Camera2 Loading new video: .\videos\Camera3\0.avi for 0_Camera3 2022-10-15 18:13:34.113113: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2) Epoch 1/10 2022-10-15 18:13:36.067056: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll 2022-10-15 18:13:39.639960: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8302 2022-10-15 18:13:44.926827: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2 2022-10-15 18:13:44.927055: W tensorflow/stream_executor/gpu/asm_compiler.cc:56] Couldn't invoke ptxas.exe --version 2022-10-15 18:13:44.931278: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2 2022-10-15 18:13:44.941195: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Failed to launch ptxas Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location. This message will be only logged once. 
2022-10-15 18:13:45.323821: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll 2022-10-15 18:13:45.324555: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll 1/1 [==============================] - 16s 16s/step - loss: 0.0000e+00 - val_loss: 0.0000e+00 Epoch 2/10 2022-10-15 18:13:50.695089: I tensorflow/core/profiler/lib/profiler_session.cc:126] Profiler session initializing. 2022-10-15 18:13:50.696455: I tensorflow/core/profiler/lib/profiler_session.cc:141] Profiler session started. 2022-10-15 18:13:50.697011: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1661] function cuptiinterface->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found. 1/1 [==============================] - ETA: 0s - loss: 0.0000e+002022-10-15 18:13:50.843437: I tensorflow/core/profiler/lib/profiler_session.cc:66] Profiler session collecting data. 2022-10-15 18:13:50.843756: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1752] function cuptiinterface->Finalize()failed with error CUPTI could not be loaded or symbol could not be found. 2022-10-15 18:13:50.893555: I tensorflow/core/profiler/internal/gpu/cupti_collector.cc:673] GpuTracer has collected 0 callback api events and 0 activity events. 2022-10-15 18:13:50.922415: I tensorflow/core/profiler/lib/profiler_session.cc:159] Profiler session tear down. 
1/1 [==============================] - 1s 644ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00 Epoch 3/10 1/1 [==============================] - 0s 364ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00 Epoch 4/10 1/1 [==============================] - 0s 365ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00 Epoch 5/10 1/1 [==============================] - 0s 350ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00 Epoch 6/10 1/1 [==============================] - 0s 365ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00 Epoch 7/10 1/1 [==============================] - 0s 380ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00 Epoch 8/10 1/1 [==============================] - 0s 367ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00 Epoch 9/10 1/1 [==============================] - 0s 350ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00 Epoch 10/10 1/1 [==============================] - 0s 367ms/step - loss: 0.0000e+00 - val_loss: 0.0000e+00 Renaming weights file with best epoch description Saving full model at end of training `
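Note on the training log above: the `RuntimeWarning: invalid value encountered in true_divide` raised at `generator_aux.py:261` deserves a closer look, since the loss then stays at exactly 0 for every epoch. A minimal NumPy sketch, assuming the training targets have shape `(batch, height, width, channels)` (an assumption inferred from the indexing in the quoted line; the array sizes and values here are made up), shows how that normalization turns an all-zero target map into NaN:

```python
import numpy as np

def normalize_targets(y):
    """Divide each target map by its own maximum, mirroring the quoted log line."""
    y = y.astype(np.float64).copy()
    # Silence the 0/0 warning here so the effect can be inspected directly.
    with np.errstate(invalid="ignore", divide="ignore"):
        y /= np.max(np.max(y, axis=1), axis=1)[:, np.newaxis, np.newaxis, :]
    return y

# Hypothetical batch: sample 0 contains a peak, sample 1 is entirely zero.
targets = np.zeros((2, 4, 4, 1))
targets[0, 1, 2, 0] = 0.5

normalized = normalize_targets(targets)

# True for every sample whose whole target map became NaN.
nan_per_sample = np.isnan(normalized).reshape(2, -1).all(axis=1)
print(nan_per_sample)
```

Whether empty targets are what happened in this run is not confirmed anywhere in the thread. The sketch only shows that a target map with no peak (for example, a label that falls outside the image) would produce this exact warning.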

yuan0821 commented 2 years ago

Below is my com-predict log.

(dannce113) F:\testdannce120\dannce\demo\new919>com-predict .\com_config_919.yaml 2022-10-15 18:14:07.426269: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll downfac not found in io.yaml file, falling back to main config extension not found in io.yaml file, falling back to main config io_config not found in io.yaml file, falling back to main config crop_height not found in io.yaml file, falling back to main config crop_width not found in io.yaml file, falling back to main config n_channels_in not found in io.yaml file, falling back to main config camnames not found in io.yaml file, falling back to main config n_views not found in io.yaml file, falling back to main config n_channels_out not found in io.yaml file, falling back to main config batch_size not found in io.yaml file, falling back to main config sigma not found in io.yaml file, falling back to main config epochs not found in io.yaml file, falling back to main config verbose not found in io.yaml file, falling back to main config loss not found in io.yaml file, falling back to main config lr not found in io.yaml file, falling back to main config net not found in io.yaml file, falling back to main config vid_dir_flag not found in io.yaml file, falling back to main config metric not found in io.yaml file, falling back to main config num_validation_per_exp not found in io.yaml file, falling back to main config debug not found in io.yaml file, falling back to main config max_num_samples not found in io.yaml file, falling back to main config train_mode not found in io.yaml file, falling back to main config com_finetune_weights not found in io.yaml file, falling back to main config com_train_dir set to: .\COM\train_results\ com_predict_dir set to: .\COM\predict_results\ dannce_train_dir set to: .\DANNCE\train_results\AVG\ dannce_predict_dir set to: .\DANNCE\predict_results\ exp set to: [{'label3d_file': './20221015_173137_Label3D_dannce.mat'}] 
downfac set to: 4 extension set to: .avi io_config set to: io.yaml crop_height set to: [0, 1152] crop_width set to: [0, 1920] n_channels_in set to: 1 camnames set to: ['Camera1', 'Camera2', 'Camera3'] n_views set to: 3 n_channels_out set to: 1 batch_size set to: 2 sigma set to: 18 epochs set to: 10 verbose set to: 1 loss set to: mask_nan_keep_loss lr set to: 5e-5 net set to: unet2d_fullbn vid_dir_flag set to: False metric set to: mse num_validation_per_exp set to: 10 debug set to: False max_num_samples set to: 100 train_mode set to: finetune com_finetune_weights set to: ..\markerless_mouse_1\COM\weights\ base_config set to: .\com_config_919.yaml viddir set to: videos gpu_id set to: 0 immode set to: vid mono set to: False mirror set to: False start_batch set to: 0 start_sample set to: 0 dsmode set to: nn com_predict_weights set to: None num_train_per_exp set to: None chunks set to: None lockfirst set to: None load_valid set to: None augment_hue set to: False augment_brightness set to: False augment_hue_val set to: 0.05 augment_bright_val set to: 0.05 augment_rotation_val set to: 5 drop_landmark set to: None raw_im_h set to: None raw_im_w set to: None n_instances set to: 1 write_npy set to: None use_npy set to: False data_split_seed set to: None valid_exp set to: None com_debug set to: None com_exp set to: None augment_rotation set to: False augment_shear set to: False augment_zoom set to: False augment_shift set to: False augment_shear_val set to: 5 augment_zoom_val set to: 0.05 augment_shift_val set to: 0.05 Setting vid_dir_flag to True. Setting extension to .avi. Setting chunks to {'Camera1': array([0]), 'Camera2': array([0]), 'Camera3': array([0])}. Setting n_channels_in to 3. Setting raw_im_h to 2560. Setting raw_im_w to 2560. Using the following *dannce.mat files: .\20221015_173137_Label3D_dannce.mat Using camnames: ['Camera1', 'Camera2', 'Camera3'] Initializing Network... 
2022-10-15 18:14:10.325801: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll 2022-10-15 18:14:10.349421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s 2022-10-15 18:14:10.349690: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll 2022-10-15 18:14:10.352770: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll 2022-10-15 18:14:10.353266: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll 2022-10-15 18:14:10.354040: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll 2022-10-15 18:14:10.355596: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll 2022-10-15 18:14:10.355709: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll 2022-10-15 18:14:10.356146: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll 2022-10-15 18:14:10.356725: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll 2022-10-15 18:14:10.357199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0 2022-10-15 18:14:10.357985: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild 
TensorFlow with the appropriate compiler flags. 2022-10-15 18:14:10.361644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6 coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s 2022-10-15 18:14:10.361924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0 2022-10-15 18:14:10.754055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix: 2022-10-15 18:14:10.754230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0 2022-10-15 18:14:10.755581: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N 2022-10-15 18:14:10.756272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7433 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:41:00.0, compute capability: 8.6) E:\anaconda\envs\dannce113\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:375: UserWarning: The lr argument is deprecated, use learning_rate instead. "The lr argument is deprecated, use learning_rate instead.") Loading weights from .\COM\train_results\weights.0-0.00000.hdf5 COMPLETE

Predicting on sample 0 Loading new video: videos\Camera1\0.avi for Camera1 Loading new video: videos\Camera2\0.avi for Camera2 Loading new video: videos\Camera3\0.avi for Camera3 2022-10-15 18:14:12.697986: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2) 2022-10-15 18:14:12.947341: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll 2022-10-15 18:14:13.646958: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8302 2022-10-15 18:14:14.571302: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2 2022-10-15 18:14:14.571485: W tensorflow/stream_executor/gpu/asm_compiler.cc:56] Couldn't invoke ptxas.exe --version 2022-10-15 18:14:14.575616: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2 2022-10-15 18:14:14.576118: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Failed to launch ptxas Relying on driver to perform ptx compilation. Modify $PATH to customize ptxas location. This message will be only logged once. 
2022-10-15 18:14:14.605683: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll 2022-10-15 18:14:14.606093: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll Predicting on sample 1 Predicting on sample 2 Predicting on sample 3 Predicting on sample 4 Predicting on sample 5 Predicting on sample 6 Predicting on sample 7 Predicting on sample 8 Predicting on sample 9 Predicting on sample 10 Predicting on sample 11 Predicting on sample 12 Predicting on sample 13 Predicting on sample 14 Predicting on sample 15 Predicting on sample 16 Predicting on sample 17 Predicting on sample 18 Predicting on sample 19 Predicting on sample 20 Predicting on sample 21 Predicting on sample 22 Predicting on sample 23 Predicting on sample 24 Predicting on sample 25 Predicting on sample 26 Predicting on sample 27 Predicting on sample 28 Predicting on sample 29 Predicting on sample 30 Predicting on sample 31 Predicting on sample 32 Predicting on sample 33 Predicting on sample 34 Predicting on sample 35 Predicting on sample 36 Predicting on sample 37 Predicting on sample 38 Predicting on sample 39 Predicting on sample 40 Predicting on sample 41 Predicting on sample 42 Predicting on sample 43 Predicting on sample 44 Predicting on sample 45 Predicting on sample 46 Predicting on sample 47 Predicting on sample 48 Predicting on sample 49 Predicting on sample 50 Predicting on sample 51 Predicting on sample 52 Predicting on sample 53 Predicting on sample 54 Predicting on sample 55 Predicting on sample 56 Predicting on sample 57 Predicting on sample 58 Predicting on sample 59 Predicting on sample 60 Predicting on sample 61 Predicting on sample 62 Predicting on sample 63 Predicting on sample 64 Predicting on sample 65 Predicting on sample 66 Predicting on sample 67 Predicting on sample 68 Predicting on sample 69 Predicting on sample 70 Predicting on sample 71 Predicting on 
sample 72 Predicting on sample 73 Predicting on sample 74 Predicting on sample 75 Predicting on sample 76 Predicting on sample 77 Predicting on sample 78 Predicting on sample 79 Predicting on sample 80 Predicting on sample 81 Predicting on sample 82 Predicting on sample 83 Predicting on sample 84 Predicting on sample 85 Predicting on sample 86 Predicting on sample 87 Predicting on sample 88 Predicting on sample 89 Predicting on sample 90 Predicting on sample 91 Predicting on sample 92 Predicting on sample 93 Predicting on sample 94 Predicting on sample 95 Predicting on sample 96 Predicting on sample 97 Predicting on sample 98 Predicting on sample 99 using median to get 3D COM E:\anaconda\envs\dannce113\lib\site-packages\numpy\lib\nanfunctions.py:1114: RuntimeWarning: All-NaN slice encountered overwrite_input=overwrite_input) Saving 3D COM to .\COM\predict_results\com3d0.mat done!`
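Note on the prediction log above: the `All-NaN slice encountered` warning comes from NumPy's NaN-aware reductions (`nanfunctions.py`), raised right after the "using median to get 3D COM" step. A small sketch, assuming several 3D estimates per sample are stacked along the last axis before the median is taken (the shapes and values are hypothetical, not DANNCE's actual data layout), shows when the median can still recover a value and when it cannot:

```python
import warnings
import numpy as np

def median_com(estimates):
    """NaN-aware median over the last axis; an all-NaN slice yields NaN."""
    with warnings.catch_warnings():
        # Suppress "All-NaN slice encountered" so the result can be inspected.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(estimates, axis=-1)

# Hypothetical input: 2 samples x 3 coordinates x 3 estimates.
estimates = np.full((2, 3, 3), np.nan)
estimates[0, :, 0] = [1.0, 2.0, 3.0]  # sample 0: two valid estimates,
estimates[0, :, 1] = [3.0, 4.0, 5.0]  # the third stays NaN
# sample 1: every estimate is NaN

com = median_com(estimates)
nan_fraction = np.isnan(com).all(axis=1).mean()
print(com)
print("fraction of samples with NaN COM:", nan_fraction)
```

In this toy input one missing estimate is tolerated, but a sample whose estimates are all NaN stays NaN. That matches the symptom reported here, where every saved COM is NaN.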

histun commented 1 year ago

Have you fixed this issue?

I've used 1) the provided weights (weights.250-0.00036.hdf5) as well as 2) newly generated finetuned weights (using the provided pretrained weights.rat.COM.hdf5 and the dannce.mat). However, I always get NaN values from com-predict (99 NaNs out of 100 predictions).

I thought it might have to do with my environment settings, but 1) predicting with dannce using the provided weights (weights.12000-0.00014.hdf5) and 2) finetuning/predicting (weights.rat.MAX.6cam.hdf5 with the dannce.mat) both seem to work fine.

I'm puzzled by the problems I'm having with COM. Does anyone have any ideas?

histun commented 1 year ago

I fixed this issue by reinstalling dannce from the development branch, which uses TF 2.4. Since I have an RTX Ada GPU, I installed CUDA 11.8 and cuDNN 8.7.0 from the NVIDIA website, following their installation guide. After this, com-predict with the demo dataset and weights worked fine, without NaN data.
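Since the fix here came down to the TensorFlow/CUDA setup, a quick environment report before and after reinstalling can help when comparing setups. This is a minimal sketch that uses only standard TensorFlow calls; the function name `describe_tf_environment` is made up for illustration and is not part of dannce:

```python
def describe_tf_environment():
    """Return a small report of the TensorFlow/GPU setup, or note that TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not importable in this environment.
        return {"tensorflow": None}
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }

report = describe_tf_environment()
print(report)
```

An empty `gpus` list would mean TensorFlow cannot see the card at all. A populated list does not by itself rule out the NaN problem, so it is still worth checking the saved COM values after prediction.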