vchoutas / smplify-x

Expressive Body Capture: 3D Hands, Face, and Body from a Single Image
https://smpl-x.is.tue.mpg.de/

NaN loss value, stopping! --> File "/home/mona/research/code/smplify-x/smplifyx/fit_single_frame.py", line 366, in fit_single_frame tqdm.write('Camera initialization final loss {:.4f}'.format( TypeError: unsupported format string passed to NoneType.__format__ #131

Open monacv opened 3 years ago

monacv commented 3 years ago

After processing some images successfully, I got this error. Is there a fix for it?


Processing: ../../data/smplify-x/djrn_test_data/images/HICO_test2015_00000469.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 0.7507
Camera initialization final loss 1283.9620
Stage 000 done after 2.0166 seconds                                                                                                
Stage 001 done after 1.8066 seconds                                                                                                
Stage 002 done after 1.7855 seconds                                                                                                
Stage 003 done after 6.2607 seconds                                                                                                
Stage 004 done after 6.4354 seconds                                                                                                
Stage: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:18<00:00,  3.66s/it]
Body fitting Orientation 0 done after 18.3112 seconds                                                                              
Body final loss val = 7074.32373                                                                                                   
Orientation: 100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:18<00:00, 18.31s/it]
Processing: ../../data/smplify-x/djrn_test_data/images/HICO_test2015_00000470.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 0.9054
Camera initialization final loss 805.9324
Stage 000 done after 2.0088 seconds                                                                                                
Stage 001 done after 0.6831 seconds                                                                                                
Stage 002 done after 2.9054 seconds                                                                                                
Stage 003 done after 7.4240 seconds                                                                                                
Stage 004 done after 1.3566 seconds                                                                                                
Stage: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:14<00:00,  2.88s/it]
Body fitting Orientation 0 done after 14.3841 seconds                                                                              
Body final loss val = 1076.25085                                                                                                   
Orientation: 100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:14<00:00, 14.38s/it]
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 0.8964
Camera initialization final loss 10146.6113
Stage 000 done after 2.3258 seconds                                                                                                
Stage 001 done after 1.5916 seconds                                                                                                
Stage 002 done after 3.7246 seconds                                                                                                
Stage 003 done after 13.3136 seconds                                                                                               
Stage 004 done after 3.2600 seconds                                                                                                
Stage: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:24<00:00,  4.84s/it]
Body fitting Orientation 0 done after 24.2220 seconds                                                                              
Body final loss val = 196141.98438                                                                                                 
Orientation: 100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:24<00:00, 24.22s/it]
Processing: ../../data/smplify-x/djrn_test_data/images/HICO_test2015_00000471.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
NaN loss value, stopping!
Camera initialization done after 1.2850
Traceback (most recent call last):
  File "smplifyx/main.py", line 272, in <module>
    main(**args)
  File "smplifyx/main.py", line 245, in main
    fit_single_frame(img, keypoints[[person_id]],
  File "/home/mona/research/code/smplify-x/smplifyx/fit_single_frame.py", line 366, in fit_single_frame
    tqdm.write('Camera initialization final loss {:.4f}'.format(
TypeError: unsupported format string passed to NoneType.__format__
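A note on the TypeError in the traceback above: the `{:.4f}` format call is being handed `None`, which in this log happens right after the "NaN loss value, stopping!" message. Below is a minimal sketch of a guard, assuming the loss value can come back as `None` when the optimizer stops early. `format_loss` is a hypothetical helper, not part of smplify-x.

```python
from typing import Optional


def format_loss(label: str, loss: Optional[float]) -> str:
    """Format a loss for logging, tolerating a missing (None) value."""
    if loss is None:
        # '{:.4f}'.format(None) raises the TypeError seen in the traceback,
        # so report the missing value explicitly instead.
        return '{} unavailable (fitting stopped early)'.format(label)
    return '{} {:.4f}'.format(label, loss)


if __name__ == '__main__':
    print(format_loss('Camera initialization final loss', 1283.9620))
    print(format_loss('Camera initialization final loss', None))
```

This only stops the crash in the logging line; it does not address whatever produced the NaN in the first place.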

https://github.com/DirtyHarryLYL/DJ-RN/issues/28

monacv commented 3 years ago

@vchoutas Do you know why?

I was curious, so I ran SMPLify-X on just one of the images that had produced a NaN loss, and it generated meshes without any error.

So I am confused: why do I get a NaN error when that image is in a folder along with other images?

Screenshot from 2021-01-13 21-22-30 Screenshot from 2021-01-13 21-21-38 Screenshot from 2021-01-13 21-21-25 Screenshot from 2021-01-13 21-21-08 Screenshot from 2021-01-13 21-25-55

monacv commented 3 years ago

Here's another example: when this image was in a folder along with more than 6K other images, I got a NaN error. Run on its own, it finishes without one:

[2440:2429 0:2010] 09:34:34 Wed Jan 13 [mona@goku:pts/0 +1] ~/research/code/smplify-x
$ ./bad_image_fit2.sh 
Processing: ../../data/smplify-x/BAD_DATA_FOLDER2/images/HICO_test2015_00000470.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py:238: UserWarning: This overload of add_ is deprecated:
    add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
    add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
  p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
Camera initialization done after 0.8730
Camera initialization final loss 732.2143
Stage 000 done after 1.7267 seconds                                                                                     
Stage 001 done after 1.2796 seconds                                                                                     
Stage 002 done after 2.4069 seconds                                                                                     
Stage 003 done after 9.2062 seconds                                                                                     
Stage 004 done after 1.0497 seconds                                                                                     
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:15<00:00,  3.13s/it]
Body fitting Orientation 0 done after 15.6752 seconds                                                                   
Body final loss val = 1005.60815                                                                                        
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:15<00:00, 15.68s/it]
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 0.9409
Camera initialization final loss 9928.3438
Stage 000 done after 2.7342 seconds                                                                                     
Stage 001 done after 1.5152 seconds                                                                                     
Stage 002 done after 2.9879 seconds                                                                                     
Stage 003 done after 25.6505 seconds                                                                                    
Stage 004 done after 3.0541 seconds                                                                                     
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:35<00:00,  7.19s/it]
Body fitting Orientation 0 done after 35.9483 seconds                                                                   
Body final loss val = 148578.62500                                                                                      
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:35<00:00, 35.95s/it]
Processing the data took: 00 hours, 00 minutes, 57 seconds
22473/31772MB(smplifyx) 
[2440:2429 0:2011] 09:35:39 Wed Jan 13 [mona@goku:pts/0 +1] ~/research/code/smplify-x
$ cat bad_image_fit2.sh 
export CUDA_VISIBLE_DEVICES=0
python smplifyx/main.py --config cfg_files/fit_smplx.yaml --data_folder ../../data/smplify-x/BAD_DATA_FOLDER2 --output_folder ../../data/smplify-x/BAD_RESULTS2 --visualize="False" --model_folder ../../data/smplify-x/models_smplx_v1_1/models/smplx/SMPLX_NEUTRAL.npz --vposer_ckpt ../../data/smplify-x/vposer_v1_0 --part_segm_fn ../../data/smplify-x/smplx_parts_segm.pkl

Screenshot from 2021-01-13 21-37-25 Screenshot from 2021-01-13 21-37-56

Screenshot from 2021-01-13 21-38-58 Screenshot from 2021-01-13 21-40-06 Screenshot from 2021-01-13 21-39-50

monacv commented 3 years ago

Here's another of the images that threw an error when run in bulk but not when run in isolation:

$ cat bad_image_fit3.sh 
export CUDA_VISIBLE_DEVICES=0
python smplifyx/main.py --config cfg_files/fit_smplx.yaml --data_folder ../../data/smplify-x/BAD_DATA_FOLDER3 --output_folder ../../data/smplify-x/BAD_RESULTS3 --visualize="False" --model_folder ../../data/smplify-x/models_smplx_v1_1/models/smplx/SMPLX_NEUTRAL.npz --vposer_ckpt ../../data/smplify-x/vposer_v1_0 --part_segm_fn ../../data/smplify-x/smplx_parts_segm.pkl
$ ./bad_image_fit3.sh 
Processing: ../../data/smplify-x/BAD_DATA_FOLDER3/images/HICO_train2015_00000420.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py:238: UserWarning: This overload of add_ is deprecated:
    add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
    add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
  p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
Camera initialization done after 0.8836
Camera initialization final loss 7888.6914
Stage 000 done after 2.7946 seconds                                                                                     
Stage 001 done after 2.0227 seconds                                                                                     
Stage 002 done after 2.7113 seconds                                                                                     
Stage 003 done after 13.2658 seconds                                                                                    
Stage 004 done after 25.5132 seconds                                                                                    
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:46<00:00,  9.26s/it]
Body fitting Orientation 0 done after 46.3134 seconds                                                                   
Body final loss val = 10309.36621                                                                                       
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:46<00:00, 46.31s/it]
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 1.0365
Camera initialization final loss 6034.3271
Stage 000 done after 3.4631 seconds                                                                                     
Stage 001 done after 0.9172 seconds                                                                                     
Stage 002 done after 1.3371 seconds                                                                                     
Stage 003 done after 11.0011 seconds                                                                                    
Stage 004 done after 15.4012 seconds                                                                                    
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:32<00:00,  6.43s/it]
Body fitting Orientation 0 done after 32.1260 seconds                                                                   
Body final loss val = 3179.14966                                                                                        
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:32<00:00, 32.13s/it]
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 1.4423
Camera initialization final loss 242.5790
Stage 000 done after 2.3764 seconds                                                                                     
Stage 001 done after 1.1786 seconds                                                                                     
Stage 002 done after 2.9422 seconds                                                                                     
Stage 003 done after 8.2292 seconds                                                                                     
Stage 004 done after 10.8499 seconds                                                                                    
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:25<00:00,  5.12s/it]
Body fitting Orientation 0 done after 25.5825 seconds                                                                   
Body final loss val = 4567.84961                                                                                        
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:25<00:00, 25.58s/it]
Processing the data took: 00 hours, 01 minutes, 51 seconds

Screenshot from 2021-01-13 21-56-02 Screenshot from 2021-01-13 21-55-55 Screenshot from 2021-01-13 21-57-17 Screenshot from 2021-01-13 21-57-05 Screenshot from 2021-01-13 21-56-52

Screenshot from 2021-01-13 21-59-01

geopavlakos commented 3 years ago

If you check the first log, the image that generates the NaN error is HICO_test2015_00000471.jpg. The other images (e.g., HICO_test2015_00000470.jpg) run without problems even when you have all the images in the same folder. If HICO_test2015_00000471.jpg is the only "bad" image, you could remove it from the folder and try again without it.
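To find the offending images in a long log without reading it by hand, the "Processing:" and "NaN loss value, stopping!" lines shown in the logs above can be paired up. The following is a rough sketch that assumes the log format matches those logs; `images_with_nan` is a hypothetical helper name.

```python
from typing import List, Optional


def images_with_nan(log_text: str) -> List[str]:
    """Return image paths whose processing emitted a NaN-loss message.

    Each NaN message is attributed to the most recent 'Processing:' line,
    since one image may contain several people and therefore several fits.
    """
    current: Optional[str] = None
    bad: List[str] = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith('Processing:'):
            current = line[len('Processing:'):].strip()
        elif line.startswith('NaN loss value') and current is not None:
            if current not in bad:
                bad.append(current)
    return bad


if __name__ == '__main__':
    sample = (
        'Processing: images/HICO_test2015_00000470.jpg\n'
        'Camera initialization final loss 805.9324\n'
        'Processing: images/HICO_test2015_00000471.jpg\n'
        'NaN loss value, stopping!\n'
    )
    print(images_with_nan(sample))
```

The returned paths can then be moved out of the data folder before re-running the batch.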

monacv commented 3 years ago

@geopavlakos thanks a lot for your response.

For that specific image, I got this error on my local machine: https://pastebin.com/raw/ezfy7Jwu
[2440:2429 0:2108] 10:16:17 Wed Jan 13 [mona@goku:pts/0 +1] ~/research/code/smplify-x
$ ./bad_image_fit4.sh
Processing: ../../data/smplify-x/BAD_DATA_FOLDER4/images/HICO_test2015_00000471.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py:238: UserWarning: This overload of add_ is deprecated:
    add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
    add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
  p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
Camera initialization done after 1.1937
Camera initialization final loss 395.8522
Stage 000 done after 2.0787 seconds                                                                                     
Stage 001 done after 0.9554 seconds                                                                                     
Stage 002 done after 2.9617 seconds                                                                                     
Stage:  60%|██████████████████████████████████████████████▊                               | 3/5 [00:06<00:04,  2.03s/it]
Orientation:   0%|                                                                                | 0/2 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "smplifyx/main.py", line 272, in <module>
    main(**args)
  File "smplifyx/main.py", line 245, in main
    fit_single_frame(img, keypoints[[person_id]],
  File "/home/mona/research/code/smplify-x/smplifyx/fit_single_frame.py", line 439, in fit_single_frame
    final_loss_val = monitor.run_fitting(
  File "/home/mona/research/code/smplify-x/smplifyx/fitting.py", line 175, in run_fitting
    loss = optimizer.step(closure)
  File "/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py", line 393, in step
    loss, flat_grad, t, ls_func_evals = _strong_Wolfe(obj_func, x_init, t, d,
  File "/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py", line 46, in _strong_Wolfe
    f_new, g_new = obj_func(x, t, d)
  File "/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py", line 392, in obj_func
    return self._directional_evaluate(closure, x, t, d)
  File "/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py", line 251, in _directional_evaluate
    loss = float(closure())
  File "/home/mona/research/code/smplify-x/smplifyx/fitting.py", line 246, in fitting_func
    total_loss = loss(body_model_output, camera=camera,
  File "/home/mona/venv/smplifyx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mona/research/code/smplify-x/smplifyx/fitting.py", line 434, in forward
    collision_idxs = self.search_tree(triangles)
  File "/home/mona/venv/smplifyx/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mona/venv/smplifyx/lib/python3.8/site-packages/mesh_intersection/bvh_search_tree.py", line 56, in forward
    return BVHFunction.apply(triangles)
  File "/home/mona/venv/smplifyx/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/home/mona/venv/smplifyx/lib/python3.8/site-packages/mesh_intersection/bvh_search_tree.py", line 38, in forward
    outputs = bvh_cuda.forward(triangles,
MemoryError: std::bad_alloc: cudaErrorMemoryAllocation: out of memory
20711/31772MB(smplifyx) 

I got no error when I ran it on a server with 12 GB of GPU memory. Screenshot from 2021-01-13 22-25-51

(smplifyx) mona@ubuntu:~/mona/code/smplify-x$ ./bad_image_fit4.sh
Processing: ../../data/smplify-x/BAD_DATA_FOLDER4/images/HICO_test2015_00000471.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
~/mona/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py:238: UserWarning: This overload of add_ is deprecated:
    add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
    add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
  p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
Camera initialization done after 1.7885
Camera initialization final loss 395.8533
Stage 000 done after 3.6366 seconds                                                                                     
Stage 001 done after 1.6833 seconds                                                                                     
Stage 002 done after 6.2464 seconds                                                                                     
Stage 003 done after 6.9281 seconds                                                                                     
Stage 004 done after 12.8438 seconds                                                                                    
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:31<00:00,  6.27s/it]
Body fitting Orientation 0 done after 31.3518 seconds                                                                   
Body final loss val = 1509.08716                                                                                        
Orientation:  50%|████████████████████████████████████                                    | 1/2 [00:31<00:31, 31.35s/it]/home/mona/venv/smplifyx/lib/python3.6/site-packages/smplx/body_models.py:270: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  param[:] = torch.tensor(params_dict[param_name])
Stage 000 done after 4.3290 seconds                                                                                     
Stage 001 done after 1.2125 seconds                                                                                     
Stage 002 done after 5.2780 seconds                                                                                     
Stage 003 done after 7.2431 seconds                                                                                     
Stage 004 done after 5.2169 seconds                                                                                     
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:23<00:00,  4.66s/it]
Body fitting Orientation 1 done after 23.2944 seconds                                                                   
Body final loss val = 1509.74951                                                                                        
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 2/2 [00:54<00:00, 27.32s/it]
Processing the data took: 00 hours, 01 minutes, 02 seconds

Now consider this other image, for which I got a NaN on the 12 GB GPU server (https://github.com/DirtyHarryLYL/DJ-RN/issues/28#issuecomment-759194536): when I run it in isolation on my local machine, I get no error.

Also, when I take the exact same image that threw a NaN error when run along with a bunch of other images on the server, and run it again on the server in isolation, I get no error. I am baffled as to why running it along with other images causes this problem. Could you please walk me through this?

(smplifyx) mona@ubuntu:~/mona/code/smplify-x$ ./bad_image_fit.sh
Processing: ../../data/smplify-x/BAD_DATA_FOLDER/images/HICO_test2015_00001357.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
~/mona/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py:238: UserWarning: This overload of add_ is deprecated:
    add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
    add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
  p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
Camera initialization done after 2.0274
Camera initialization final loss 210.5556
Stage 000 done after 5.9732 seconds                                                                                     
Stage 001 done after 0.7499 seconds                                                                                     
Stage 002 done after 4.9734 seconds                                                                                     
Stage 003 done after 7.4580 seconds                                                                                     
Stage 004 done after 17.3789 seconds                                                                                    
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:36<00:00,  7.31s/it]
Body fitting Orientation 0 done after 36.5480 seconds                                                                   
Body final loss val = 1884.72766                                                                                        
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:36<00:00, 36.55s/it]
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 1.4444
Camera initialization final loss 179.2693
Stage 000 done after 3.3054 seconds                                                                                     
Stage 001 done after 1.9925 seconds                                                                                     
Stage 002 done after 5.2941 seconds                                                                                     
Stage 003 done after 7.1860 seconds                                                                                     
Stage 004 done after 3.8547 seconds                                                                                     
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:21<00:00,  4.33s/it]
Body fitting Orientation 0 done after 21.6465 seconds                                                                   
Body final loss val = 708.07715                                                                                         
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:21<00:00, 21.65s/it]
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
Camera initialization done after 2.0496
Camera initialization final loss 176.3299
Stage 000 done after 3.2555 seconds                                                                                     
Stage 001 done after 3.2203 seconds                                                                                     
Stage 002 done after 4.7340 seconds                                                                                     
Stage 003 done after 5.5548 seconds                                                                                     
Stage 004 done after 2.5345 seconds                                                                                     
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:19<00:00,  3.86s/it]
Body fitting Orientation 0 done after 19.3133 seconds                                                                   
Body final loss val = 350.72900                                                                                         
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:19<00:00, 19.31s/it]
Processing the data took: 00 hours, 01 minutes, 29 seconds
(smplifyx) mona@ubuntu:~/mona/data/smplify-x/BAD_DATA_FOLDER$ ls images/
total 168K
drwxrwxr-x 1 mona mona   30 Jan 13 19:31 ..
-rwxrwxr-x 1 mona mona 165K Jan 13 19:31 HICO_test2015_00001357.jpg
drwxrwxr-x 1 mona mona   52 Jan 13 19:31 .

and

(smplifyx) mona@ubuntu:~/mona/data/smplify-x/BAD_DATA_FOLDER4$ ls images/
total 76K
drwxrwxr-x 1 mona mona  30 Jan 13 19:22 ..
-rwxrwxr-x 1 mona mona 75K Jan 13 19:22 HICO_test2015_00000471.jpg
drwxrwxr-x 1 mona mona  52 Jan 13 19:22 .

Here's image HICO_test2015_00000471.jpg

Screenshot from 2021-01-13 22-43-26 Screenshot from 2021-01-13 22-44-03


Please note that I ran the problematic image once again on my local machine and it threw no error. This is very inconsistent. Do you have a method for handling these cases when I am trying to run on something like 30K images?

$ ./bad_image_fit4.sh
Processing: ../../data/smplify-x/BAD_DATA_FOLDER4/images/HICO_test2015_00000471.jpg
Found Trained Model: ../../data/smplify-x/vposer_v1_0/snapshots/TR00_E096.pt
/home/mona/research/code/smplify-x/smplifyx/optimizers/lbfgs_ls.py:238: UserWarning: This overload of add_ is deprecated:
    add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
    add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:882.)
  p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
Camera initialization done after 1.2542
Camera initialization final loss 395.8522
Stage 000 done after 2.2228 seconds                                                                                     
Stage 001 done after 0.9547 seconds                                                                                     
Stage 002 done after 3.0120 seconds                                                                                     
Stage 003 done after 7.5243 seconds                                                                                     
Stage 004 done after 9.8090 seconds                                                                                     
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:23<00:00,  4.71s/it]
Body fitting Orientation 0 done after 23.5286 seconds                                                                   
Body final loss val = 1509.15723                                                                                        
Orientation:  50%|████████████████████████████████████                                    | 1/2 [00:23<00:23, 23.53s/it]/home/mona/venv/smplifyx/lib/python3.8/site-packages/smplx/body_models.py:270: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  param[:] = torch.tensor(params_dict[param_name])
Stage 000 done after 2.2364 seconds                                                                                     
Stage 001 done after 1.2320 seconds                                                                                     
Stage 002 done after 4.3765 seconds                                                                                     
Stage 003 done after 8.0905 seconds                                                                                     
Stage 004 done after 9.9109 seconds                                                                                     
Stage: 100%|██████████████████████████████████████████████████████████████████████████████| 5/5 [00:25<00:00,  5.17s/it]
Body fitting Orientation 1 done after 25.8622 seconds                                                                   
Body final loss val = 1509.17957                                                                                        
Orientation: 100%|████████████████████████████████████████████████████████████████████████| 2/2 [00:49<00:00, 24.70s/it]
Processing the data took: 00 hours, 00 minutes, 54 seconds

Here's image HICO_test2015_00001357.jpg Screenshot from 2021-01-13 22-41-14

Here are its meshes produced on the server in isolation (I was also able to produce them on my local machine): Screenshot from 2021-01-13 22-40-21 Screenshot from 2021-01-13 22-40-10 Screenshot from 2021-01-13 22-40-02

Here's the resulting mesh: Screenshot from 2021-01-13 22-48-22

Overall, then, my question remains: how can I run SMPLify-X on a large number of images, given this degree of inconsistency in the results, without being blocked by NaN loss errors?
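One possible way to keep a single failure from stopping a large batch is to launch each data folder as its own process and record which ones exit with an error, so they can be retried later. This is only a sketch under assumptions: `run_each` and `build_cmd` are hypothetical names, and `build_cmd` is expected to return the `python smplifyx/main.py ...` command line from the scripts above for one folder.

```python
import subprocess
import sys
from typing import Callable, Iterable, List, Tuple


def run_each(folders: Iterable[str],
             build_cmd: Callable[[str], List[str]]) -> Tuple[List[str], List[str]]:
    """Run one process per folder; return (succeeded, failed) folder lists.

    A crash or non-zero exit in one process does not stop the loop,
    and each process starts with fresh CPU and GPU state.
    """
    ok: List[str] = []
    failed: List[str] = []
    for folder in folders:
        result = subprocess.run(build_cmd(folder))
        if result.returncode == 0:
            ok.append(folder)
        else:
            failed.append(folder)
    return ok, failed


if __name__ == '__main__':
    # Stand-in command for illustration: exits with 1 for folder "b".
    def demo_cmd(folder: str) -> List[str]:
        code = 'import sys; sys.exit(1 if sys.argv[1] == "b" else 0)'
        return [sys.executable, '-c', code, folder]

    print(run_each(['a', 'b', 'c'], demo_cmd))
```

Starting a process per folder adds model-loading overhead to every run, so splitting the data into moderately sized folders may be a reasonable compromise.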

Anirudh257 commented 3 years ago

@monacv The NaN result is mostly due to a bad OpenPose prior. You should first run OpenPose on these images with hand and face keypoints enabled. If those are missing, OpenPose won't give a complete 118-joint output for the full body, and the SMPL-X fit will fail.
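
A quick pre-check along these lines can flag frames whose OpenPose output lacks hand or face keypoints before they reach the fitting stage. This is a minimal sketch and not part of smplify-x: it assumes OpenPose's standard JSON output (a `people` list whose entries carry the four `*_keypoints_2d` fields), and `missing_parts` and `min_conf` are hypothetical names introduced here.

```python
import json

# Field names used by OpenPose's JSON output for each detected person.
REQUIRED_KEYS = (
    "pose_keypoints_2d",
    "hand_left_keypoints_2d",
    "hand_right_keypoints_2d",
    "face_keypoints_2d",
)


def missing_parts(openpose_json, min_conf=0.0):
    """Return the keypoint groups that are absent or have no confident point.

    `openpose_json` is the parsed content of one OpenPose keypoints file.
    Each group is a flat list [x0, y0, c0, x1, y1, c1, ...].
    """
    people = openpose_json.get("people", [])
    if not people:
        return list(REQUIRED_KEYS)  # no person detected at all
    person = people[0]  # assumption: only the first person is checked
    missing = []
    for key in REQUIRED_KEYS:
        values = person.get(key, [])
        confidences = values[2::3]  # every third value is a confidence
        if not any(c > min_conf for c in confidences):
            missing.append(key)
    return missing
```

Loading each keypoints file with `json.load` and skipping the frames for which `missing_parts(...)` returns a non-empty list would keep incomplete detections out of a batch run.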

Baalon commented 2 years ago

Hello,

First, thanks a lot for your work; this is an impressive tool.

I have a similar problem: I run smplify-x on frames extracted from a video, and I get a "NaN loss value, stopping" error seemingly at random. I run smplify-x on CPU, as it is much faster than on GPU, as mentioned in https://github.com/vchoutas/smplify-x/issues/163.

The 3D mesh files end up with NaN values on every line starting with v (the first 6980 lines of the file), and the fields ["Camera translation", "betas", "global_orient", "body_pose", "joints", "vertices"] are all NaN in the pkl files. This affects every single frame processed after the first one that gets the error, which is surprising since, as far as I'm aware, smplify-x should process frames independently of each other.

Note: I had previously made local changes so that processing is not interrupted on NaN loss errors, because of rare cases where OpenPose fails to detect a person in the image. As a result, my copy of smplify-x does not exit when the error is raised but continues to the next frame instead. I guess I'll have to change that, but thanks to this I know that the "NaN loss value, stopping" error never propagated to the following frames when it was caused by missing OpenPose data.

[Edit] I realised that my local changes interrupted the loop before it reached the point of calculating the camera loss, which might explain why the NaN values could not propagate in that case.
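
The symptom described here (NaN coordinates on the lines starting with v) can be detected automatically, so a batch run can stop or restart as soon as the first corrupted mesh is written. The following is a minimal sketch under the assumption that the meshes are plain-text Wavefront .obj files; `obj_has_nan_vertices` is a hypothetical helper, not a smplify-x function.

```python
import math


def obj_has_nan_vertices(lines):
    """Return True if any vertex line ("v x y z") contains a NaN coordinate.

    `lines` is any iterable of text lines, e.g. an open .obj file.
    """
    for line in lines:
        parts = line.split()
        if len(parts) < 4 or parts[0] != "v":
            continue  # skip faces, normals, comments, and blank lines
        try:
            coords = [float(p) for p in parts[1:4]]
        except ValueError:
            continue  # ignore malformed lines in this sketch
        if any(math.isnan(c) for c in coords):
            return True
    return False
```

Because an open file is an iterable of lines, `with open(path) as f: obj_has_nan_vertices(f)` checks a mesh without loading it fully into memory.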

For one video, the error might pop up at frame ~400, while for another it appears at frame ~5000, and so on. For others, there is no error. I did not notice any difference between the last properly processed frame and the first one that throws the error, apart from small movements of the individual. There is no major difference in the OpenPose data either (to address the previous comment in this thread). There is always only one person in the image, so this is a different situation from https://github.com/vchoutas/smplify-x/issues/142. Furthermore, if I move the data to another folder, as mentioned in an earlier message here, and then restart the process from the frame that started producing NaN values, smplify-x can process it just fine, though it can fail again and throw another "NaN loss value, stopping" error at any later frame.

I previously ran smplify-x on up to 29000 frames (~3000 on average) without errors. Using the same data sets, I started to get these errors when I changed the focal length from 5000 to 400. The reason for the change is that the camera optimisation results varied a lot between two consecutive frames (jumps of up to ~4 meters in the estimated camera position). We reduced the focal length to a more realistic value, which led to a more stable estimated camera position on the smaller image sets (~1000 frames) we used to test the change, without any NaN error occurring.

It is weird that this would be the cause of the problem, since smplify-x runs properly with the same settings and input data if it is called again on the images that failed to be processed. I am currently running some sets with a focal length of 1000 to compare, just in case. I'll also test whether the errors happen at the same frames every time when processing the same image set.

[Edited on 09/03/2022] Changing the focal length to 1000 instead of 400 did remove the NaN loss value for some videos, but not for others. The NaN loss value occurs even for videos where the modified focal length seems adequate and produces much more accurate estimates. I guess my alternatives right now are either to increase the focal length to avoid interruptions, or to interrupt smplify-x and restart its processing at the frame that starts producing these NaN values.
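
The second alternative (restarting at the frame that starts producing NaN values) could be automated with a small driver loop. This is only a sketch of the idea under stated assumptions: `process_batch` is a hypothetical callable that launches a fresh fitting run on a list of frames and reports how many leading frames were fitted before the first NaN loss; it is not part of smplify-x, and `max_retries` is likewise an illustrative parameter.

```python
def run_with_restarts(frames, process_batch, max_retries=1):
    """Process `frames` in order, restarting after each NaN failure.

    `process_batch(batch)` is a hypothetical callable that starts a fresh
    fitting run on `batch` and returns how many leading frames were fitted
    before the first NaN loss (len(batch) if none occurred).
    Returns the frames that still failed after `max_retries` restarts.
    """
    skipped = []
    start = 0
    retries = 0
    while start < len(frames):
        done = process_batch(frames[start:])
        start += done
        if start >= len(frames):
            break
        if done == 0:
            # The frame at `start` failed again right after a restart.
            retries += 1
            if retries > max_retries:
                skipped.append(frames[start])  # give up on this frame
                start += 1
                retries = 0
        else:
            retries = 0
    return skipped
```

The retry cap is there so that a frame which fails deterministically (for example because of missing OpenPose data) is skipped and reported instead of being restarted forever.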

whl-007 commented 2 weeks ago

Hi, can you tell me where I can find smplify-x/smplx_parts_segm.pkl? I can't find it.