I can't get good results #72

Closed bayunCC closed 5 months ago

bayunCC commented 5 months ago

Thanks for your great work.
I can't get a good body model with bash run-neuman-demo.sh. When fit.py runs, the images written to InstantAvatar/outputs/neuman/baseline/seattle/animation/progression are unclear, like this one: 016399. Setting the number of epochs to 400 didn't improve the result. I also got a bad result after running train.py, like this: 006969. This is what I got after processing the data:

Screenshot from 2024-04-19 12-12-58

bash ./bash/run-neuman-demo.sh
QOpenGLContext::swapBuffers() called with non-exposed window, behavior is undefined
QOpenGLContext::swapBuffers() called with non-exposed window, behavior is undefined
Duration: 1.32s @ 120.05 FPS
Global seed set to 42
Switch to /home/wh/project/InstantAvatar/outputs/neuman/baseline/seattle
[train] No optimized smpl found.
[val] No optimized smpl found.
[test] No optimized smpl found.

  | Name       | Type               | Params
0 | net_coarse | NeRFNGPNet         | 13.0 M
1 | SMPL_param | SMPLParamEmbedding | 3.1 K
2 | loss_fn    | NGPLoss            | 14.7 M
3 | renderer   | Raymarcher         | 0
13.0 M Trainable params
14.7 M Non-trainable params
27.8 M Total params
111.022 Total estimated model params size (MB)
/home/wh/anaconda3/envs/ins/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:412: UserWarning: The number of training samples (41) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
Epoch 399: 100%|██████████| 42/42 [00:09<00:00, 4.47it/s, v_num=0]
Save optimized pose to /home/wh/project/InstantAvatar/data/custom/seattle/poses/train.npz
Global seed set to 42
Switch to /home/wh/project/InstantAvatar/outputs/neuman/baseline/seattle
[train] Loading from /home/wh/project/InstantAvatar/data/custom/seattle/poses/train.npz
[val] No optimized smpl found.
[test] No optimized smpl found.
[2024-04-19 10:34:55,128][torch.distributed.nn.jit.instantiator][INFO] - Created a temporary directory at /tmp/tmpps0wff_b
[2024-04-19 10:34:55,128][torch.distributed.nn.jit.instantiator][INFO] - Writing /tmp/tmpps0wff_b/_remote_module_non_scriptable.py
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /home/wh/project/InstantAvatar/third_parties/lpips/weights/v0.1/vgg.pth
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Saving configs.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name       | Type       | Params
0 | net_coarse | NeRFNGPNet | 13.0 M
1 | loss_fn    | NGPLoss    | 14.7 M
2 | renderer   | Raymarcher | 0
13.0 M Trainable params
14.7 M Non-trainable params
27.8 M Total params
111.009 Total estimated model params size (MB)
/home/wh/anaconda3/envs/ins/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:631: UserWarning: Checkpoint directory checkpoints/ exists and is not empty.
rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
/home/wh/anaconda3/envs/ins/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:412: UserWarning: The number of training samples (41) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
rank_zero_warn(
Epoch 399: 100%|██████████| 42/42 [00:24<00:00, 1.68it/s, v_num=0]
Global seed set to 42
Switch to /home/wh/project/InstantAvatar/outputs/neuman/baseline/seattle
[train] Loading from /home/wh/project/InstantAvatar/data/custom/seattle/poses/train.npz
[val] No optimized smpl found.
[test] No optimized smpl found.
[2024-04-19 11:49:57,490][torch.distributed.nn.jit.instantiator][INFO] - Created a temporary directory at /tmp/tmp3n7mncvq
[2024-04-19 11:49:57,491][torch.distributed.nn.jit.instantiator][INFO] - Writing /tmp/tmp3n7mncvq/_remote_module_non_scriptable.py
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /home/wh/project/InstantAvatar/third_parties/lpips/weights/v0.1/vgg.pth
Resume from checkpoints/last.ckpt
60it [03:42, 3.71s/it]
Global seed set to 42
Switch to /home/wh/project/InstantAvatar/outputs/neuman/baseline/seattle
[train] Loading from /home/wh/project/InstantAvatar/data/custom/seattle/poses/train.npz
[val] No optimized smpl found.
[test] No optimized smpl found.
[2024-04-19 11:53:49,076][torch.distributed.nn.jit.instantiator][INFO] - Created a temporary directory at /tmp/tmp8jecsoss
[2024-04-19 11:53:49,076][torch.distributed.nn.jit.instantiator][INFO] - Writing /tmp/tmp8jecsoss/_remote_module_non_scriptable.py
Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
Loading model from: /home/wh/project/InstantAvatar/third_parties/lpips/weights/v0.1/vgg.pth
Resume from checkpoints/last.ckpt
320it [11:41, 2.19s/it]

tijiang13 commented 5 months ago

Hi bayun,

I just reran the demo and here is what I got after a few epochs: image

Usually, even if the pose is bad, it only affects the reconstruction quality to some extent. The result you showed, however, appears to have diverged completely.

For troubleshooting, you can try overwriting poses_optimized.npz and train.npz with the files from data.zip and running bash ./bash/run-neuman-demo.sh again. Remember to delete /home/wh/project/InstantAvatar/outputs/neuman/baseline/seattle first to avoid confusion with previous runs.
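The reset described above can be scripted so that it is repeatable between attempts. The following is only a minimal sketch: `restore_reference_files` is a hypothetical helper, not part of InstantAvatar, and the location of poses_optimized.npz shown in the commented call is an assumption (only the train.npz and output paths appear in the log above).

```python
import shutil
import zipfile
from pathlib import Path


def restore_reference_files(zip_path, targets, output_dir):
    """Copy selected files out of a zip archive, then remove a stale output folder.

    targets maps a file name inside the archive to the path it should overwrite.
    Returns the list of paths that were written.
    """
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        # Index members by bare file name so folders inside the archive do not matter.
        members = {
            Path(name).name: name
            for name in archive.namelist()
            if not name.endswith("/")
        }
        for file_name, destination in targets.items():
            if file_name not in members:
                raise FileNotFoundError(f"{file_name} not found in {zip_path}")
            destination = Path(destination)
            destination.parent.mkdir(parents=True, exist_ok=True)
            with archive.open(members[file_name]) as src, open(destination, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(destination)
    # Delete results of earlier runs so that old checkpoints are not resumed.
    shutil.rmtree(output_dir, ignore_errors=True)
    return written


# Example call (paths are assumptions, adjust them to your checkout):
# restore_reference_files(
#     "data.zip",
#     {
#         "train.npz": "data/custom/seattle/poses/train.npz",
#         "poses_optimized.npz": "data/custom/seattle/poses_optimized.npz",
#     },
#     "outputs/neuman/baseline/seattle",
# )
```

Deleting the output folder matters because the training log shows "Resume from checkpoints/last.ckpt", so a leftover checkpoint from a bad run would otherwise be picked up again.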

Let me know if the problem persists.

Best, Tianjian

bayunCC commented 5 months ago

Hi Tianjian, thanks for your quick reply. I replaced the two files, but it didn't work. I ran OpenPose on Windows to generate keypoints.npy for NeuMan because I couldn't get OpenPose to run on Ubuntu, so I may have generated wrong data. Could you please share the neuman-demo data so that I can test with it and fix my own data?

tijiang13 commented 5 months ago

Hi bayunCC,

In the pipeline, we utilize OpenPose to generate the poses_optimized.npz file, which I included in the zip file I sent to you. If it doesn't work with these NPZ files, it's unlikely to work with results from OpenPose either. In this case, I suspect there may be a compatibility issue with CUDA or PyTorch. Do you have access to a different machine?
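One quick way to look into a possible CUDA/PyTorch mismatch is to compare the CUDA tag of the installed PyTorch wheel with the CUDA toolkit version on the machine. The sketch below works on plain version strings, so it runs without a GPU; `cuda_tag` and `matches_toolkit` are hypothetical helper names, and the check assumes the usual `+cuXYZ` suffix of PyTorch pip wheels.

```python
import re


def cuda_tag(torch_version):
    """Return the CUDA version encoded in a PyTorch wheel version.

    '1.13.1+cu116' gives '11.6'. Returns None for versions without a
    '+cuXYZ' suffix, such as CPU-only builds.
    """
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version: 'cu116' is 11.6, 'cu121' is 12.1.
    return f"{digits[:-1]}.{digits[-1]}"


def matches_toolkit(torch_version, toolkit_version):
    """Check whether the wheel's CUDA tag agrees with the toolkit's major.minor version."""
    tag = cuda_tag(torch_version)
    if tag is None:
        return False
    return tag == ".".join(toolkit_version.split(".")[:2])


print(cuda_tag("1.13.1+cu116"))
print(matches_toolkit("1.13.1+cu116", "11.6"))
```

On a real installation the two inputs can be read from `torch.__version__` and from the toolkit version reported by `nvcc --version`; `torch.version.cuda` gives the CUDA version PyTorch itself was built with. A match does not prove the environment is healthy, it only rules out one common source of trouble.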

Alternatively, did you try reproducing the results on PeopleSnapshot?

Best, Tianjian

bayunCC commented 5 months ago

Hi Tijiang, the result on PeopleSnapshot is good. (Screenshot from 2024-04-20 08-38-00) I will try rebuilding my environment and test again. My current environment is CUDA 11.6 + Python 3.8.1 + torch 1.13.1+cu116.

conda list:
Name Version Build Channel
_libgcc_mutex 0.1 main defaults
_openmp_mutex 5.1 1_gnu defaults
absl-py 2.1.0 pypi_0 pypi
pypi_0 pypi
google-auth-oauthlib 1.0.0 pypi_0 pypi
grpcio 1.62.1 pypi_0 pypi
h5py 3.11.0 pypi_0 pypi
hydra-core 1.1.2 pypi_0 pypi
idna 3.7 pypi_0 pypi
imageio 2.34.0 pypi_0 pypi
imgui 2.0.0 pypi_0 pypi
importlib-metadata 7.1.0 pypi_0 pypi
importlib-resources 5.2.3 pypi_0 pypi
iopath 0.1.10 pypi_0 pypi
ipycanvas 0.13.1 pypi_0 pypi
ipyevents 2.0.2 pypi_0 pypi
ipython 8.12.3 pypi_0 pypi
ipywidgets 8.1.2 pypi_0 pypi
itsdangerous 2.1.2 pypi_0 pypi
jedi 0.19.1 pypi_0 pypi
jinja2 3.1.3 pypi_0 pypi
joblib 1.4.0 pypi_0 pypi
jupyter-client 7.4.9 pypi_0 pypi
jupyter-core 5.7.2 pypi_0 pypi
jupyterlab-widgets 3.0.10 pypi_0 pypi
kaolin 0.15.0 pypi_0 pypi
lap 0.4.0 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1 defaults
libedit 3.1.20230828 h5eee18b_0 defaults
libffi 3.2.1 hf484d3e_1007 defaults
libgcc-ng 11.2.0 h1234567_1 defaults
libgomp 11.2.0 h1234567_1 defaults
libstdcxx-ng 11.2.0 h1234567_1 defaults
lightning-utilities 0.11.2 pypi_0 pypi
lpips 0.1.4 pypi_0 pypi
markdown 3.6 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
marshmallow 3.21.1 pypi_0 pypi
matplotlib-inline 0.1.6 pypi_0 pypi
moderngl 5.10.0 pypi_0 pypi
moderngl-window 2.4.6 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
multidict 6.0.5 pypi_0 pypi
multipledispatch 1.0.0 pypi_0 pypi
mypy-extensions 1.0.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0 defaults
nest-asyncio 1.6.0 pypi_0 pypi
networkx 3.1 pypi_0 pypi
ninja pypi_0 pypi
numpy 1.23.1 pypi_0 pypi
nvdiffrast 0.3.1 pypi_0 pypi
nvidia-cublas-cu12 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 pypi_0 pypi
nvidia-cufft-cu12 pypi_0 pypi
nvidia-curand-cu12 pypi_0 pypi
nvidia-cusolver-cu12 pypi_0 pypi
nvidia-cusparse-cu12 pypi_0 pypi
nvidia-nccl-cu12 2.19.3 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
omegaconf 2.1.2 pypi_0 pypi
opencv-contrib-python-headless pypi_0 pypi
opencv-python-headless pypi_0 pypi
openssl 1.1.1w h7f8727e_0 defaults
packaging 24.0 pypi_0 pypi
parso 0.8.4 pypi_0 pypi
pexpect 4.9.0 pypi_0 pypi
pickleshare 0.7.5 pypi_0 pypi
pillow 10.3.0 pypi_0 pypi
pip 23.3.1 py38h06a4308_0 defaults
platformdirs 4.2.0 pypi_0 pypi
portalocker 2.8.2 pypi_0 pypi
prompt-toolkit 3.0.43 pypi_0 pypi
protobuf 5.26.1 pypi_0 pypi
ptyprocess 0.7.0 pypi_0 pypi
pure-eval 0.2.2 pypi_0 pypi
pyasn1 0.6.0 pypi_0 pypi
pyasn1-modules 0.4.0 pypi_0 pypi
pybind11 2.12.0 pypi_0 pypi
pydeprecate 0.3.1 pypi_0 pypi
pyglet 2.0.15 pypi_0 pypi
pygltflib 1.16.2 pypi_0 pypi
pygments 2.17.2 pypi_0 pypi
pyqt5 5.15.10 pypi_0 pypi
pyqt5-qt5 5.15.2 pypi_0 pypi
pyqt5-sip 12.13.0 pypi_0 pypi
pyrr 0.10.3 pypi_0 pypi
python 3.8.1 h0371630_1 defaults
python-dateutil 2.9.0.post0 pypi_0 pypi
pytorch-lightning 1.5.7 pypi_0 pypi
pytorch3d 0.7.2 py38_cu116_pyt1131
pywavelets 1.4.1 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
pyzmq 24.0.1 pypi_0 pypi
readline 7.0 h7b6447c_5 defaults
requests 2.31.0 pypi_0 pypi
requests-oauthlib 2.0.0 pypi_0 pypi
roma 1.4.5 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scikit-image 0.19.3 pypi_0 pypi
scikit-video 1.1.11 pypi_0 pypi
scipy 1.10.1 pypi_0 pypi
segment-anything 1.0 pypi_0 pypi
setuptools 69.2.0 pypi_0 pypi
simple-romp 1.1.3 pypi_0 pypi
six 1.16.0 pypi_0 pypi
smplx 0.1.28 pypi_0 pypi
sqlite 3.33.0 h62c20be_0 defaults
stack-data 0.6.3 pypi_0 pypi
sympy 1.12 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tensorboard 2.14.0 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
termcolor 2.4.0 pypi_0 pypi
tifffile 2023.7.10 pypi_0 pypi
tinycudann 1.6 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 defaults
torch 1.13.1+cu116 pypi_0 pypi
torchmetrics 1.3.2 pypi_0 pypi
torchvision 0.14.1+cu116 pypi_0 pypi
tornado 6.4 pypi_0 pypi
tqdm 4.66.2 pypi_0 pypi
traitlets 5.14.2 pypi_0 pypi
trimesh 3.23.5 pypi_0 pypi
triton 2.2.0 pypi_0 pypi
typing-extensions 4.11.0 pypi_0 pypi
typing-inspect 0.9.0 pypi_0 pypi
urllib3 2.2.1 pypi_0 pypi
usd-core 23.5 pypi_0 pypi
wcwidth 0.2.13 pypi_0 pypi
websockets 12.0 pypi_0 pypi
werkzeug 3.0.2 pypi_0 pypi
wget 3.2 pypi_0 pypi
wheel 0.41.2 py38h06a4308_0 defaults
widgetsnbextension 4.0.10 pypi_0 pypi
wrapt 1.16.0 pypi_0 pypi
xz 5.4.6 h5eee18b_0 defaults
yacs 0.1.8 pypi_0 pypi
yarl 1.9.4 pypi_0 pypi
zipp 3.18.1 pypi_0 pypi
zlib 1.2.13 h5eee18b_0 defaults
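In the listing above, torch, torchvision and pytorch3d are cu116 builds, while the nvidia-* packages carry a cu12 suffix. Nothing in this thread establishes that the mix caused any problem, but a long package list is easier to audit with a small script. This is a minimal sketch that groups package names by the CUDA tag found on their line; `cuda_tags_by_package` is a hypothetical helper, and it assumes one package per line in the order name, version, build, channel.

```python
import re
from collections import defaultdict


def cuda_tags_by_package(lines):
    """Group package names by the CUDA tag (for example 'cu116') found on their line."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and conda's comment header
        # 'cu' must be followed directly by digits, so words such as 'cuda'
        # or 'cublas' do not count as a tag.
        match = re.search(r"cu(\d+)", line)
        if match:
            groups["cu" + match.group(1)].append(fields[0])
    return dict(groups)


sample = [
    "numpy 1.23.1 pypi_0 pypi",
    "nvidia-nccl-cu12 2.19.3 pypi_0 pypi",
    "pytorch3d 0.7.2 py38_cu116_pyt1131",
    "torch 1.13.1+cu116 pypi_0 pypi",
    "torchvision 0.14.1+cu116 pypi_0 pypi",
]
for tag, names in cuda_tags_by_package(sample).items():
    print(tag, names)
```

More than one tag in the result is a hint to look closer, not proof of a broken environment.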

bayunCC commented 5 months ago

I hadn't run segment-anything successfully when processing the video data. I fixed this problem today and now get a good body model. Thanks for your great work and help.
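Since the root cause here was a segmentation step that had silently failed, a cheap check before training is to confirm that every frame has a non-empty mask file. The sketch below is only an illustration: `find_missing_masks` is a hypothetical helper, and the `images` and `masks` folder names in the commented call are assumptions about the preprocessed data layout, not something confirmed in this thread.

```python
from pathlib import Path


def find_missing_masks(image_dir, mask_dir, min_bytes=1):
    """Return the stems of images that have no mask file or an empty mask file.

    Images and masks are matched by file name without extension.
    """
    image_dir, mask_dir = Path(image_dir), Path(mask_dir)
    masks = {p.stem: p for p in mask_dir.iterdir() if p.is_file()}
    missing = []
    for image in sorted(image_dir.iterdir()):
        if not image.is_file():
            continue
        mask = masks.get(image.stem)
        # A zero-byte file usually means the segmentation step crashed mid-write.
        if mask is None or mask.stat().st_size < min_bytes:
            missing.append(image.stem)
    return missing


# Example call (folder names are assumptions):
# print(find_missing_masks("data/custom/seattle/images", "data/custom/seattle/masks"))
```

This only checks that mask files exist and are not empty. It does not look at mask content, so opening a few masks to confirm that they actually cover the person is still worthwhile.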