Training problem

$ CUDA_VISIBLE_DEVICES=0,1 python train.py --gpu-ids 0 --conf config.conf --data /data/huima/female-3-casual --save-folder result

scene data use female smpl
/home/huima_phd/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1640811806235/work/aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
camera ang threshold is 0.010285
box: [-0.7080196142196655, -1.2795634269714355, -0.3215314447879791] [0.7120546102523804, 0.7051210403442383, 0.3668109178543091]
Train Epoch: 1 Train Loss: 15.479950 Manifold loss: 4.882936 Grad loss: 41.597881 Normals Loss: 6.437226
Train Epoch: 2 Train Loss: 1.377734 Manifold loss: 0.331282 Grad loss: 0.664243 Normals Loss: 0.980028
Train Epoch: 3 Train Loss: 1.097442 Manifold loss: 0.086144 Grad loss: 0.387819 Normals Loss: 0.972516
Train Epoch: 4 Train Loss: 1.125766 Manifold loss: 0.129328 Grad loss: 0.496315 Normals Loss: 0.946807
Train Epoch: 5 Train Loss: 1.060438 Manifold loss: 0.151725 Grad loss: 0.276402 Normals Loss: 0.881073
Train Epoch: 6 Train Loss: 0.994520 Manifold loss: 0.097526 Grad loss: 0.391991 Normals Loss: 0.857795
Train Epoch: 7 Train Loss: 0.998818 Manifold loss: 0.100930 Grad loss: 0.227032 Normals Loss: 0.875185
Train Epoch: 8 Train Loss: 0.929964 Manifold loss: 0.095216 Grad loss: 0.294401 Normals Loss: 0.805308
Train Epoch: 9 Train Loss: 0.936847 Manifold loss: 0.111550 Grad loss: 0.204250 Normals Loss: 0.804872
Train Epoch: 10 Train Loss: 0.907328 Manifold loss: 0.077044 Grad loss: 0.312788 Normals Loss: 0.799005
Train Epoch: 11 Train Loss: 0.840160 Manifold loss: 0.028910 Grad loss: 0.221442 Normals Loss: 0.789106
Train Epoch: 12 Train Loss: 0.852796 Manifold loss: 0.054052 Grad loss: 0.170806 Normals Loss: 0.781663
Train Epoch: 13 Train Loss: 0.883151 Manifold loss: 0.069836 Grad loss: 0.243884 Normals Loss: 0.788926
Train Epoch: 14 Train Loss: 0.838328 Manifold loss: 0.061489 Grad loss: 0.137164 Normals Loss: 0.763122
Train Epoch: 15 Train Loss: 0.818893 Manifold loss: 0.040548 Grad loss: 0.205969 Normals Loss: 0.757748
Train Epoch: 16 Train Loss: 0.769657 Manifold loss: 0.033744 Grad loss: 0.140248 Normals Loss: 0.721888
scene data use female smpl
/home/huima_phd/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1640811806235/work/aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
camera ang threshold is 0.010285
box: [-0.7080196142196655, -1.2795634269714355, -0.3215314447879791] [0.7120546102523804, 0.7051210403442383, 0.3668109178543091]
/home/huima_phd/workspace/human/SelfReconCode/MCAcc/seg3d_lossless.py:246: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  stride = (self.resolutions[-1] - 1) // (resolution - 1)
/home/huima_phd/workspace/human/SelfReconCode/MCAcc/seg3d_lossless.py:261: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  coords_accum = coords // stride
/home/huima_phd/workspace/human/SelfReconCode/MCAcc/seg3d_lossless.py:341: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  voxels = coords // stride
/home/huima_phd/workspace/human/SelfReconCode/MCAcc/seg3d_lossless.py:381: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  point_coords = coords // stride
/home/huima_phd/workspace/human/SelfReconCode/MCAcc/seg3d_lossless.py:417: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  voxels = coords // stride
Traceback (most recent call last):
  File "train.py", line 167, in <module>
    loss=optNet(outs,sample_pix_num,ratio,frame_ids,debug_root)
  File "/home/huima_phd/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huima_phd/workspace/human/SelfReconCode/model/network.py", line 492, in forward
    _,frags=self.maskRender(defMeshes)
  File "/home/huima_phd/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huima_phd/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/renderer/mesh/renderer.py", line 98, in forward
    fragments = self.rasterizer(meshes_world, **kwargs)
  File "/home/huima_phd/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/huima_phd/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/renderer/mesh/rasterizer.py", line 140, in forward
    meshes_screen = self.transform(meshes_world, **kwargs)
  File "/home/huima_phd/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/renderer/mesh/rasterizer.py", line 122, in transform
    verts_view = cameras.get_world_to_view_transform(**kwargs).transform_points(
  File "/home/huima_phd/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/renderer/cameras.py", line 171, in get_world_to_view_transform
    world_to_view_transform = get_world_to_view_transform(R=self.R, T=self.T)
  File "/home/huima_phd/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/renderer/cameras.py", line 1250, in get_world_to_view_transform
    R = Rotate(R, device=R.device)
  File "/home/huima_phd/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/transforms/transform3d.py", line 530, in __init__
    _check_valid_rotation_matrix(R, tol=orthogonal_tol)
  File "/home/huima_phd/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/transforms/transform3d.py", line 722, in _check_valid_rotation_matrix
    det_R = torch.det(R)
RuntimeError: cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling cusolverDnSgetrf( handle, m, n, dA, ldda, static_cast<float*>(dataPtr.get()), ipiv, info)
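The floordiv UserWarnings from MCAcc/seg3d_lossless.py in the log above are warnings, not the cause of the crash, and they only matter for negative operands: tensor `//` currently truncates toward zero, while true floor division rounds toward negative infinity. The pure-Python sketch below illustrates the two rounding modes the warning names (it is not PyTorch's implementation, and `div_trunc`/`div_floor` are hypothetical helpers). Assuming `coords` and `stride` are non-negative grid indices, as their names suggest, both modes give the same result, so switching to the warning's own suggestion, `torch.div(coords, stride, rounding_mode='trunc')`, should only silence the message.

```python
def div_trunc(a: int, b: int) -> int:
    """Round the quotient toward zero (the mode the warning calls 'trunc')."""
    q = abs(a) // abs(b)
    # Same sign -> positive quotient; opposite signs -> negate the magnitude.
    return q if (a >= 0) == (b > 0) else -q


def div_floor(a: int, b: int) -> int:
    """Round the quotient toward negative infinity (the mode the warning calls 'floor')."""
    return a // b  # Python's // on ints is already floor division


# For non-negative operands, like voxel coordinates and strides, both modes agree.
for a, b in [(7, 2), (8, 4), (0, 3)]:
    assert div_trunc(a, b) == div_floor(a, b)

# They differ only when the exact quotient is negative and not an integer.
print(div_trunc(-7, 2), div_floor(-7, 2))  # prints: -3 -4
```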

CUDA version:
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Sep_13_19:13:29_PDT_2021
Cuda compilation tools, release 11.5, V11.5.50
Build cuda_11.5.r11.5/compiler.30411180_0
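`nvcc -V` reports the CUDA toolkit installed on the system, which is not necessarily the CUDA runtime that PyTorch was built against; when cuSOLVER calls fail, a mismatch there is one thing worth ruling out. A small sketch for collecting the versions that matter in this thread, assuming only that `torch` and `pytorch3d` may or may not be importable (`environment_report` is a hypothetical helper, not part of SelfRecon):

```python
import platform


def environment_report() -> dict:
    """Collect versions relevant to this issue; missing packages are reported as None."""
    report = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda": None,
        "cuda_available": None,
        "pytorch3d": None,
    }
    try:
        import torch

        report["torch"] = torch.__version__
        report["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        pass
    try:
        import pytorch3d

        report["pytorch3d"] = pytorch3d.__version__
    except ImportError:
        pass
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Posting this output alongside `nvcc -V` makes it easier to compare the toolkit, the PyTorch build, and the PyTorch3D release in one place.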

I have a similar problem when running:
$ CUDA_VISIBLE_DEVICES=0,1 python train.py --gpu-ids 0 --conf config.conf --data /data/huima/female-3-casual --save-folder result

scene data use female smpl
/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1659484657607/work/aten/src/ATen/native/TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
camera ang threshold is 0.010285
box: [-0.6890863180160522, -1.294739007949829, -0.3174707591533661] [0.6900945901870728, 0.6968696117401123, 0.35897326469421387]
Traceback (most recent call last):
  File "train.py", line 167, in <module>
    loss=optNet(outs,sample_pix_num,ratio,frame_ids,debug_root)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/SelfRecon/model/network.py", line 492, in forward
    _,frags=self.maskRender(defMeshes)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/pytorch3d/renderer/mesh/renderer.py", line 107, in forward
    fragments = self.rasterizer(meshes_world, **kwargs)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/pytorch3d/renderer/mesh/rasterizer.py", line 219, in forward
    meshes_proj = self.transform(meshes_world, **kwargs)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/pytorch3d/renderer/mesh/rasterizer.py", line 201, in transform
    to_ndc_transform = cameras.get_ndc_camera_transform(**kwargs)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/pytorch3d/renderer/cameras.py", line 280, in get_ndc_camera_transform
    if self.in_ndc():
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/pytorch3d/renderer/cameras.py", line 375, in in_ndc
    raise NotImplementedError()
NotImplementedError
(selfrecon) root@dl-2209201527473d0-pod-jupyter-5845d5bf8b-nlwnv:~/SelfRecon# CUDA_VISIBLE_DEVICES=0 python train.py --gpu-ids 0 --conf config.conf --data /root/SelfRecon/data/female-1-casual --save-folder result
scene data use female smpl
/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1659484657607/work/aten/src/ATen/native/TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
camera ang threshold is 0.010285
box: [-0.6890863180160522, -1.294739007949829, -0.3174707591533661] [0.6900945901870728, 0.6968696117401123, 0.35897326469421387]
Traceback (most recent call last):
  File "train.py", line 167, in <module>
    loss=optNet(outs,sample_pix_num,ratio,frame_ids,debug_root)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/SelfRecon/model/network.py", line 492, in forward
    _,frags=self.maskRender(defMeshes)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/pytorch3d/renderer/mesh/renderer.py", line 107, in forward
    fragments = self.rasterizer(meshes_world, **kwargs)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/pytorch3d/renderer/mesh/rasterizer.py", line 219, in forward
    meshes_proj = self.transform(meshes_world, **kwargs)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/pytorch3d/renderer/mesh/rasterizer.py", line 201, in transform
    to_ndc_transform = cameras.get_ndc_camera_transform(**kwargs)
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/pytorch3d/renderer/cameras.py", line 280, in get_ndc_camera_transform
    if self.in_ndc():
  File "/root/.local/conda/envs/selfrecon/lib/python3.8/site-packages/pytorch3d/renderer/cameras.py", line 375, in in_ndc
    raise NotImplementedError()
NotImplementedError

@JuneoXIE This error occurs when you use a newer version of PyTorch3D. Installing version 0.4 instead should solve the problem; alternatively, write a new camera module in model/CameraMine.py.
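In the NotImplementedError traceback, the newer PyTorch3D rasterizer calls `cameras.get_ndc_camera_transform()`, which in turn calls `in_ndc()` on the camera object; the base implementation in cameras.py simply raises, so a camera class that does not override it (apparently the case for the camera in model/CameraMine.py) fails there. If you take the downgrade route, a startup guard like the sketch below fails early with a readable message instead of deep inside the renderer. `check_pytorch3d_version`, `parse_version`, and the 0.4 pin are hypothetical, based on the reply above; they are not part of SelfReconCode.

```python
from typing import Optional, Tuple

SUPPORTED = (0, 4)  # assumption: major/minor release recommended in the reply above


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.4.0' or '0.7.2+cu113' into (0, 4, 0) or (0, 7, 2)."""
    core = version.split("+")[0]  # drop any local build suffix
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:  # keep the leading digits of each component
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_pytorch3d_version(version: Optional[str] = None) -> None:
    """Raise early if the installed (or given) PyTorch3D version is not 0.4.x."""
    if version is None:
        import pytorch3d  # imported lazily so the parser works without PyTorch3D

        version = pytorch3d.__version__
    if parse_version(version)[:2] != SUPPORTED:
        raise RuntimeError(
            f"This camera setup was reported to work with PyTorch3D 0.4.x, found {version}; "
            "install 0.4 or implement the camera methods newer releases call, such as in_ndc()."
        )
```

Calling `check_pytorch3d_version()` at the top of train.py would surface the version mismatch before any data is loaded.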

I ran into a similar problem:
Traceback (most recent call last):
  File "train.py", line 170, in <module>
    loss=optNet(outs,sample_pix_num,ratio,frame_ids,debug_root)
  File "/home/smy/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/smy/anaconda3/projects/SelfRecon/model/network.py", line 494, in forward
    _,frags=self.maskRender(defMeshes)
  File "/home/smy/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/smy/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/renderer/mesh/renderer.py", line 98, in forward
    fragments = self.rasterizer(meshes_world, **kwargs)
  File "/home/smy/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/smy/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/renderer/mesh/rasterizer.py", line 140, in forward
    meshes_screen = self.transform(meshes_world, **kwargs)
  File "/home/smy/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/renderer/mesh/rasterizer.py", line 122, in transform
    verts_view = cameras.get_world_to_view_transform(**kwargs).transform_points(
  File "/home/smy/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/renderer/cameras.py", line 171, in get_world_to_view_transform
    world_to_view_transform = get_world_to_view_transform(R=self.R, T=self.T)
  File "/home/smy/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/renderer/cameras.py", line 1250, in get_world_to_view_transform
    R = Rotate(R, device=R.device)
  File "/home/smy/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/transforms/transform3d.py", line 530, in __init__
    _check_valid_rotation_matrix(R, tol=orthogonal_tol)
  File "/home/smy/anaconda3/envs/SelfRecon/lib/python3.8/site-packages/pytorch3d-0.4.0-py3.8-linux-x86_64.egg/pytorch3d/transforms/transform3d.py", line 722, in _check_valid_rotation_matrix
    det_R = torch.det(R)
RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR, when calling cusolverDnCreate(handle)
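Both cuSOLVER tracebacks in this thread fail inside `_check_valid_rotation_matrix`, where PyTorch3D 0.4 calls `torch.det(R)` on the GPU only to validate the camera rotation; as I understand the 0.4 source, it checks R R^T = I and det(R) = 1 and merely emits a warning when either fails. For 3x3 matrices these checks are cheap, so the sketch below re-expresses them in plain Python with a closed-form determinant, which can help sanity-check the R values fed to the camera without involving cuSOLVER at all. It is an illustration, not PyTorch3D's code; `det3`, `is_valid_rotation`, and the tolerance defaults are assumptions.

```python
from typing import Sequence

Matrix3 = Sequence[Sequence[float]]


def det3(R: Matrix3) -> float:
    """Closed-form determinant of a 3x3 matrix (cofactor expansion along row 0)."""
    return (
        R[0][0] * (R[1][1] * R[2][2] - R[1][2] * R[2][1])
        - R[0][1] * (R[1][0] * R[2][2] - R[1][2] * R[2][0])
        + R[0][2] * (R[1][0] * R[2][1] - R[1][1] * R[2][0])
    )


def is_valid_rotation(R: Matrix3, orth_tol: float = 1e-5, det_tol: float = 1e-5) -> bool:
    """True if R R^T is close to the identity and det(R) is close to +1."""
    for i in range(3):
        for j in range(3):
            dot = sum(R[i][k] * R[j][k] for k in range(3))  # entry (i, j) of R R^T
            if abs(dot - (1.0 if i == j else 0.0)) > orth_tol:
                return False
    return abs(det3(R) - 1.0) <= det_tol
```

A reflection such as diag(1, 1, -1) is orthogonal but has determinant -1, so it is rejected, which is exactly the case the determinant check exists to catch.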

Oddly, this error only appeared the first time I ran the program; it disappeared when I ran it again.
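Since the cusolverDnCreate failure reportedly went away on a second run, one blunt stopgap is to retry the failing step once when the RuntimeError mentions cuSOLVER. The helper below is hypothetical (not part of SelfRecon or PyTorch) and does not address the root cause, which this thread does not identify; it may also not help if the failure leaves the CUDA context unusable, in which case restarting the process, as the commenter did, remains the reliable option.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_cusolver_error(fn: Callable[[], T], retries: int = 1, delay_s: float = 0.0) -> T:
    """Call fn(); if it raises a RuntimeError mentioning cusolver, retry up to `retries` times."""
    attempt = 0
    while True:
        try:
            return fn()
        except RuntimeError as err:
            if "cusolver" not in str(err).lower() or attempt >= retries:
                raise  # unrelated error, or out of retries: propagate unchanged
            attempt += 1
            time.sleep(delay_s)
```

In train.py this would wrap only the call from the traceback, for example `loss = retry_on_cusolver_error(lambda: optNet(outs, sample_pix_num, ratio, frame_ids, debug_root))`, so that no optimizer step is repeated.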

jby1993 / SelfReconCode, issue #6: Training problem