wenbowen123 / catgrasp

[ICRA 2022] CaTGrasp: Learning Category-Level Task-Relevant Grasping in Clutter from Simulation
Apache License 2.0

CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)` #43

Open lihenghitcs opened 1 year ago

lihenghitcs commented 1 year ago

When running python run_grasp_simulation.py, the following error occurs:

pybullet build time: Dec  1 2021 18:33:04
Gripper hand_depth: 0.018883
Gripper init_bite: 0.005
Gripper max_width: 0.048
Gripper hand_height: 0.020832
Gripper finger_width: 0.00586
Gripper hand_outer_diameter: 0.061398
Sdf3D self.dims_=[168 168 168], self.resolution_=0.000994335, self.origin_=[-0.0835218 -0.083531   0.0678743], center_sdf=-0.0309976, boundary_sdf=0.0808156
sdf_dir /home/catgrasp/dexnet/grasping/../../urdf/robotiq_hande/gripper_enclosed_air_tight.sdf
Sdf3D self.dims_=[168 168 168], self.resolution_=0.000994683, self.origin_=[-0.083551  -0.0835602  0.0678726], center_sdf=-0.0309976, boundary_sdf=0.080857
GraspPredicter artifact_dir /home/catgrasp/artifacts/artifacts-50
phase=test #self.keys=0
Load ckpt from /home/catgrasp/artifacts/artifacts-50/best_val.pth.tar
NunocsPredicter artifact_dir /home/catgrasp/artifacts/artifacts-76
phase=test #self.files=0
Load ckpt from /home/catgrasp/artifacts/artifacts-76/best_val.pth.tar
PointGroupPredictor artifact_dir /home/catgrasp/artifacts/artifacts-77
config_dir /home/catgrasp/artifacts/artifacts-77/config_pointgroup.yaml
phase: test, num files=0
Load ckpt from /home/catgrasp/artifacts/artifacts-77/best_val.pth.tar
NocsTransferGraspSampler score_larger_than=0.95, center_ob_between_gripper=False, max_n_grasp=10000, #canonical_grasp=10000, before has 747195
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=2
argv[0] = --unused
argv[1] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
Creating context
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=VMware, Inc.
GL_RENDERER=llvmpipe (LLVM 10.0.0, 128 bits)
GL_VERSION=3.3 (Core Profile) Mesa 20.0.8
GL_SHADING_LANGUAGE_VERSION=3.30
pthread_getconcurrency()=0
Version = 3.3 (Core Profile) Mesa 20.0.8
Vendor = VMware, Inc.
Renderer = llvmpipe (LLVM 10.0.0, 128 bits)
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
ven = VMware, Inc.
ven = VMware, Inc.
bullet server already connected
gripper_dir /home/catgrasp/dexnet/grasping/../../urdf/robotiq_hande
self.env_body_ids [0, 1]
(0, b'world_iiwa_joint', 4, -1, -1, 0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, b'arm_iiwa_link_0', (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0), -1)
(1, b'iiwa_joint_1', 0, 7, 6, 1, 2.0, 1.0, -2.96, 2.96, 320.0, 10.0, b'arm_iiwa_link_1', (0.0, 0.0, 1.0), (0.1, 0.0, 0.0875), (0.0, 0.0, 0.0, 1.0), 0)
(2, b'iiwa_joint_2', 0, 8, 7, 1, 2.0, 1.0, -2.09, 2.09, 320.0, 10.0, b'arm_iiwa_link_2', (0.0, 0.0, 1.0), (0.0, 0.03, 0.08249999999999999), (9.381873917569987e-07, 0.7071080798588513, 0.707105482510614, -9.381839456086129e-07), 1)
(3, b'iiwa_joint_3', 0, 9, 8, 1, 2.0, 1.0, -2.96, 2.96, 176.0, 10.0, b'arm_iiwa_link_3', (0.0, 0.0, 1.0), (-0.0003, 0.14549999999862046, -0.042000751170443634), (9.381873916908391e-07, 0.7071080798588513, 0.707105482510614, 9.381839456747728e-07), 2)
(4, b'iiwa_joint_4', 0, 10, 9, 1, 2.0, 1.0, -2.09, 2.09, 176.0, 10.0, b'arm_iiwa_link_4', (0.0, 0.0, 1.0), (0.0, -0.03, 0.08550000000000002), (-0.7071080798594737, 0.0, 0.0, 0.7071054825112363), 3)
(5, b'iiwa_joint_5', 0, 11, 10, 1, 2.0, 1.0, -2.96, 2.96, 110.0, 10.0, b'arm_iiwa_link_5', (0.0, 0.0, 1.0), (0.0, 0.11749999999875532, -0.03400067770634157), (-9.381873916908391e-07, 0.7071080798588513, 0.707105482510614, -9.381839456747728e-07), 4)
(6, b'iiwa_joint_6', 0, 12, 11, 1, 2.0, 1.0, -2.09, 2.09, 40.0, 10.0, b'arm_iiwa_link_6', (0.0, 0.0, 1.0), (-0.0001, -0.021, 0.1394999999999999), (-0.7071080798594737, 6.21032719178595e-23, 6.210304380022312e-23, 0.7071054825112363), 5)
(7, b'iiwa_joint_7', 0, 13, 12, 1, 2.0, 1.0, -3.05433, 3.05433, 40.0, 10.0, b'arm_iiwa_link_7', (0.0, 0.0, 1.0), (1.9014789953427065e-27, 0.0803999999994535, -0.0004002975296133711), (9.381873916908391e-07, 0.7071080798588513, 0.707105482510614, 9.381839456747728e-07), 6)
(8, b'toolchanger_joint', 4, -1, -1, 0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, b'toolchanger_base_link', (0.0, 0.0, 0.0), (0.0, 0.0, 0.05426099999999999), (0.0, 0.0, 0.7071080798594737, 0.7071054825112363), 7)
(9, b'attach_joint', 4, -1, -1, 0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, b'toolchanger_tool_attach', (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, -1.8087206151615695e-22, 1.0), 8)
(10, b'force_torqe_sensor_joint', 4, -1, -1, 0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, b'force_torqe_sensor_base_link', (0.0, 0.0, 0.0), (0.0, 0.0, 0.03695699999999991), (0.0, 0.0, -1.8087206151615695e-22, 1.0), 9)
(11, b'attach_joint_', 4, -1, -1, 0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, b'force_torqe_sensor_tool_attach', (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, -1.8087206151615695e-22, 1.0), 10)
(12, b'axia_gripper_joint', 4, -1, -1, 0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, b'robotiq_hande_gripper_body', (0.0, 0.0, 0.0), (0.0, 0.0, -0.027666999999999914), (0.0, 0.0, 0.7071041838352712, 0.7071093785282833), 11)
(13, b'finger1', 1, 14, 13, 1, 0.0, 0.0, 0.0, 0.025, 100.0, 10.0, b'robotiq_hande_gripper_finger1', (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.8087206151615695e-22, 1.0), 12)
(14, b'finger2', 1, 15, 14, 1, 0.0, 0.0, 0.0, 0.025, 100.0, 10.0, b'robotiq_hande_gripper_finger2', (0.0, -1.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.8087206151615695e-22, 1.0), 12)
self.env_body_ids [0, 1, 2]
Making pile /home/catgrasp/data/object_models/screw_carr_94323A329_NYLON.obj scale=[1. 1. 1.]
Add new objects on pile #=4
simulation_until_stable....
Finished simulation
simulation_until_stable....
Finished simulation
simulation_until_stable....
Finished simulation
simulation_until_stable....
Finished simulation
simulation_until_stable....
Finished simulation
simulation_until_stable....
Finished simulation
simulation_until_stable....
Finished simulation
simulation_until_stable....
Finished simulation
simulation_until_stable....
Finished simulation
simulation_until_stable....
Finished simulation
simulation_until_stable....
Finished simulation
simulation_until_stable....
Finished simulation
simulation_until_stable....
Finished simulation
simulation_until_stable....
Finished simulation
symmetry_tfs: 72
simulation_until_stable....
Finished simulation
Traceback (most recent call last):
  File "run_grasp_simulation.py", line 717, in <module>
    simulate_grasp_with_arm()
  File "run_grasp_simulation.py", line 575, in simulate_grasp_with_arm
    for grasps in compute_candidate_grasp(rgb,depth,seg,i_pick=i_pick,env=env,ags=ags,symmetry_tfs=symmetry_tfs,ik_func=ik_func):
  File "run_grasp_simulation.py", line 213, in compute_candidate_grasp
    scene_seg = seg_predicter.predict(copy.deepcopy(seg_input_data))
  File "/home/catgrasp/predicter.py", line 304, in predict
    ret = self.model(input_, p2v_map, coords_float, coords[:, 0].int(), batch_offsets, epoch=self.model.prepare_epochs-1)
  File "/opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/catgrasp/PointGroup/model/pointgroup/pointgroup.py", line 226, in forward
    output = self.input_conv(input_tensor)
  File "/opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/catgrasp/lib/python3.7/site-packages/spconv/modules.py", line 123, in forward
    input = module(input)
  File "/opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/catgrasp/lib/python3.7/site-packages/spconv/conv.py", line 157, in forward
    outids.shape[0])
  File "/opt/conda/envs/catgrasp/lib/python3.7/site-packages/spconv/functional.py", line 83, in forward
    return ops.indice_conv(features, filters, indice_pairs, indice_pair_num, num_activate_out, False, True)
  File "/opt/conda/envs/catgrasp/lib/python3.7/site-packages/spconv/ops.py", line 112, in indice_conv
    int(inverse), int(subm))
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)` (gemm<float> at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/ATen/cuda/CUDABlas.cpp:182)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f6f79822e37 in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x3b45097 (0x7f6f8459d097 in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: THCudaTensor_addmm + 0x378 (0x7f6f84979518 in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x3bfa038 (0x7f6f84652038 in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #4: <unknown function> + 0x3b53fd8 (0x7f6f845abfd8 in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #5: torch::autograd::VariableType::mm_out(at::Tensor&, at::Tensor const&, at::Tensor const&) + 0x645 (0x7f6f83e9cad5 in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #6: at::Tensor spconv::indiceConv<float>(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long) + 0xa0b (0x7f6f6353dcbb in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/spconv/libspconv.so)
frame #7: c10::guts::infer_function_traits_t::return_type c10::detail::call_functor_with_args_from_stack_<c10::detail::WrapRuntimeKernelFunctor_<at::Tensor (*)(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long), at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long> >, true, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul>(c10::detail::WrapRuntimeKernelFunctor_<at::Tensor (*)(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long), at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long> >*, std::vector<c10::IValue, std::allocator<c10::IValue> >*, std::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul>) + 0x146 (0x7f6f63547756 in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/spconv/libspconv.so)
frame #8: c10::detail::wrap_kernel_functor<c10::detail::WrapRuntimeKernelFunctor_<at::Tensor (*)(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long), at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long> >, true, void>::call(std::vector<c10::IValue, std::allocator<c10::IValue> >*, c10::KernelCache*) + 0x2f (0x7f6f6354787f in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/spconv/libspconv.so)
frame #9: <unknown function> + 0x383f5f0 (0x7f6f842975f0 in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #10: <unknown function> + 0x45050c (0x7f6faffc450c in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #11: <unknown function> + 0x41bf44 (0x7f6faff8ff44 in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #12: <unknown function> + 0x1c8066 (0x7f6fafd3c066 in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #21: THPFunction_apply(_object*, _object*) + 0x8e6 (0x7f6faff62726 in /opt/conda/envs/catgrasp/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

numActiveThreads = 0
stopping threads
Thread with taskId 0 exiting
Thread TERMINATED
destroy semaphore
semaphore destroyed
destroy main semaphore
main semaphore destroyed
finished
numActiveThreads = 0
btShutDownExampleBrowser stopping threads
Thread with taskId 0 exiting
Thread TERMINATED
destroy semaphore
semaphore destroyed
destroy main semaphore
main semaphore destroyed
pybullet disconnected

It seems that the installed torch version doesn't match the CUDA version. However, after updating torch with

pip install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

bash build.sh no longer completes (see build.log).
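A version mismatch like the one suspected above can be checked directly before rebuilding anything. The following is a minimal sketch, not part of the catgrasp repository: the helper names `cuda_tag_to_version` and `describe_torch_cuda` are hypothetical, and the snippet only reports the values that need to agree (the CUDA version the torch wheel was built for, and whether a GPU is visible). It does not decide by itself whether a given combination is supported.

```python
import re


def cuda_tag_to_version(wheel_version):
    """Extract the CUDA version encoded in a wheel tag such as '1.5.0+cu101'.

    Returns e.g. '10.1', or None when the string has no '+cuXYZ' suffix.
    (Hypothetical helper, for illustration only.)
    """
    match = re.search(r"\+cu(\d+)$", wheel_version)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version: 'cu101' -> 10.1, 'cu92' -> 9.2.
    return f"{digits[:-1]}.{digits[-1]}"


def describe_torch_cuda():
    """Collect the versions that should agree; degrades gracefully without torch."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    info = {
        "torch": torch.__version__,
        # CUDA toolkit version this torch binary was compiled against.
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    print("wheel tag cu101 ->", cuda_tag_to_version("1.5.0+cu101"))
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

Running this inside the container, and comparing `built_for_cuda` with the CUDA toolkit that build.sh compiles the extensions against, should show whether the two actually differ.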

wenbowen123 commented 1 year ago

Hi, did you launch the Docker image with bash run_container.sh?

lihenghitcs commented 1 year ago

> Hi, did you launch the Docker image with bash run_container.sh?

I launched it with zsh run_container.sh, but I don't think that matters.

wenbowen123 commented 1 year ago

This might be related: https://github.com/traveller59/spconv/issues/69
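Since the traceback shows spconv's `indiceConv` reaching `cublasSgemm` through a regular torch matrix multiply, one way to narrow things down is to run a plain float32 `torch.mm` on the GPU outside spconv. The sketch below is an assumption about how to isolate the failure, not a fix from this thread; `try_cuda_sgemm` is a hypothetical helper name. If it reports "ok", torch's own cuBLAS path works on this GPU and the spconv build or its inputs become the more likely suspects. If it fails as well, the torch/CUDA/driver combination itself is probably at fault.

```python
def try_cuda_sgemm(size=64):
    """Run one float32 matrix multiply on the GPU and report what happened.

    Returns a short status string rather than raising, so it can be run
    in any environment. (Hypothetical helper, for illustration only.)
    """
    try:
        import torch
    except ImportError:
        return "torch-missing"
    if not torch.cuda.is_available():
        return "cuda-unavailable"
    try:
        a = torch.rand(size, size, device="cuda", dtype=torch.float32)
        b = torch.rand(size, size, device="cuda", dtype=torch.float32)
        c = torch.mm(a, b)  # dispatches to a cuBLAS single-precision GEMM
        # Wait for the GPU so that an asynchronous error surfaces here.
        torch.cuda.synchronize()
        expected = torch.mm(a.cpu(), b.cpu())
        # Loose tolerances, since GPU and CPU results differ slightly.
        close = torch.allclose(c.cpu(), expected, rtol=1e-2, atol=1e-2)
        return "ok" if close else "mismatch"
    except RuntimeError as exc:
        return f"failed: {exc}"


if __name__ == "__main__":
    print(try_cuda_sgemm())
```

Because CUDA errors are reported asynchronously, rerunning the original script with the environment variable `CUDA_LAUNCH_BLOCKING=1` can also help confirm that the GEMM call, and not an earlier kernel, is the one that fails.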

wyx040821 commented 6 months ago

I have the same problem. Have you solved it?