lty2226262 / Car_studio

RuntimeError: CUDA error: device-side assert triggered #3

Closed Starak-x closed 6 months ago

Starak-x commented 6 months ago

Hi Tianyu, congratulations on getting this great work accepted, and thanks for releasing the code! I got this error when running ns-train car-nerf. I am using CUDA 11.8 and PyTorch 1.13.1.

Starak-x commented 6 months ago

../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [74,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [75,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [76,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [77,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [78,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [79,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [80,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [81,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [82,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [83,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [84,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [85,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [86,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [87,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [88,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [89,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [90,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [91,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [92,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [93,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [94,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [6,0,0], thread: [95,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 8.3514
CarNerfStageOnePipeline.get_train_loss_dict: 8.3510
Traceback (most recent call last):
  File "/datahdd/hx/miniconda3/envs/car/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/datahdd/hx/Car_studio/dependencies/nerfstudio/nerfstudio/scripts/train.py", line 260, in entrypoint
    main(
  File "/datahdd/hx/Car_studio/dependencies/nerfstudio/nerfstudio/scripts/train.py", line 246, in main
    launch(
  File "/datahdd/hx/Car_studio/dependencies/nerfstudio/nerfstudio/scripts/train.py", line 185, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/datahdd/hx/Car_studio/dependencies/nerfstudio/nerfstudio/scripts/train.py", line 100, in train_loop
    trainer.train()
  File "/datahdd/hx/Car_studio/dependencies/nerfstudio/nerfstudio/engine/trainer.py", line 242, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "/datahdd/hx/Car_studio/dependencies/nerfstudio/nerfstudio/utils/profiler.py", line 127, in inner
    out = func(*args, **kwargs)
  File "/datahdd/hx/Car_studio/dependencies/nerfstudio/nerfstudio/engine/trainer.py", line 448, in train_iteration
    _, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
  File "/datahdd/hx/Car_studio/dependencies/nerfstudio/nerfstudio/utils/profiler.py", line 127, in inner
    out = func(*args, **kwargs)
  File "/datahdd/hx/Car_studio/car_studio/pipelines/car_nerf_stage_one.py", line 133, in get_train_loss_dict
    ray_bundle, batch = self.datamanager.next_train(step)
  File "/datahdd/hx/Car_studio/car_studio/data/datamanagers/car_patch_datamanager.py", line 304, in next_train
    ray_bundle = self.train_ray_generator(ray_indices)
  File "/datahdd/hx/miniconda3/envs/car/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/datahdd/hx/Car_studio/car_studio/model_components/custom_ray_generators.py", line 37, in forward
    ray_bundle = self.cameras.generate_rays(
  File "/datahdd/hx/Car_studio/dependencies/nerfstudio/nerfstudio/cameras/cameras.py", line 452, in generate_rays
    raybundle = cameras._generate_rays_from_coords(
  File "/datahdd/hx/Car_studio/dependencies/nerfstudio/nerfstudio/cameras/cameras.py", line 647, in _generate_rays_from_coords
    cam_types = torch.unique(self.camera_type, sorted=False)
  File "/datahdd/hx/miniconda3/envs/car/lib/python3.8/site-packages/torch/_jit_internal.py", line 485, in fn
    return if_false(*args, **kwargs)
  File "/datahdd/hx/miniconda3/envs/car/lib/python3.8/site-packages/torch/_jit_internal.py", line 485, in fn
    return if_false(*args, **kwargs)
  File "/datahdd/hx/miniconda3/envs/car/lib/python3.8/site-packages/torch/functional.py", line 877, in _return_output
    output, _, _ = _unique_impl(input, sorted, return_inverse, return_counts, dim)
  File "/datahdd/hx/miniconda3/envs/car/lib/python3.8/site-packages/torch/functional.py", line 791, in _unique_impl
    output, inverse_indices, counts = torch._unique2(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
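
The last lines of the log say that CUDA errors can be reported asynchronously, so the frame shown (torch.unique) may not be the call that actually failed, and they suggest setting CUDA_LAUNCH_BLOCKING=1. A minimal sketch of rerunning a command with that variable set follows. The helper names blocking_env and rerun_blocking are hypothetical; only the variable name and the ns-train car-nerf command come from the report.

```python
import os
import subprocess
from typing import Dict, List, Optional


def blocking_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # With this set, each kernel launch blocks until it finishes, so the
    # Python traceback points at the call that actually triggered the assert.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_blocking(command: List[str]) -> int:
    """Run `command` with CUDA_LAUNCH_BLOCKING=1 and return its exit code."""
    return subprocess.run(command, env=blocking_env()).returncode


# Usage (not executed here): rerun_blocking(["ns-train", "car-nerf"])
```

Training runs slower in this mode, so it is probably only worth enabling while reproducing the failure.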

lty2226262 commented 6 months ago

It appears that the issue you're experiencing is likely caused by an incompatible CUDA version. Based on the information provided on the PyTorch website (https://pytorch.org/get-started/previous-versions/), PyTorch version 1.13.1 only supports CUDA versions 11.6 and 11.7. To resolve this issue, I suggest following these steps:

1. Install the "nerfstudio" dependency first. Make sure to install the appropriate version compatible with your CUDA version.
2. Try running the example code using "nerfstudio" to check if it works without any compatibility issues.
3. Once you have confirmed that the code runs successfully with "nerfstudio", proceed to install "car-studio" to continue your work.
4. If you encounter any compatibility issues between "nerfstudio" and "car-studio", you can investigate further to identify and resolve the specific incompatibility problem.
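
A quick way to check the version pairing described above is to compare the installed PyTorch build against the CUDA versions listed for it. The sketch below is only an illustration: the table holds the single row stated in this comment (1.13.1 with CUDA 11.6 and 11.7), and the function names are hypothetical. Note that torch.version.cuda reports the CUDA version the installed PyTorch build was compiled against, which can differ from the toolkit installed on the system.

```python
from typing import Dict, Optional, Set

# Only the row stated in the comment above; other versions are intentionally omitted.
SUPPORTED_CUDA: Dict[str, Set[str]] = {"1.13.1": {"11.6", "11.7"}}


def strip_local(version: str) -> str:
    """Drop a local build suffix such as '+cu117' from a version string."""
    return version.split("+", 1)[0]


def is_supported(torch_version: str, cuda_version: Optional[str]) -> Optional[bool]:
    """Return True/False for a PyTorch version in the table, or None if unknown."""
    known = SUPPORTED_CUDA.get(strip_local(torch_version))
    if known is None:
        return None
    return cuda_version in known


def installed_build() -> Optional[Dict[str, Optional[str]]]:
    """Report the installed PyTorch build, or None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    # torch.version.cuda is None for CPU-only builds.
    return {"torch": str(torch.__version__), "cuda": torch.version.cuda}
```

For example, is_supported("1.13.1", "11.8") returns False, which matches the combination in the original report.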

Starak-x commented 6 months ago

I am now using CUDA 11.7 and PyTorch 1.13.1, but the problem persists.

Starak-x commented 6 months ago

I changed "h" to "height" and "w" to "width" in carstudio_dataparser.py, and that solved the problem.

K0uya commented 4 months ago

Hi! Thank you for your great work!

I ran into the same problem even after changing "w" to "width" and "h" to "height". I found that image_coords does not cover all values of y and x. How can I solve this?

Screenshot 2024-07-11 16 08 36

Thank you!

lty2226262 commented 3 months ago

The terms 'w' and 'h' refer to the dimensions of the image, specifically its width and height. On the other hand, 'width' and 'height' are used to describe the dimensions of the car within the image. To assist with debugging, could you please specify which dataset you are encountering this issue with? Providing additional details will be greatly appreciated.
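
Since the assertion in the log is an out-of-bounds index, one thing that may help narrow this down is checking the sampled ray indices against the image dimensions before rays are generated. The sketch below assumes each ray index is a (camera, row, column) triple and that each camera has an image size (h, w). The function name and data layout are hypothetical and are not taken from Car_studio.

```python
from typing import List, Sequence, Tuple

RayIndex = Tuple[int, int, int]  # assumed layout: (camera, row, column)


def out_of_bounds(
    ray_indices: Sequence[RayIndex],
    image_sizes: Sequence[Tuple[int, int]],  # per camera: image (h, w)
) -> List[RayIndex]:
    """Return the ray indices that fall outside their camera's image."""
    bad: List[RayIndex] = []
    for cam, row, col in ray_indices:
        if not 0 <= cam < len(image_sizes):
            bad.append((cam, row, col))
            continue
        h, w = image_sizes[cam]
        # Stricter than the kernel's assertion, which also accepts negative
        # indices down to -size; negative pixel coordinates are flagged here.
        if not (0 <= row < h and 0 <= col < w):
            bad.append((cam, row, col))
    return bad


# Illustrative values only: one camera with a 375 x 1242 image.
rays = [(0, 10, 20), (0, 375, 0), (0, 5, 1242)]
print(out_of_bounds(rays, [(375, 1242)]))  # [(0, 375, 0), (0, 5, 1242)]
```

If the list is non-empty for a given batch, the first offending triple shows which camera and which axis exceeds the image, which is easier to read than the device-side assert.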

K0uya commented 3 months ago

Thank you for your reply! As far as I remember, I used the kitti-mot dataset, preprocessed with the following command.

python car_studio/scripts/datasets/process_kitti.py --dataset km --data_dir "./data/kitti-mot"