nianticlabs / acezero

[ECCV 2024 - Oral] ACE0 is a learning-based structure-from-motion approach that estimates camera parameters of sets of images by learning a multi-view consistent, implicit scene representation.
https://nianticlabs.github.io/acezero/

Errors running ace_zero.py with WSL2 #9

Closed: gradeeterna closed this issue 3 months ago

gradeeterna commented 3 months ago

Hi, I haven't been able to get this working in WSL2 with Ubuntu 22.04.

I get the following errors when I run this command with any image folder:

python ace_zero.py "/path/to/some/images/*.jpg" result_folder

I followed the installation instructions and everything seemed to install correctly, so I'm not sure if this is a WSL2 issue or something else. Thanks!
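For context, the environment was set up with the repo's conda instructions, roughly as below (the environment name matches the paths in the log that follows):

```bash
# Conda environment setup, roughly as in the ACE0 README
conda env create -f environment.yml
conda activate ace0
```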

Error Log:

```text
INFO:__main__:Starting reconstruction of files matching data/impcanart/*.jpg.
INFO:__main__:Downloading ZoeDepth model from the main process.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
img_size [384, 512]
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Params passed to Resize transform:
        width:  512
        height:  384
        resize_target:  True
        keep_aspect_ratio:  True
        ensure_multiple_of:  32
        resize_method:  minimal
Using pretrained resource url::https://github.com/isl-org/ZoeDepth/releases/download/v1.0/ZoeD_M12_NK.pt
Loaded successfully
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:__main__:Depth estimation model ready to use.
INFO:__main__:Trying seeds: [0.36983939 0.28398129 0.75519018 0.17367966 0.46093806]
INFO:__main__:Processing 5 seeds in parallel.
[Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.
INFO:ace_trainer:Using device for training: cuda
INFO:ace_trainer:ACE feature buffer device: cuda
INFO:ace_trainer:Setting random seed to 2089
INFO:ace_trainer:Disabling multi-threaded data loading because we cannot run multiple depth inference passes simultaneously.
INFO:dataset:Overwriting focal length with heuristic derived from image dimensions.
INFO:dataset:Loading RGB files from: data/impcanart/*.jpg
INFO:dataset:Overwriting dataset with single image: 132 - data/impcanart/impcanart1-right_0017.jpg
INFO:dataset:Using ZoeDepth for depth initialization.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:ace_trainer:Loaded training scan from: data/impcanart/*.jpg -- 1 images, mean: 0.00 0.00 0.00
INFO:ace_trainer:Loaded pretrained encoder from: ace_encoder_pretrained.pt
INFO:ace_network:Creating Regressor using pretrained encoder with 512 feature size.
INFO:ace_trainer:Starting creation of the training buffer.
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 436, in _process_worker
    r = call_item()
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 288, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 263, in __call__
    return [func(*args, **kwargs)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 263, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/grade/src/acezero/ace_zero_util.py", line 236, in map_seed
    run_cmd(mapping_cmd, verbose=verbose)
  File "/home/grade/src/acezero/ace_zero_util.py", line 49, in run_cmd
    raise RuntimeError("Error running ACE0: \nCommand:\n" + " ".join(cmd_str))
RuntimeError: Error running ACE0:
Command:
./train_ace.py data/impcanart/*.jpg result_folder/iteration0_seed0.pt --repro_loss_type tanh --render_target_path result_folder/renderings --render_marker_size 0.03 --refinement_ortho gram-schmidt --ace_pose_file_conf_threshold 500 --render_flipped_portrait False --pose_refinement_wait 0 --image_resolution 480 --pose_refinement_lr 0.001 --num_head_blocks 1 --repro_loss_hard_clamp 1000 --repro_loss_soft_clamp 50 --iterations_output 500 --max_dataset_passes 10 --learning_rate_schedule 1cyclepoly --learning_rate_max 0.003 --learning_rate_cooldown_iterations 5000 --learning_rate_cooldown_trigger_percent_threshold 0.7 --aug_rotation 15 --training_buffer_cpu False --num_data_workers 4 --render_visualization False --use_pose_seed 0.36983939211266215 --iterations 10000 --use_heuristic_focal_length True
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "ace_zero.py", line 187, in <module>
    seed_reg_rates = Parallel(n_jobs=opt.seed_parallel_workers, verbose=1)(
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 1061, in __call__
    self.retrieve()
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 938, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
RuntimeError: Error running ACE0:
Command:
./train_ace.py data/impcanart/*.jpg result_folder/iteration0_seed0.pt --repro_loss_type tanh --render_target_path result_folder/renderings --render_marker_size 0.03 --refinement_ortho gram-schmidt --ace_pose_file_conf_threshold 500 --render_flipped_portrait False --pose_refinement_wait 0 --image_resolution 480 --pose_refinement_lr 0.001 --num_head_blocks 1 --repro_loss_hard_clamp 1000 --repro_loss_soft_clamp 50 --iterations_output 500 --max_dataset_passes 10 --learning_rate_schedule 1cyclepoly --learning_rate_max 0.003 --learning_rate_cooldown_iterations 5000 --learning_rate_cooldown_trigger_percent_threshold 0.7 --aug_rotation 15 --training_buffer_cpu False --num_data_workers 4 --render_visualization False --use_pose_seed 0.36983939211266215 --iterations 10000 --use_heuristic_focal_length True
```
ebrach commented 3 months ago

Hi!

Can you please try to run ACE0 with --seed_parallel_workers 1 and report back here? The actual error message might have been swallowed by the parallel processing when mapping the seed images.
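With the paths from your first post, that would look something like this:

```bash
python ace_zero.py "/path/to/some/images/*.jpg" result_folder --seed_parallel_workers 1
```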

Best, Eric

gradeeterna commented 3 months ago

Hey Eric,

The error messages aren't much different with --seed_parallel_workers 1 unfortunately.

Error Log:

```text
python ace_zero.py "data/impcanart/*.jpg" result_folder --seed_parallel_workers 1
INFO:__main__:Starting reconstruction of files matching data/impcanart/*.jpg.
INFO:__main__:Downloading ZoeDepth model from the main process.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
img_size [384, 512]
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Params passed to Resize transform:
        width:  512
        height:  384
        resize_target:  True
        keep_aspect_ratio:  True
        ensure_multiple_of:  32
        resize_method:  minimal
Using pretrained resource url::https://github.com/isl-org/ZoeDepth/releases/download/v1.0/ZoeD_M12_NK.pt
Loaded successfully
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:__main__:Depth estimation model ready to use.
INFO:__main__:Trying seeds: [0.36983939 0.28398129 0.75519018 0.17367966 0.46093806]
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
INFO:ace_zero_util:Processing seed 0: 0.36983939211266215
INFO:ace_trainer:Using device for training: cuda
INFO:ace_trainer:ACE feature buffer device: cuda
INFO:ace_trainer:Setting random seed to 2089
INFO:ace_trainer:Disabling multi-threaded data loading because we cannot run multiple depth inference passes simultaneously.
INFO:dataset:Overwriting focal length with heuristic derived from image dimensions.
INFO:dataset:Loading RGB files from: data/impcanart/*.jpg
INFO:dataset:Overwriting dataset with single image: 132 - data/impcanart/impcanart1-right_0017.jpg
INFO:dataset:Using ZoeDepth for depth initialization.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:ace_trainer:Loaded training scan from: data/impcanart/*.jpg -- 1 images, mean: 0.00 0.00 0.00
INFO:ace_trainer:Loaded pretrained encoder from: ace_encoder_pretrained.pt
INFO:ace_network:Creating Regressor using pretrained encoder with 512 feature size.
INFO:ace_trainer:Starting creation of the training buffer.
Traceback (most recent call last):
  File "ace_zero.py", line 187, in <module>
    seed_reg_rates = Parallel(n_jobs=opt.seed_parallel_workers, verbose=1)(
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 864, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 782, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 263, in __call__
    return [func(*args, **kwargs)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 263, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/grade/src/acezero/ace_zero_util.py", line 236, in map_seed
    run_cmd(mapping_cmd, verbose=verbose)
  File "/home/grade/src/acezero/ace_zero_util.py", line 49, in run_cmd
    raise RuntimeError("Error running ACE0: \nCommand:\n" + " ".join(cmd_str))
RuntimeError: Error running ACE0:
Command:
./train_ace.py data/impcanart/*.jpg result_folder/iteration0_seed0.pt --repro_loss_type tanh --render_target_path result_folder/renderings --render_marker_size 0.03 --refinement_ortho gram-schmidt --ace_pose_file_conf_threshold 500 --render_flipped_portrait False --pose_refinement_wait 0 --image_resolution 480 --pose_refinement_lr 0.001 --num_head_blocks 1 --repro_loss_hard_clamp 1000 --repro_loss_soft_clamp 50 --iterations_output 500 --max_dataset_passes 10 --learning_rate_schedule 1cyclepoly --learning_rate_max 0.003 --learning_rate_cooldown_iterations 5000 --learning_rate_cooldown_trigger_percent_threshold 0.7 --aug_rotation 15 --training_buffer_cpu False --num_data_workers 12 --render_visualization False --use_pose_seed 0.36983939211266215 --iterations 10000 --use_heuristic_focal_length True
```
ebrach commented 3 months ago

Too bad. Let's check whether you can call the ACE mapper directly. Please run:

python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0

(This command will train a network test.pt from the first image of your dataset. The ACE0 meta-script will call something very similar to start the reconstruction.)

gradeeterna commented 3 months ago

That gives this error:

python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0

Traceback (most recent call last):
  File "train_ace.py", line 237, in <module>
    raise ValueError("Either use_heuristic_focal_length or use_external_focal_length "
ValueError: Either use_heuristic_focal_length or use_external_focal_length or use_ace_pose_file has to be set.
ebrach commented 3 months ago

Oh, sorry. This is on me. Please try again:

python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0 --use_heuristic_focal_length True

(we have to tell ACE what to do wrt intrinsics)

gradeeterna commented 3 months ago

Also a pretty vague error with that unfortunately!

Error Log:

```text
python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0 --use_heuristic_focal_length True
INFO:ace_trainer:Using device for training: cuda
INFO:ace_trainer:ACE feature buffer device: cuda
INFO:ace_trainer:Setting random seed to 2089
INFO:ace_trainer:Disabling multi-threaded data loading because we cannot run multiple depth inference passes simultaneously.
INFO:dataset:Overwriting focal length with heuristic derived from image dimensions.
INFO:dataset:Loading RGB files from: data/impcanart/*.jpg
INFO:dataset:Overwriting dataset with single image: 0 - data/impcanart/impcanart1-back_0001.jpg
INFO:dataset:Using ZoeDepth for depth initialization.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
img_size [384, 512]
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Params passed to Resize transform:
        width:  512
        height:  384
        resize_target:  True
        keep_aspect_ratio:  True
        ensure_multiple_of:  32
        resize_method:  minimal
Using pretrained resource url::https://github.com/isl-org/ZoeDepth/releases/download/v1.0/ZoeD_M12_NK.pt
Loaded successfully
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:ace_trainer:Loaded training scan from: data/impcanart/*.jpg -- 1 images, mean: 0.00 0.00 0.00
INFO:ace_trainer:Loaded pretrained encoder from: ace_encoder_pretrained.pt
INFO:ace_network:Creating Regressor using pretrained encoder with 512 feature size.
INFO:ace_trainer:Starting creation of the training buffer.
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/PIL/TiffImagePlugin.py:858: UserWarning: Corrupt EXIF data. Expecting to read 2 bytes but only got 0.
  warnings.warn(str(msg))
You are running using the stub version of nvrtc
You are running using the stub version of nvrtc
Segmentation fault
```
gradeeterna commented 3 months ago

Sorry, it turned out to be an issue with my system CUDA setup. I reinstalled a few things and managed to get it working. Thanks for the help, and I look forward to trying it out!
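For anyone hitting the same thing, two generic sanity checks for a WSL2 CUDA setup (nothing ACE0-specific) are:

```bash
# Is the NVIDIA driver visible inside WSL2?
nvidia-smi
# Can PyTorch inside the ace0 environment see the GPU, and which CUDA build is it using?
python -c 'import torch; print(torch.cuda.is_available(), torch.version.cuda)'
```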

kk6398 commented 3 months ago

I ran into the same error as you. Can you share the details of how you fixed it, please?

gradeeterna commented 3 months ago

@kk6398 Hi, I couldn't tell you exactly what fixed it, but I had previously only installed cuda-toolkit-11-8 and build-essential in WSL2 (for nerfstudio, gsplat etc.), rather than a full CUDA install.

I installed a bunch of other CUDA packages, which got it mostly working:

```bash
sudo apt-get -y install cuda-cudart-11-8 \
                        cuda-compiler-11-8 \
                        libcublas-11-8 \
                        libcufft-11-8 \
                        libcurand-11-8 \
                        libcusolver-11-8 \
                        libcusparse-11-8
```
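If you also see the "You are running using the stub version of nvrtc" warning from my log above, a rough way to check which libnvrtc libraries the dynamic linker can actually find (again, just a generic check, not ACE0-specific) is:

```bash
# List the libnvrtc entries known to the dynamic linker; a healthy setup should expose the real library, not only a stub
ldconfig -p | grep libnvrtc
```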
kk6398 commented 3 months ago

Thank you. I changed the CUDA version from 12.1 to 11.8 and ran `conda env create -f environment.yml` again. Then I tried `python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0 --use_heuristic_focal_length True` and got the error "RuntimeError: File test.pt cannot be opened." Do you know where test.pt is supposed to come from?

ebrach commented 3 months ago

Hi @kk6398!

That error looks weird. test.pt is the output file of train_ace.py rather than something that should already exist. Can you share the full stack trace of the error you get, please?

Best, Eric

ebrach commented 3 months ago

Could it be that your user does not have permissions to write files in the execution directory?
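A quick way to check would be something along these lines, run from your acezero checkout:

```bash
# Verify the current user can create files in the working directory
touch write_test.tmp && rm write_test.tmp && echo "write access OK"
```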