Closed: gradeeterna closed this issue 3 months ago.
Hi!
Can you please try to run ACE0 with --seed_parallel_workers 1 and report back here? The actual error message might have been swallowed by the parallel processing when mapping the seed images.
Best, Eric
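For reference, the full invocation with that flag (using the dataset path and result folder that appear in the logs further down this thread) would be:

```bash
python ace_zero.py "data/impcanart/*.jpg" result_folder --seed_parallel_workers 1
```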
Hey Eric,
The error messages aren't much different with --seed_parallel_workers 1 unfortunately.
Too bad. Let's check whether you can call the ACE mapper directly. Please run:
python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0
(This command will train a network test.pt from the first image of your dataset. The ACE0 meta-script will call something very similar to start the reconstruction.)
That gives this error:
python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0
Traceback (most recent call last):
File "train_ace.py", line 237, in <module>
raise ValueError("Either use_heuristic_focal_length or use_external_focal_length "
ValueError: Either use_heuristic_focal_length or use_external_focal_length or use_ace_pose_file has to be set.
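As an aside, the ValueError above names three ways of providing intrinsics. If the focal length is already known (for example from calibration or EXIF data), use_external_focal_length is presumably the option to use instead of the heuristic. A hedged sketch follows; the exact flag syntax and the pixel value are assumptions and are not confirmed anywhere in this thread:

```bash
# Assumption: --use_external_focal_length takes the focal length in pixels.
# The option name comes from the ValueError above; 1000 is only a placeholder value.
python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0 --use_external_focal_length 1000
```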
Oh, sorry. This is on me. Please try again:
python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0 --use_heuristic_focal_length True
(We have to tell ACE what to do with respect to the camera intrinsics.)
Also a pretty vague error with that unfortunately!
Sorry, turned out to be an issue with my system CUDA setup. Just reinstalled some stuff and managed to get it working. Thanks for the help, and look forward to trying it out!
I met the same error as you. Can you tell me the details of how you fixed it, please?
For reference, the error log from the earlier run with --seed_parallel_workers 1:
```text
python ace_zero.py "data/impcanart/*.jpg" result_folder --seed_parallel_workers 1
INFO:__main__:Starting reconstruction of files matching data/impcanart/*.jpg.
INFO:__main__:Downloading ZoeDepth model from the main process.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
img_size [384, 512]
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Params passed to Resize transform: width: 512 height: 384 resize_target: True keep_aspect_ratio: True ensure_multiple_of: 32 resize_method: minimal
Using pretrained resource url::https://github.com/isl-org/ZoeDepth/releases/download/v1.0/ZoeD_M12_NK.pt
Loaded successfully
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:__main__:Depth estimation model ready to use.
INFO:__main__:Trying seeds: [0.36983939 0.28398129 0.75519018 0.17367966 0.46093806]
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
INFO:ace_zero_util:Processing seed 0: 0.36983939211266215
INFO:ace_trainer:Using device for training: cuda
INFO:ace_trainer:ACE feature buffer device: cuda
INFO:ace_trainer:Setting random seed to 2089
INFO:ace_trainer:Disabling multi-threaded data loading because we cannot run multiple depth inference passes simultaneously.
INFO:dataset:Overwriting focal length with heuristic derived from image dimensions.
INFO:dataset:Loading RGB files from: data/impcanart/*.jpg
INFO:dataset:Overwriting dataset with single image: 132 - data/impcanart/impcanart1-right_0017.jpg
INFO:dataset:Using ZoeDepth for depth initialization.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:ace_trainer:Loaded training scan from: data/impcanart/*.jpg -- 1 images, mean: 0.00 0.00 0.00
INFO:ace_trainer:Loaded pretrained encoder from: ace_encoder_pretrained.pt
INFO:ace_network:Creating Regressor using pretrained encoder with 512 feature size.
INFO:ace_trainer:Starting creation of the training buffer.
Traceback (most recent call last):
  File "ace_zero.py", line 187, in <module>
    seed_reg_rates = Parallel(n_jobs=opt.seed_parallel_workers, verbose=1)(
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 864, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 782, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 263, in __call__
    return [func(*args, **kwargs)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 263, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/grade/src/acezero/ace_zero_util.py", line 236, in map_seed
    run_cmd(mapping_cmd, verbose=verbose)
  File "/home/grade/src/acezero/ace_zero_util.py", line 49, in run_cmd
    raise RuntimeError("Error running ACE0: \nCommand:\n" + " ".join(cmd_str))
RuntimeError: Error running ACE0:
Command:
./train_ace.py data/impcanart/*.jpg result_folder/iteration0_seed0.pt --repro_loss_type tanh --render_target_path result_folder/renderings --render_marker_size 0.03 --refinement_ortho gram-schmidt --ace_pose_file_conf_threshold 500 --render_flipped_portrait False --pose_refinement_wait 0 --image_resolution 480 --pose_refinement_lr 0.001 --num_head_blocks 1 --repro_loss_hard_clamp 1000 --repro_loss_soft_clamp 50 --iterations_output 500 --max_dataset_passes 10 --learning_rate_schedule 1cyclepoly --learning_rate_max 0.003 --learning_rate_cooldown_iterations 5000 --learning_rate_cooldown_trigger_percent_threshold 0.7 --aug_rotation 15 --training_buffer_cpu False --num_data_workers 12 --render_visualization False --use_pose_seed 0.36983939211266215 --iterations 10000 --use_heuristic_focal_length True
```
@kk6398 Hi, I couldn't tell you exactly what fixed it, but I had previously only installed cuda-toolkit-11-8 and build-essential in WSL2 for nerfstudio, gsplat etc., rather than a full CUDA install. I installed a bunch of other CUDA packages, which got it mostly working:
sudo apt-get -y install cuda-cudart-11-8 \
cuda-compiler-11-8 \
libcublas-11-8 \
libcufft-11-8 \
libcurand-11-8 \
libcusolver-11-8 \
libcusparse-11-8
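Not from the original thread, but a quick way to sanity-check the CUDA toolchain after installing these packages (generic commands, nothing ACE0-specific; nvcc may require adding /usr/local/cuda-11.8/bin to the PATH):

```bash
# Is the driver / GPU visible from inside WSL2?
nvidia-smi
# Is the CUDA 11.8 compiler installed and on the PATH?
nvcc --version
```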
Thank you. I changed my CUDA version from 12.1 to 11.8 and ran "conda env create -f environment.yml" again. Then I tried "python train_ace.py "data/impcanart/*.jpg" test.pt --use_pose_seed 0 --use_heuristic_focal_length True" and met the error "RuntimeError: File test.pt cannot be opened." So, do you know where test.pt is supposed to be?
Hi @kk6398!
That error looks weird. test.pt is the output file of train_ace.py rather than something that should already exist. Can you share the full stack trace of the error you get, please?
Best, Eric
Could it be that your user does not have permissions to write files in the execution directory?
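A couple of quick, generic checks along those lines (plain shell commands, not part of ACE0; the /tmp output path is just an example of a location that is normally writable):

```bash
# Can the current user create a file in the working directory?
touch test_write.tmp && rm test_write.tmp && echo "write OK"

# Or point the output at a directory that is definitely writable:
python train_ace.py "data/impcanart/*.jpg" /tmp/test.pt --use_pose_seed 0 --use_heuristic_focal_length True
```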
Hi, I haven't been able to get this working in WSL2 with Ubuntu 22.04.
I get the following errors when I run this command with any image folder:
python ace_zero.py "/path/to/some/images/*.jpg" result_folder
I'm following the installation instructions, and everything seems to have installed correctly, so I'm not sure if this is a WSL2 issue or something else. Thanks!
Error Log:
```text
INFO:__main__:Starting reconstruction of files matching data/impcanart/*.jpg.
INFO:__main__:Downloading ZoeDepth model from the main process.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
img_size [384, 512]
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Params passed to Resize transform: width: 512 height: 384 resize_target: True keep_aspect_ratio: True ensure_multiple_of: 32 resize_method: minimal
Using pretrained resource url::https://github.com/isl-org/ZoeDepth/releases/download/v1.0/ZoeD_M12_NK.pt
Loaded successfully
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:__main__:Depth estimation model ready to use.
INFO:__main__:Trying seeds: [0.36983939 0.28398129 0.75519018 0.17367966 0.46093806]
INFO:__main__:Processing 5 seeds in parallel.
[Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.
INFO:ace_trainer:Using device for training: cuda
INFO:ace_trainer:ACE feature buffer device: cuda
INFO:ace_trainer:Setting random seed to 2089
INFO:ace_trainer:Disabling multi-threaded data loading because we cannot run multiple depth inference passes simultaneously.
INFO:dataset:Overwriting focal length with heuristic derived from image dimensions.
INFO:dataset:Loading RGB files from: data/impcanart/*.jpg
INFO:dataset:Overwriting dataset with single image: 132 - data/impcanart/impcanart1-right_0017.jpg
INFO:dataset:Using ZoeDepth for depth initialization.
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
Using cache found in /home/grade/.cache/torch/hub/isl-org_ZoeDepth_main
Using cache found in /home/grade/.cache/torch/hub/intel-isl_MiDaS_master
/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1678402421473/work/aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
INFO:dataset_io:Loaded pretrained ZoeDepth model.
INFO:ace_trainer:Loaded training scan from: data/impcanart/*.jpg -- 1 images, mean: 0.00 0.00 0.00
INFO:ace_trainer:Loaded pretrained encoder from: ace_encoder_pretrained.pt
INFO:ace_network:Creating Regressor using pretrained encoder with 512 feature size.
INFO:ace_trainer:Starting creation of the training buffer.
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 436, in _process_worker
    r = call_item()
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 288, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 263, in __call__
    return [func(*args, **kwargs)
  File "/home/grade/miniconda3/envs/ace0/lib/python3.8/site-packages/joblib/parallel.py", line 263, in
```
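Since the thread eventually traced the failure back to the CUDA setup in WSL2, a generic sanity check (not specific to ACE0) that the conda environment can actually see the GPU is:

```bash
# Print the PyTorch version, the CUDA version it was built against, and whether a GPU is visible
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```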