Open shqmffl486 opened 1 year ago
Hi,
I created this split only for students to conduct experiments under limited computing resources. I did not do experiments using the sdf_hand_mini and sdf_obj_mini. To use this split, you need to generate a new json file like this and use it in your config file (https://github.com/zerchen/AlignSDF/blob/master/experiments/obman/30k_1e2d_mlp5.json). Hope it helps.
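For anyone following the suggestion above, here is a minimal sketch of one way to collect the sample ids that exist in the mini folders and write them to a JSON file. It is not the repository's own tooling. The function name `build_mini_split` and the flat-list JSON layout are assumptions made for illustration, so check an existing split file in the repository for the schema the data loader actually expects.

```python
import json
from pathlib import Path


def build_mini_split(hand_dir, obj_dir, out_path):
    """Write a JSON list of sample ids that have BOTH a hand and an object .npz file.

    Assumption: a split is a flat list of ids (file names without ".npz").
    The real split schema used by the repository may differ.
    """
    hand_ids = {p.stem for p in Path(hand_dir).glob("*.npz")}
    obj_ids = {p.stem for p in Path(obj_dir).glob("*.npz")}
    # Keep only ids present in both folders so neither branch hits a missing file.
    sample_ids = sorted(hand_ids & obj_ids)
    with open(out_path, "w") as f:
        json.dump(sample_ids, f, indent=2)
    return sample_ids
```

It could be called as `build_mini_split("data/obman/train/sdf_hand_mini", "data/obman/train/sdf_obj_mini", "mini_split.json")`, where both directory paths and the output file name are hypothetical and should be adapted to your layout.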
Thank you for your reply. I had already worked that out and trained the model, but an error seems to occur in the final step, when the mesh is generated. What do you think the problem is? The Eval_obman output and several files inside it were created, but their contents were missing.
DeepSdf - INFO - time used: 85.93944382667542
DeepSdf - INFO - save at 100
DeepSdf - INFO - Distributing BatchNorm running means and vars
Traceback (most recent call last):
File "train.py", line 715, in
I think the problem is at line 157 of mesh.py: out_labels[head: min(head + max_batch, num_out_vertices)] = predicted_class.argmax(dim=1).detach().cpu()
How do I train with the sdf_hand_mini and sdf_obj_mini data that you uploaded? I think the loader is looking for a .npz file that does not exist because I am using the mini version.
(alignsdf) MS-7B23:~/mount4t/AlignSDF$ CUDA_VISIBLE_DEVICES=0 bash dist_train.sh 4 6666 -e experiments/obman/30k_1e2d_mlp5.json
do not support renderer in this machine
DeepSdf - INFO - Added key: store_based_barrier_key:1 to store for rank: 0
DeepSdf - INFO - Training in distributed mode, 1 GPU per process. Process 0, total 1.
DeepSdf - INFO - Experiment description:
3D hand reconstruction on the mini obman dataset.
Hand branch: True
Object branch: True
Mano branch: False
Depth branch: False
Classifier Weight: 0
Penetration Loss: False
Penetration Loss Weight: 0
Additional Loss start at epoch: 1201
Contact Loss: False
Contact Loss Weight: 0
Contact Loss Sigma (m): 0.005
Independent Obj Scale: False
Ignore other: False
nb_label_class: 6
Image encoder, the branch has latent size 256
DeepSdf - INFO - Finish constructing the dataset
DeepSdf - INFO - start_epoch:1, current_rank:0
DeepSdf - INFO - epoch:1, current_rank:0
Traceback (most recent call last):
File "train.py", line 715, in <module>
main_function(exp_cfg, args.continue_from, args.local_rank, args.opt_level, args.slurm)
File "train.py", line 465, in main_function
for i, (input_iter, label_iter, meta_iter) in enumerate(sdf_loader):
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/gaeun/mount4t/AlignSDF/utils/data.py", line 162, in __getitem__
hand_samples, hand_labels = unpack_sdf_samples(self.data_source, data_key, num_sample, hand=True, clamp=self.clamp, filter_dist=self.filter_dist)
File "/home/gaeun/mount4t/AlignSDF/utils/sdf_utils.py", line 172, in unpack_sdf_samples
npz = np.load(npz_path)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/numpy/lib/npyio.py", line 405, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'data/obman/train/sdf_hand/00018168.npz'
Killing subprocess 12576
Traceback (most recent call last):
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
main()
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/gaeun/anaconda3/envs/alignsdf/bin/python', '-u', 'train.py', '--local_rank=0', '-e', 'experiments/obman/30k_1e2d_mlp5.json']' returned non-zero exit status 1.
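The traceback above shows the data loader requesting data/obman/train/sdf_hand/00018168.npz, a file that is not on disk. A quick pre-flight check along the following lines can list every such gap before training starts. This is only a sketch: `find_missing_samples` is a hypothetical helper, not part of the repository, and it assumes each sample id maps to a file named `<id>.npz` in each SDF folder, as the path in the error suggests.

```python
from pathlib import Path


def find_missing_samples(sample_ids, sdf_dirs):
    """Return the .npz paths that the given sample ids would need but that are absent.

    Assumption: every id corresponds to "<sdf_dir>/<id>.npz" in each folder.
    """
    missing = []
    for sdf_dir in sdf_dirs:
        for sample_id in sample_ids:
            npz_path = Path(sdf_dir) / f"{sample_id}.npz"
            if not npz_path.is_file():
                missing.append(str(npz_path))
    return missing
```

For example, `find_missing_samples(["00018168"], ["data/obman/train/sdf_hand"])` returns that one path if the file is absent. A non-empty result means the split in use refers to samples that the downloaded folders do not contain, in which case either the split or the data directory setting needs to change.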