Closed XJTU-Haolin closed 16 hours ago
Hello, sorry for the late response. Is this problem solved? I couldn't replicate it.
It seems that the problem is with the validation data rather than the training data. Have you verified that the paths to the validation images and intrinsics are correct?
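A quick way to check those paths is sketched below. It is only a minimal sketch: it assumes each validation scene is a sub-directory containing an `intrinsics.txt` file plus `seq0/` and `seq1/` image folders, which may not match your copy of the dataset. The function name `find_incomplete_scenes` and its parameters are hypothetical, not part of the repository.

```python
from pathlib import Path


def find_incomplete_scenes(val_root, required_files=("intrinsics.txt",),
                           image_dirs=("seq0", "seq1")):
    """Return (scene_name, problem) pairs for scenes that look incomplete.

    The default file and folder names are assumptions about the dataset
    layout; adjust them to match what is actually on disk.
    """
    root = Path(val_root)
    if not root.is_dir():
        return [(str(root), "validation root does not exist")]

    problems = []
    for scene in sorted(p for p in root.iterdir() if p.is_dir()):
        # Each required file must exist and be non-empty.
        for name in required_files:
            f = scene / name
            if not f.is_file() or f.stat().st_size == 0:
                problems.append((scene.name, f"missing or empty {name}"))
        # Each image folder must exist and hold at least one .jpg file.
        for d in image_dirs:
            folder = scene / d
            if not (folder.is_dir() and any(folder.glob("*.jpg"))):
                problems.append((scene.name, f"no .jpg images in {d}/"))
    return problems
```

Calling `find_incomplete_scenes("path/to/val")` with the validation directory from your config should return an empty list if every scene has the expected files.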
I will check it again. Thanks for your reply!
Closing this issue since it has been inactive for a while. Please reopen it if you run into any other problems. Thanks!
When I ran multi-GPU training of MicKey on 4×3090, I got the following errors. I never see this problem when using a single GPU. It seems that something is wrong with the JPEG images, but the Map-free dataset was downloaded and used without any processing.
Training with 0.00/1.00 image overlap
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Premature end of JPEG file
Traceback (most recent call last):
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/train.py", line 99, in <module>
    train_model(args)
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/train.py", line 89, in train_model
    trainer.fit(model, datamodule_end, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 987, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
    self.fit_loop.run()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/fit_loop.py", line 201, in run
    self.on_run_start()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/fit_loop.py", line 324, in on_run_start
    self.epoch_loop.val_loop.setup_data()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/evaluation_loop.py", line 166, in setup_data
    dataloaders = _request_dataloader(source)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 342, in _request_dataloader
    return data_source.dataloader()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 309, in dataloader
    return call._call_lightning_datamodule_hook(self.instance.trainer, self.name)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/call.py", line 179, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/lib/datasets/datamodules.py", line 107, in val_dataloader
    dataset = self.dataset_type(self.cfg, 'val')
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/lib/datasets/mapfree.py", line 190, in __init__
    data_srcs = [
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/lib/datasets/mapfree.py", line 191, in <listcomp>
    MapFreeScene(
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/lib/datasets/mapfree.py", line 26, in __init__
    self.K, self.K_ori = self.read_intrinsics(self.scene_root, resize)
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/lib/datasets/mapfree.py", line 47, in read_intrinsics
    K = correct_intrinsic_scale(K, resize[0] / W, resize[1] / H)
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/lib/datasets/utils.py", line 92, in correct_intrinsic_scale
    transform = torch.eye(3)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 2655) is killed by signal: Killed.
[rank: 3] Child process with PID 27 terminated with code -9. Forcefully terminating all other processes to avoid zombies 🧟
./train.sh: line 1: 23 Killed python3 train.py
Could you give me any instructions?
Thanks for your time!
Haolin
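One way to test the suspicion about the JPEG files is to scan the dataset for images that look cut off. The sketch below is a heuristic only: it checks that each `.jpg` file starts with the JPEG start-of-image marker and ends with the end-of-image marker, so it can miss some damaged files and can flag valid files that carry trailing bytes. The helper names `looks_truncated` and `find_truncated_jpegs` are hypothetical. Note also that "Premature end of JPEG file" is a libjpeg warning, while the crash in the log is a worker being killed with code -9 (SIGKILL), so the two may turn out to be unrelated.

```python
from pathlib import Path

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_truncated(path):
    """Heuristic: True if the file lacks the JPEG start or end marker."""
    with open(path, "rb") as f:
        head = f.read(2)
        f.seek(0, 2)  # move to the end of the file to get its size
        if f.tell() < 4:
            return True  # too short to hold both markers
        f.seek(-2, 2)  # read the last two bytes
        tail = f.read(2)
    return head != SOI or tail != EOI


def find_truncated_jpegs(root):
    """Return sorted paths of .jpg files under root that look truncated."""
    return sorted(p for p in Path(root).rglob("*.jpg") if looks_truncated(p))
```

Running `find_truncated_jpegs` on the dataset root and re-downloading any files it lists would be a cheap first step. Decoding each image fully with an image library would be a stricter check.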