name333 closed this issue 6 years ago
The trainB.txt depth maps are not used for the loss function; they are used for visualization only, so that we can watch the difference between the predicted and ground-truth depth maps on the real dataset during training. If you don't want to watch the ground truth, you can also remove this file during training.
Thanks for your immediate reply. When I run train.py, it returns the error below. It seems that enumerate cannot read the dataset, and I found that the dataset is not of type List; maybe that is why I hit this error. Have you ever met this problem? Can you help me deal with it?
Traceback (most recent call last):
File "train.py", line 22, in
Hi @name333, before running the code, did you ensure that the files are physically present on your hard drive, and that they have exactly the same names as listed in the trainA/B.txt files?
Hi @aasharma90 , I got the same error as @name333 .
For the dataset input, I generated my own own_vkitti_train_mul64.txt and own_vkitti_train_depth_mul64.txt to stand in for trainA_SYN80.txt and trainB_SYN80.txt.
I just translated trainA_SYN80.txt (I changed "03030079_A.png" into 0001/15-deg-left/00076.png; BTW, I think we need to change 0079 to 00076.png because of the index range). Then I deleted some entries so that the number of inputs is a multiple of 64.
On the other hand, for the target domain, I generated own_rkitti_pic.txt and own_rkitti_depth.txt to stand in for trainA.txt and trainB.txt, by traversing all the real KITTI GT depth data. Then I deleted some entries so that the number of inputs is a multiple of 64.
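For what it's worth, the list-generation step described above can be sketched in a few lines of Python. This is only an illustration: the directory paths, the .png filter, and the function name are my assumptions, not part of the repo.

```python
import os

def write_list(root, out_txt, multiple=64):
    """Traverse `root`, collect image paths, trim the list so its length
    is a multiple of `multiple`, and write one path per line."""
    paths = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".png"):
                paths.append(os.path.join(dirpath, name))
    paths.sort()
    # Drop the tail so len(trimmed) % multiple == 0
    trimmed = paths[: (len(paths) // multiple) * multiple]
    with open(out_txt, "w") as f:
        f.write("\n".join(trimmed) + "\n")
    return len(trimmed)

# Hypothetical usage:
# write_list("/data/kitti_depth_gt", "own_rkitti_depth.txt")
```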
After generating the input lists, I ran the training as you posted on GitHub. However, I got:
Traceback (most recent call last):
  File "train.py", line 22, in <module>
    for i, data in enumerate(dataset):
  File "/home/yipengm/anaconda3/envs/ptcu8/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 280, in __next__
    idx, batch = self._get_batch()
  File "/home/yipengm/anaconda3/envs/ptcu8/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 259, in _get_batch
    return self.data_queue.get()
  File "/home/yipengm/anaconda3/envs/ptcu8/lib/python3.6/multiprocessing/queues.py", line 335, in get
    res = self._reader.recv_bytes()
  File "/home/yipengm/anaconda3/envs/ptcu8/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/yipengm/anaconda3/envs/ptcu8/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/yipengm/anaconda3/envs/ptcu8/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/home/yipengm/anaconda3/envs/ptcu8/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 178, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 17078) is killed by signal: Floating point exception.
I tried shrinking the batch size and nThreads to 1, which didn't solve the problem. When I changed nThreads to 0, I got:
Floating point exception (core dumped)
My GPU is a Titan Xp with 12 GB VRAM, running PyTorch 0.4.0 with CUDA 8.0. So, do I need to make my directory structure exactly like your setup? Could you share your trainA.txt and trainB.txt so I can figure out where the problem lies? Thank you very much.
Hi @yipengm
I suggested that @name333 check the following things:
1) The files named in your train*.txt lists are actually present on your physical hard drive.
2) The floating point error might occur if the images could not be loaded properly. To check this, please inspect the variables (img_source/target, lab_source/target, etc.) inside the __getitem__ function in the data_loader.py file.
3) Additionally, ensure that the depth maps you are loading are clipped to 80m (as suggested by the author in his paper). For this, before starting your training, run a separate script that loads all the depth maps, clips them to 80m, and saves them back. If you want, you can follow the attached MATLAB script that I created for this process. (I suspect this could be the problem: for instance, the sky regions in the original depth maps take the maximum possible value of ~2^16, which can cause the floating point exception while loading.)
I think @name333 faced the third issue, and you might be facing the same one. I've posted the above-mentioned MATLAB code below (remove the .txt extension) for your assistance. Hope this solves your problem. clipMaxDepth.m.txt
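As a quick way to check point 1) before training, one could run a small standalone script over each list file. This is a hypothetical sketch (the function name and file-size heuristic are mine, not the repo's); additionally opening each image with an image library such as Pillow would also catch corrupt files that exist on disk.

```python
import os

def check_list(txt_file):
    """Report paths in a train*.txt list that are missing or empty on disk."""
    bad = []
    with open(txt_file) as f:
        for line in f:
            path = line.strip()
            if not path:
                continue  # skip blank lines
            if not os.path.isfile(path):
                bad.append((path, "missing"))
            elif os.path.getsize(path) == 0:
                bad.append((path, "empty file"))
    return bad

# Hypothetical usage:
# for txt in ["trainA_SYN80.txt", "trainB_SYN80.txt", "trainA.txt", "trainB.txt"]:
#     print(txt, check_list(txt))
```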
@aasharma90 Thank you very much for the immediate reply. You saved my life. I believe the third one must be the problem, and I'm going to try it.
Hi @aasharma90, could you explain why you divide the original depth by 256 first? Besides, the original depth unit in VKITTI is centimeters; shouldn't we clip it to 8000 for 80m?
Hi @GuanWenlong ,
I think you are right. Perhaps we don't need to; we can simply clip it to 8000 for 80m, as you mentioned. Anyway, I think it is just a scaling factor, and in my experiments I couldn't find any problem with the task network learning those perhaps incorrectly scaled values. However, I should point out that I skipped the validation checks, so maybe that is why it still worked. If you do perform validation, you could face a problem with such incorrectly scaled values, as they would not be compatible with the validation GT.
Thank you for pointing out my mistake! Kindly let us know if you can make things run with the correct clipping method, and we can then accept it as the correct solution.
Best regards, Aashish
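Translating the "correct clipping" discussion above into code, a hypothetical Python pass over the raw depth arrays could look like the sketch below. It assumes the VKITTI depth PNGs store 16-bit values in centimeters, so 80m corresponds to clipping at 8000 with no /256 scaling; the function name and constant are mine, and the file I/O is left as a comment because it depends on the image library used.

```python
import numpy as np

MAX_DEPTH_CM = 8000  # 80 m, assuming the depth values are in centimeters

def clip_depth(depth):
    """Clip a raw uint16 depth array to 80 m.

    Sky pixels near the maximum value of ~2**16 are pulled down to
    MAX_DEPTH_CM, removing the huge out-of-range values that were
    suspected of causing the floating point exception during loading."""
    return np.minimum(depth.astype(np.uint16), MAX_DEPTH_CM)

# I/O sketch (library-dependent): read each 16-bit depth PNG, e.g.
#   depth = np.array(Image.open(path))   # with Pillow
# clip it with clip_depth(), and save it back at the same bit depth.
```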
Excuse me, could you please answer the following question? Regarding these four arguments:
--img_source_file /dataset/Image2Depth31_KITTI/trainA_SYN80.txt
--img_target_file /dataset/Image2Depth31_KITTI/trainA.txt
--lab_source_file /dataset/Image2Depth31_KITTI/trainB_SYN80.txt
--lab_target_file /dataset/Image2Depth31_KITTI/trainB.txt
Do these refer, respectively, to the VKITTI synthetic images, the real KITTI images, and their corresponding depth maps? How do you obtain the depth maps corresponding to KITTI?