Closed: AITech-D closed this issue 4 years ago
In addition, the loss for the erroneous output is lower than for the normal output:
Epoch: [1/200][7/5038] : Loss total 42.9904
Epoch: [1/200][473/5038] : Loss total 2.5231
What's your training dataset? It seems the training/val lists are not identical to ours.
My dataset is FlyingThings3D, the same as yours. I did not change any code except in data/hd3data.py, as shown below:
    def read_gen(file_name, mode):
        ext = splitext(file_name)[-1]
        if mode == 'image':
            assert ext in ['.png', '.jpeg', '.ppm', '.jpg']
            data = Image.open(file_name)
            data = data.convert('RGB')
            return data
            # return Image.open(file_name)
            # ======Here changed!
        elif mode == 'flow':
            assert ext in ['.flo', '.png', '.pfm']
            return fl.read_flow(file_name)
        elif mode == 'stereo':
            assert ext in ['.png', '.pfm']
            return fl.read_disp(file_name)
        else:
            raise ValueError('Unknown mode {}'.format(mode))
I used the same dataset as your pretrained file hd3sc_things-57947496.pth. Can you help me?
I also trained the HD3 model from scratch on FlyingThings3D, and the error is the same: the loss drops quickly, the accuracy does not improve, and the output is all zeros. train.sh is as follows:
    CUDA_VISIBLE_DEVICES=3 python -u train.py \
        --dataset_name=FlyingThings3D \
        --train_root=/home/share/34916/SceneFlowDataset/FlyingThings3D \
        --train_list=lists/FlyingThings3D_trainstereo.txt \
        --val_root=/home/share/34916/SceneFlowDataset/FlyingThings3D \
        --val_list=lists/FlyingThings3D_teststereo.txt \
        --task=stereo \
        --base_lr=0.0002 \
        --encoder=dlaup \
        --decoder=hda \
        --context \
        --workers=4 \
        --epochs=200 \
        --batch_size=4 \
        --evaluate \
        --batch_size_val=1 \
        --pretrain_base=./outputs/model/model_zoo/dla34-ba72cf86.pth \
        --visual_freq=20 \
        --save_step=5 \
        --save_path=./outputs/model
Train list:
    frames_finalpass/TRAIN/B/0573/left/0006.png frames_finalpass/TRAIN/B/0573/right/0006.png disparity/TRAIN/B/0573/left/0006.pfm
    frames_finalpass/TRAIN/B/0573/left/0007.png frames_finalpass/TRAIN/B/0573/right/0007.png disparity/TRAIN/B/0573/left/0007.pfm
    frames_finalpass/TRAIN/B/0573/left/0008.png frames_finalpass/TRAIN/B/0573/right/0008.png disparity/TRAIN/B/0573/left/0008.pfm
    frames_finalpass/TRAIN/B/0573/left/0009.png frames_finalpass/TRAIN/B/0573/right/0009.png disparity/TRAIN/B/0573/left/0009.pfm
    frames_finalpass/TRAIN/B/0573/left/0010.png frames_finalpass/TRAIN/B/0573/right/0010.png disparity/TRAIN/B/0573/left/0010.pfm
    frames_finalpass/TRAIN/B/0573/left/0011.png frames_finalpass/TRAIN/B/0573/right/0011.png disparity/TRAIN/B/0573/left/0011.pfm
    frames_finalpass/TRAIN/B/0573/left/0012.png frames_finalpass/TRAIN/B/0573/right/0012.png disparity/TRAIN/B/0573/left/0012.pfm
    frames_finalpass/TRAIN/B/0573/left/0013.png frames_finalpass/TRAIN/B/0573/right/0013.png disparity/TRAIN/B/0573/left/0013.pfm
    frames_finalpass/TRAIN/B/0573/left/0014.png frames_finalpass/TRAIN/B/0573/right/0014.png disparity/TRAIN/B/0573/left/0014.pfm
    frames_finalpass/TRAIN/B/0299/left/0006.png frames_finalpass/TRAIN/B/0299/right/0006.png disparity/TRAIN/B/0299/left/0006.pfm
    frames_finalpass/TRAIN/B/0299/left/0007.png frames_finalpass/TRAIN/B/0299/right/0007.png disparity/TRAIN/B/0299/left/0007.pfm
    ...
    frames_finalpass/TRAIN/B/0299/left/0008.png frames_finalpass/TRAIN/B/0299/right/0008.png disparity/TRAIN/B/0299/left/0008.pfm
    frames_finalpass/TRAIN/B/0299/left/0009.png frames_finalpass/TRAIN/B/0299/right/0009.png disparity/TRAIN/B/0299/left/0009.pfm
    frames_finalpass/TRAIN/B/0299/left/0010.png frames_finalpass/TRAIN/B/0299/right/0010.png disparity/TRAIN/B/0299/left/0010.pfm
    frames_finalpass/TRAIN/B/0299/left/0011.png frames_finalpass/TRAIN/B/0299/right/0011.png disparity/TRAIN/B/0299/left/0011.pfm
    frames_finalpass/TRAIN/B/0299/left/0012.png frames_finalpass/TRAIN/B/0299/right/0012.png disparity/TRAIN/B/0299/left/0012.pfm
Test list:
    frames_finalpass/TEST/B/0040/left/0006.png frames_finalpass/TEST/B/0040/right/0006.png disparity/TEST/B/0040/left/0006.pfm
    frames_finalpass/TEST/B/0040/left/0007.png frames_finalpass/TEST/B/0040/right/0007.png disparity/TEST/B/0040/left/0007.pfm
    frames_finalpass/TEST/B/0040/left/0008.png frames_finalpass/TEST/B/0040/right/0008.png disparity/TEST/B/0040/left/0008.pfm
    frames_finalpass/TEST/B/0040/left/0009.png frames_finalpass/TEST/B/0040/right/0009.png disparity/TEST/B/0040/left/0009.pfm
    frames_finalpass/TEST/B/0040/left/0010.png frames_finalpass/TEST/B/0040/right/0010.png disparity/TEST/B/0040/left/0010.pfm
    frames_finalpass/TEST/B/0040/left/0011.png frames_finalpass/TEST/B/0040/right/0011.png disparity/TEST/B/0040/left/0011.pfm
    frames_finalpass/TEST/B/0040/left/0012.png frames_finalpass/TEST/B/0040/right/0012.png disparity/TEST/B/0040/left/0012.pfm
    ...
    frames_finalpass/TEST/B/0040/left/0013.png frames_finalpass/TEST/B/0040/right/0013.png disparity/TEST/B/0040/left/0013.pfm
    frames_finalpass/TEST/B/0040/left/0014.png frames_finalpass/TEST/B/0040/right/0014.png disparity/TEST/B/0040/left/0014.pfm
    frames_finalpass/TEST/B/0133/left/0006.png frames_finalpass/TEST/B/0133/right/0006.png disparity/TEST/B/0133/left/0006.pfm
Are the D2V and V2D operations inverses of each other? I understand the V2D code in models/hd3_ops.py, vector2density(vect, c, dim), but I don't understand the D2V code, density2vector(prob, dim, normalize=True).
As for the error I got, I guess something is wrong with the loss function, because I have checked my input and the HD3 model. However, I have no idea what is wrong with the loss. I would be very grateful for any help.
It seems you are not using the training/validation lists we provided, and your dataset structure is not the same as the official FlyingThings3D subset. I'm not sure why you added data.convert('RGB') to your code, as our dataloader already works with the FlyingThings3D subset. Possibly the annotations you loaded are all zeros. The original FlyingThings3D dataset is problematic because its rendering is imperfect. Please re-download the subset from the official webpage.
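One way to test the "annotations are all zeros" hypothesis is to read a few disparity files directly and look at their statistics before they reach the dataloader. The sketch below is only an illustration: the helper names read_pfm and summarize are hypothetical and are not part of the HD3 repository, and the reader assumes the common PFM layout (a "Pf" or "PF" header line, a "width height" line, a scale line whose sign encodes endianness, then raw float32 values).

```python
import math
import struct
import sys


def read_pfm(path):
    """Read a PFM file and return (width, height, channels, flat tuple of floats).

    Assumes the common PFM layout; comment lines in the header are not handled.
    """
    with open(path, "rb") as f:
        header = f.readline().decode("ascii").strip()
        if header not in ("PF", "Pf"):
            raise ValueError("Not a PFM file: {}".format(path))
        channels = 3 if header == "PF" else 1
        width, height = (int(tok) for tok in f.readline().decode("ascii").split())
        scale = float(f.readline().decode("ascii").strip())
        endian = "<" if scale < 0 else ">"  # negative scale means little-endian
        count = width * height * channels
        data = struct.unpack("{}{}f".format(endian, count), f.read(count * 4))
    return width, height, channels, data


def summarize(values):
    """Return simple statistics that reveal an all-zero or corrupt annotation."""
    finite = [v for v in values if math.isfinite(v)]
    zeros = sum(1 for v in values if v == 0.0)
    return {
        "count": len(values),
        "zero_fraction": zeros / len(values) if values else 0.0,
        "min": min(finite) if finite else None,
        "max": max(finite) if finite else None,
        "mean": sum(finite) / len(finite) if finite else None,
    }


if __name__ == "__main__":
    # Usage: python check_pfm.py disparity/TRAIN/B/0573/left/0006.pfm ...
    for pfm_path in sys.argv[1:]:
        _, _, _, disp = read_pfm(pfm_path)
        print(pfm_path, summarize(disp))
```

A zero_fraction close to 1.0, or a min and max that are both 0, would point to a loading problem rather than a problem in the loss.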
As for the D2V and V2D operations, you can refer to our paper for their principles.
Thank you. There was something wrong in the dataloader, and I have solved it.
Hi @AITech-D, I have the same problem as you. How did you solve it? Thanks very much!
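For readers following the D2V/V2D question, here is a minimal one-dimensional sketch of the idea as I understand it from the paper: V2D spreads a displacement over its nearest grid points with linear weights, and D2V takes the expectation of the density. This is not the code in models/hd3_ops.py; the function names, the radius parameter, the clamping, and the division by the total mass are all assumptions of this sketch.

```python
import math


def vector2density_1d(value, radius):
    """Splat a scalar displacement onto integer bins -radius..radius.

    The two nearest bins share the probability mass with linear weights.
    Out-of-range values are clamped (a choice made for this sketch only).
    """
    density = [0.0] * (2 * radius + 1)
    value = max(-radius, min(radius, value))
    low = math.floor(value)          # nearest bin at or below the value
    frac = value - low               # share of mass that goes to the upper bin
    density[low + radius] += 1.0 - frac
    if frac > 0.0:
        density[low + 1 + radius] += frac
    return density


def density2vector_1d(density, radius):
    """Collapse a density over bins -radius..radius to its expected displacement."""
    total = sum(density)
    bins = range(-radius, radius + 1)
    return sum(p * d for p, d in zip(density, bins)) / total


if __name__ == "__main__":
    dens = vector2density_1d(1.25, 4)
    print(dens)                        # mass split between bins 1 and 2
    print(density2vector_1d(dens, 4))  # recovers the displacement
    # A flat density has zero expectation.
    print(density2vector_1d([1.0] * 9, 4))
```

In this sketch D2V undoes V2D for in-range values, but the reverse does not hold: many different densities share the same expectation, and a flat density collapses to zero.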
Hi, Thank you for sharing your code and the pre-trained files! I was trying to re-train the network with the pre-trained file on FlyingThings3D for stereo.
(hd3sc_things-57947496.pth trained on FlyingThings3D only)
I ran the training script train.sh as follows:
    CUDA_VISIBLE_DEVICES=3 python -u train.py \
        --dataset_name=FlyingThings3D \
        --train_root=/home/share/34916/SceneFlowDataset/FlyingThings3D \
        --train_list=lists/FlyingThings3D_trainstereo.txt \
        --val_root=/home/share/34916/SceneFlowDataset/FlyingThings3D \
        --val_list=lists/FlyingThings3D_teststereo.txt \
        --task=stereo \
        --base_lr=0.0002 \
        --encoder=dlaup \
        --decoder=hda \
        --context \
        --workers=4 \
        --epochs=200 \
        --batch_size=4 \
        --evaluate \
        --batch_size_val=1 \
        --pretrain=./outputs/model/model_zoo/hd3sc_things-57947496.pth \
        --visual_freq=20 \
        --save_step=50 \
        --save_path=./outputs/model
However, I got erroneous output: the output is all close to zero. I printed the intermediate tensor in hd3/models/hd3net.py with the following code:
    decoder = getattr(self, 'Decoder_' + str(l))
    prob_map, up_feat = decoder(decoder_input)
    curr_vect = density2vector(prob_map, self.dim, True)
In the first steps the mean of curr_vect looks normal, as shown below:
[2019-11-13 02:14:00,322 INFO train.py line 259 121140] Loss total 42.9904
curr_vect mean: tensor(-1.0633, device='cuda:0', grad_fn=)
++++++++++++++++++++++
curr_vect mean: tensor(-2.1657, device='cuda:0', grad_fn=)
++++++++++++++++++++++
curr_vect mean: tensor(-4.5257, device='cuda:0', grad_fn=)
++++++++++++++++++++++
curr_vect mean: tensor(-9.1923, device='cuda:0', grad_fn=)
++++++++++++++++++++++
curr_vect mean: tensor(-18.4706, device='cuda:0', grad_fn=)
++++++++++++++++++++++
curr_vect mean: tensor(-36.9369, device='cuda:0', grad_fn=)
++++++++++++++++++++++
[2019-11-13 02:14:01,065 INFO train.py line 256 121140] Epoch: [1/200][7/5038] Data 0.001 (0.145) Batch 0.743 (3.269) Remain 914:55:07.
But after a few hundred steps, the mean of curr_vect is almost zero at every level, as shown below:
[2019-11-13 01:54:30,441 INFO train.py line 259 117631] Loss total 2.5231
curr_vect mean: tensor(-0.4105, device='cuda:0', grad_fn=)
++++++++++++++++++++++
curr_vect mean: tensor(-0.0049, device='cuda:0', grad_fn=)
++++++++++++++++++++++
curr_vect mean: tensor(-0.0027, device='cuda:0', grad_fn=)
++++++++++++++++++++++
curr_vect mean: tensor(-0.0019, device='cuda:0', grad_fn=)
++++++++++++++++++++++
curr_vect mean: tensor(-0.0009, device='cuda:0', grad_fn=)
++++++++++++++++++++++
curr_vect mean: tensor(-0.0006, device='cuda:0', grad_fn=)
++++++++++++++++++++++
[2019-11-13 01:54:31,190 INFO train.py line 256 117631] Epoch: [1/200][473/5038] Data 0.001 (0.003) Batch 0.750 (0.788) Remain 220:32:07.
I am very confused. I started training from hd3sc_things-57947496.pth, but the output did not improve; it got worse after a few hundred steps.