Hi,
The default setting uses data from all cameras for training (--subtask ''), so you don't need to specify the subtask option. You only need to specify it for each camera during testing.
You can also use only the camera1 data for training/testing. But please first check the validation loss (it should be consistent with the training loss). The images for training and validation are masked by the ROI region. When testing, the original images (without ROI masking) are used, so the test loss will be high. The reason for not using ROI-masked images at test time is that we do not care about the reconstruction loss during testing, only about the bounding box accuracy (and it gives better visualization, as shown in the paper). However, you can also create an ROI-masked test set, in which case the test loss will be consistent with the training loss.
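As a rough illustration of what the ROI masking means for the loss, here is a minimal sketch; the tensor shapes and the name roi_mask are assumptions for illustration, not the repo's actual code:

```python
import torch

def apply_roi_mask(images, roi_mask):
    """Zero out pixels outside a camera's ROI so the reconstruction loss
    only covers the region inside the mask, as done for the train/validation sets.

    images:   float tensor of shape (N, C, H, W)
    roi_mask: binary float tensor of shape (H, W), 1 inside the ROI, 0 outside
    """
    return images * roi_mask.view(1, 1, *roi_mask.shape)
```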
thanks
For camera1, the ROI-masked test set is under data/duke/pt/camera1/metric/input, and the test set with original images is under data/duke/pt/camera1/metric/org.
You can switch to the masked set by changing line 159 of run.py to X_org_seq = torch.load(path.join(data_dir, 'input', filename)).
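For reference, the edit looks roughly like the sketch below; the surrounding code of run.py, the placeholder values, and the assumption that the original line loads from the 'org' directory are inferred from the paths above, not copied from the repo:

```python
from os import path
import torch

# Placeholder values; in run.py these come from the script's own setup.
data_dir = 'data/duke/pt/camera1/metric'
filename = 'some_test_batch.pt'  # hypothetical file name

# line 159 of run.py, roughly:
# before (loads the un-masked test images):
# X_org_seq = torch.load(path.join(data_dir, 'org', filename))
# after (loads the ROI-masked test set, so the test loss is comparable to training):
X_org_seq = torch.load(path.join(data_dir, 'input', filename))
```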
I ran the camera1 test and got this result: recon: 1.000, tight: 0.000, entr: 0.000, Validation 776 / 776, loss = 4607.255, Final validation loss: 4558.872.
The camera1 training loss is about 100, and nobody is recognized.
If you only train on camera1, before testing I suggest you check the training/validation curve using scripts/show_curve.
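If you prefer to plot the logged losses yourself rather than using scripts/show_curve, a minimal matplotlib sketch could look like the following; the two loss lists are placeholders you would fill from your own training log:

```python
import matplotlib.pyplot as plt

def show_curves(train_losses, val_losses):
    """Plot per-epoch training and validation losses on the same axes."""
    plt.plot(train_losses, label='train')
    plt.plot(val_losses, label='validation')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend()
    plt.show()
```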
To see the validation loss, you also need to change the training ratio in line 33 of scripts/gen_duke.py, i.e., set train_ratio = 0 if arg.metric == 1 else 0.96. The original training ratio is 1, which means no validation set is created.
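That single-line change would look roughly like this, assuming train_ratio is the fraction of frames assigned to the training set, as implied above:

```python
# scripts/gen_duke.py, line 33
# before: train_ratio = 1, so every frame is used for training and no validation set exists
# after: keep 4% of the frames for validation, except when generating the metric/test split
train_ratio = 0 if arg.metric == 1 else 0.96
```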
If you find both the training and validation curves are ok (e.g., under 60), then you can try testing.
In my experiments, I have used all data, and the final training loss is about 40.
Moreover, I'm not sure what the result would be if you only use camera1 data, since the model might overfit with less data.
OK, thanks, I got it. My training loss is floating around 100.
I use data from all Duke cameras to train with the default config. The loss fluctuates between about 40 and 60. Is this normal? I plotted the validation loss per epoch and found the loss oscillating between 40 and 70; the fluctuations look the same in every epoch, and the loss shows no downward trend.
It might be caused by vanishing gradients during backpropagation. We have encountered this problem before, but not very frequently. In the future we'll release a more stable version for training on Duke; currently I do not have enough GPUs to run the code, so it might take some time :(
I have updated the loss module and you might give it a try. The new loss uses an additional reconstruction term to make training easier.
I tried the new code; the starting loss is higher, but the training loss still does not drop. Is there any other way? Thank you.
How many iterations have you trained the model? Typically the loss starts to drop after 10000 -- 15000 iterations.
OK, it has already run 3000 iterations.
You can watch the gradient of C_o_seq (grad_C_o_seq in the command window) during training. If it's zero, then the gradient has vanished. In the normal case, it should be around 10^-6 to 10^-4.
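If you want to check this programmatically rather than reading the console output, here is a minimal sketch; the variable C_o_seq and the training loop belong to the repo, and the helper below is only illustrative:

```python
def check_grad(tensor, name='C_o_seq', low=1e-6):
    """Print the mean absolute gradient of a tensor and warn if it is near zero."""
    if tensor.grad is None:
        print(f'{name}: no gradient yet')
        return
    g = tensor.grad.abs().mean().item()
    print(f'grad_{name}: {g:.3e}')
    if g < low:
        print(f'warning: grad_{name} is below {low:.0e}; the gradient may have vanished')

# usage inside the training loop, after loss.backward():
# check_grad(C_o_seq)
```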
1.00000e-02 *
  4.4214  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
[torch.cuda.FloatTensor of size 1x10 (GPU 0)]
It seems to be out of the normal range.
I use the camera1 data of Duke_MTMCT to train and test, but the resulting training loss is 105 and the test loss is 4500. Is there something wrong with my procedure, or do I need to train on all cameras to get the correct result?