Closed · y2sman closed this issue 2 years ago
Hi @y2sman, thank you for letting us know your concern.
- Losing ~80 videos from the training set shouldn't hurt performance, given the large size of the VCDB dataset.
- As mentioned in the paper, we sample 997,090 frames from the VCDB dataset, i.e. 10 frames per video, so this is correct.
- The dropout rate is not crucial, please follow the paper.
- To better locate the problem, may I ask why the cosine similarity isn't working, how the first table is obtained (the performance seems good), and what's the difference between evaluation.py and evaluation_org.py?
Thanks for the reply. Before I start, it is good to hear that the first table's performance looks good.
The difference between evaluation.py and evaluation_org.py is not large. I just wrote my own cosine-similarity code for evaluation, because the original code didn't work.
I've attached the error message from evaluation.py when calculating cosine similarity.
```
python3 evaluation.py --dataset FIVR-5K --pca_components 1024 --num_clusters 256 --num_layers 1 --output_dim 1024 --padding_size 64 --metric cosine --model_path models/model_v5_with_all_bg.pth --feature_path pre_processing/fivr_imac_pca1024.hdf5 --random_sampling
Comparator is ... False
loading features...
...features loaded
100%|██████████| 50/50 [00:00<00:00, 156.28it/s]
  0%|          | 0/5000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "evaluation.py", line 331, in <module>
    main()
  File "evaluation.py", line 327, in main
    eval_function(model, dataset, args)
  File "evaluation.py", line 258, in query_vs_database
    sims = calculate_similarities(queries, embedding, qr_video_dict, args.metric, comparator)
  File "evaluation.py", line 50, in calculate_similarities
    cdist(query_features, target_feature, metric='cosine'))
  File "/usr/local/envs/etri/lib/python3.7/site-packages/scipy/spatial/distance.py", line 2717, in cdist
    raise ValueError('XA must be a 2-dimensional array.')
ValueError: XA must be a 2-dimensional array.
```
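For context on this error: scipy's `cdist` rejects anything that isn't a 2-D array, so it usually means the query features still carry a frame axis (3-D) or have collapsed to a single vector (1-D). A minimal sketch of the shape check and a mean-pooling fallback (hypothetical shapes, not the repo's actual tensors):

```python
import numpy as np
from scipy.spatial.distance import cdist

def cosine_sims(query_features, target_feature):
    """Cosine similarity between query and target video features.

    cdist requires both inputs to be 2-D: (n_videos, dim).
    Frame-level features of shape (n_videos, n_frames, dim) are pooled
    over the frame axis; a lone vector gains a batch axis.
    """
    q = np.asarray(query_features)
    t = np.asarray(target_feature)
    if q.ndim == 3:              # frame-level: pool over the frame axis
        q = q.mean(axis=1)
    if t.ndim == 1:              # single video vector: add a batch axis
        t = t[np.newaxis, :]
    # cosine *similarity* = 1 - cosine distance
    return 1.0 - cdist(q, t, metric='cosine')

# e.g. 2 queries with 5 frames of 4-dim features vs. one target video
sims = cosine_sims(np.random.rand(2, 5, 4), np.random.rand(4))
print(sims.shape)  # (2, 1)
```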
The other metrics (euclidean, chamfer, sym_chamfer) work fine. I also added ViSiL's pre-trained weights for the video comparator, but it stops partway through the calculation.
```
python3 evaluation.py --dataset FIVR-5K --pca_components 1024 --num_clusters 256 --num_layers 1 --output_dim 1024 --padding_size 64 --metric chamfer --model_path models/model_v5_with_all_bg.pth --feature_path pre_processing/fivr_imac_pca1024.hdf5 --random_sampling --use_comparator
Comparator is ... True
loading features...
...features loaded
100%|██████████| 50/50 [00:00<00:00, 505.95it/s]
 48%|████▊     | 2383/5000 [01:45<01:56, 22.49it/s]
Traceback (most recent call last):
  File "evaluation.py", line 331, in <module>
    main()
  File "evaluation.py", line 327, in main
    eval_function(model, dataset, args)
  File "evaluation.py", line 258, in query_vs_database
    sims = calculate_similarities(queries, embedding, qr_video_dict, args.metric, comparator)
  File "evaluation.py", line 56, in calculate_similarities
    sim = chamfer(query, target_feature, comparator)
  File "evaluation.py", line 71, in chamfer
    simmatrix = comparator(simmatrix).detach()
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/kjlee/workspace/temporal_context_aggregation/model.py", line 620, in forward
    sim = self.mpool2(sim)
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/nn/modules/pooling.py", line 164, in forward
    self.return_indices)
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/_jit_internal.py", line 405, in fn
    return if_false(*args, **kwargs)
  File "/usr/local/envs/etri/lib/python3.7/site-packages/torch/nn/functional.py", line 718, in _max_pool2d
    return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (64x22x1). Calculated output size: (64x11x0). Output size is too small
```
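For reference, this RuntimeError comes from a max-pool halving a dimension that is already size 1. With two kernel-2, stride-2 pools (as in ViSiL-style comparators), the similarity matrix needs at least 4 frames on each side. The arithmetic can be checked with the standard floor formula for pooling output size (this is illustrative, not the repo's code):

```python
def pool_out(n, kernel=2, stride=2, padding=0):
    """Output length of one max-pool dimension (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1

def survives_two_pools(n_frames):
    """True if a dimension of size n_frames stays >= 1 after two 2x2/stride-2 pools."""
    return pool_out(pool_out(n_frames)) >= 1

print(pool_out(pool_out(22)))  # 22 -> 11 -> 5, fine
print(survives_two_pools(4))   # True: 4 -> 2 -> 1, the minimum that works
print(survives_two_pools(3))   # False: 3 -> 1 -> 0, too small
```

A `(64x22x1)` input failing with output `(64x11x0)` therefore points at a video whose frame dimension has already shrunk to 1, i.e. a video with fewer than 4 frames.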
The two problems above are what I'm facing now. I believe my setup follows the paper, and as I said above, the other differences are in parts that don't affect performance. In this situation, could you provide the pre-trained model, or the exact parameter values? And please check whether the cosine similarity calculation code works.
Hi @y2sman,
Just noticed that you are trying to match our results on FIVR-5K reported in the ablation study section. However, for one thing, this subset is rather small and may produce unstable results; for another, since that table is only for the ablation study, we only ensure that one hyper-parameter is ablated per subtable, and not all hyper-parameters are perfectly aligned with our final run on FIVR-200K (so it is OK to get different results from the ablation study section). I recommend experimenting with FIVR-200K instead, or running FIVR-5K multiple times for stable results.
About your questions: we just use scipy to calculate the cosine similarities, so please check their documentation for that error message; it suggests your tensor shape is not suitable. As for the ViSiL video comparator, note that it requires each video to have at least 4 frames, so your error message may indicate a too-short video. They recently released their official PyTorch code, which may be helpful for you: https://github.com/MKLab-ITI/visil/tree/pytorch
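One way to act on that 4-frame requirement is to guard the comparator call and fall back to plain chamfer similarity for short videos. A hypothetical sketch (the helper names are mine, not from evaluation.py; frame features are assumed L2-normalized):

```python
import numpy as np

MIN_FRAMES = 4  # the comparator max-pools the similarity matrix twice

def chamfer_similarity(query, target):
    """Plain chamfer similarity over a frame-to-frame similarity matrix.

    query: (Tq, D), target: (Tt, D), both L2-normalized frame features.
    """
    simmatrix = query @ target.T          # (Tq, Tt) cosine similarities
    return simmatrix.max(axis=1).mean()   # best match per query frame

def safe_similarity(query, target, comparator=None):
    """Use the learned comparator only when both videos are long enough."""
    if comparator is None or min(len(query), len(target)) < MIN_FRAMES:
        return chamfer_similarity(query, target)
    return comparator(query, target)
```

This keeps the evaluation loop from crashing on the handful of very short videos while still using the comparator everywhere else.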
BTW, I wonder whether the problem occurs only with the cosine similarity metric; are the other metrics fine?
I think you have evaluated the frame-level features with cosine similarity, whereas that metric is meant for video-level features according to Section 4.2 (Similarity Measure) of the paper.
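If that reading of Section 4.2 is right, frame-level features would first be aggregated into a single L2-normalized video-level vector before taking the cosine similarity. A sketch of that interpretation (mine, not code from the repo):

```python
import numpy as np

def video_level_embedding(frame_features):
    """Aggregate frame-level features (T, D) into one L2-normalized video vector."""
    v = np.asarray(frame_features).mean(axis=0)
    return v / np.linalg.norm(v)

def video_cosine(query_frames, target_frames):
    """Cosine similarity between two videos given their frame-level features."""
    return float(video_level_embedding(query_frames) @ video_level_embedding(target_frames))
```

With both sides reduced to unit vectors, the dot product is the cosine similarity and `cdist` sees only 2-D inputs.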
Hello. I've been trying to reproduce the paper's scores. However, I failed to achieve the same metrics as in your paper.
Here are the differences between TCA and my attempts.
I'm attaching the results of measuring performance based on the settings described in the paper.
I'd really like to reproduce the metrics in your paper. Please let me know what is different from your work. Thanks a lot.