Closed zhangguangxun closed 4 years ago
Hi zhangguangxun,
Thanks for visiting my repo.
First, does dataset_dir mean the directory of features extracted by the other repo you published, i.e. the pth files? Second, does feature_dir mean the parameters used to initialize the neural network, i.e. is the model in your repo pretrained rather than randomly initialized?
Sorry for confusing you. My directory structure is like this:
dataset_dir/
├─ feature_dir/
├─ hdf5_dir/ (video dir)
└─ anno_file (.json)
dataset_dir is the path to a directory that contains videos and features.
feature_dir is the relative path from dataset_dir.
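As a rough illustration of that layout, here is a minimal sketch of how the relative sub-directories could be resolved against dataset_dir. The helper name `resolve_paths` and the example paths are hypothetical placeholders, and the assumption that the training code joins the paths this way is mine rather than something confirmed from the repo:

```python
import os


def resolve_paths(dataset_dir, feature_dir, hdf5_dir):
    """Join the relative sub-directories onto dataset_dir (hypothetical helper)."""
    return {
        "feature_dir": os.path.normpath(os.path.join(dataset_dir, feature_dir)),
        "hdf5_dir": os.path.normpath(os.path.join(dataset_dir, hdf5_dir)),
    }


# Placeholder values for illustration only; substitute the entries from your config.
paths = resolve_paths("/data/MSRVTT/", "./features", "./videos")
for name, path in paths.items():
    # Report whether each resolved directory is actually present on disk.
    status = "exists" if os.path.isdir(path) else "missing"
    print(f"{name} -> {path} ({status})")
```

Printing the resolved paths before training is a quick way to confirm that feature_dir and hdf5_dir point where you expect.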
Third, in the previous issue, you mentioned that this repo was adapted from image captioning code. Do you have the paper for that method?
I just referred to this page, but I think this paper is like the method I used.
I hope this will help you. Thanks
Thanks a lot~
During training, I still get the same error:
NotImplementedError: Input Error: Only 3D, 4D and 5D input Tensors supported (got 6D) for the modes: nearest | linear | bilinear | bicubic | trilinear (got trilinear)
However, when I opened a pth file such as video0.pth, I found that the features are 5D, so I guess the problem is still in my path settings.
Could you do me a favor and check my paths?
dataset: MSR-VTT
dataset_dir: /media/zgx_docker_data/video_feature_extractor/data/MSRVTT/
feature_dir: ./TrainValFeature
hdf5_dir: ./TrainValVideohdf5
ann_file: /media/zgx_docker_data/VideoCaptioning/data/vocal1/train_val_videodatainfo.json
vocab_path: ./data/vocal1/vocab.pkl
The TrainValFeature directory looks like this:
TrainValFeature/
├─ video0.pth
├─ video1.pth
├─ video2.pth
...
and the TrainValVideohdf5 directory looks like this:
TrainValVideohdf5/
├─ video0.hdf5
├─ video1.hdf5
├─ video2.hdf5
...
Maybe I misunderstood the meaning of feature_dir r50_k700_16f. Is that the path where the features are saved?
By the way, how can I easily type ├─ and └─? I copied them from your comment.
I tried it and confirmed that the feature dir is TrainValFeature, so you can ignore my earlier question about the meaning of feature_dir r50_k700_16f.
But why was the input detected as 6D? When I checked the pth files, they are exactly 5D:
>>> n = '../video_feature_extractor/data/MSRVTT/TrainValFeature/video1.pth'
>>> net = torch.load(n)
>>> print(net.shape)
torch.Size([1, 2048, 35, 7, 7])
>>> n = '../video_feature_extractor/data/MSRVTT/TrainValFeature/video2.pth'
>>> net = torch.load(n)
>>> print(net.shape)
torch.Size([1, 2048, 19, 7, 7])
>>> n = '../video_feature_extractor/data/MSRVTT/TrainValFeature/video3.pth'
>>> net = torch.load(n)
>>> print(net.shape)
torch.Size([1, 2048, 15, 7, 7])
Hi, I am facing a similar issue: ValueError: size shape must match input shape. Input is 4D, size is 3
What I did was change line 71 of dataset.py, replacing ft.unsqueeze(0) with ft so that the shapes match. The problem in my case was that the code expected an input of shape [2048, m, 7, 7], whereas the features I got from the extraction code have shape [1, 2048, m, 7, 7]. I hope this helps. Good luck.
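To make the shape mismatch concrete, here is a minimal sketch that works on shape tuples only, so it runs without PyTorch. The helper name `with_batch_dim` is hypothetical, and the (C, T, H, W) layout is assumed from the shapes printed earlier in this thread:

```python
def with_batch_dim(shape):
    """Return the shape the interpolation step should receive.

    Assumes features are laid out as (C, T, H, W), optionally with a
    leading batch axis of size 1, as in the shapes shown in this thread.
    """
    shape = tuple(shape)
    if len(shape) == 4:
        # (C, T, H, W): the batch axis is still missing, so add it.
        return (1,) + shape
    if len(shape) == 5:
        # (N, C, T, H, W): already batched; adding another axis would make it 6D.
        return shape
    raise ValueError(f"expected a 4D or 5D feature, got {len(shape)}D: {shape}")


print(with_batch_dim((2048, 35, 7, 7)))     # feature saved without a batch axis
print(with_batch_dim((1, 2048, 35, 7, 7)))  # feature saved with a batch axis
```

On an actual tensor, the same check would presumably be `if ft.dim() == 4: ft = ft.unsqueeze(0)`, which leaves features that were already saved as 5D untouched.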
Thank you for the help. Can you kindly suggest any enhancements or improvements for generating video captions?
Hello, yiskw713:
I am rebuilding your repo, and while doing so I got confused about some concepts in config.yaml. First, does dataset_dir mean the directory of features extracted by the other repo you published, i.e. the pth files? Second, does feature_dir mean the parameters used to initialize the neural network, i.e. is the model in your repo pretrained rather than randomly initialized? Third, in the previous issue, you mentioned that this repo was adapted from image captioning code. Do you have the paper for that method? Thanks.