chenbohua3 opened 5 years ago
I have no idea about the unsupervised part, because even for the supervised part I cannot reach the performance reported in the paper.
Thanks for your reply, I will dive into this and report my progress in time :)
Did you check for a mismatch between the video length and the label length? For example, Bus_in_Rock_Tunnel.mp4 has 5133 frames, but the corresponding label only has 5131 values.
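A minimal sketch of such a check, assuming OpenCV for frame counting and that the h5 file keeps one per-frame label array per video (the paths and key layout here are assumptions, not necessarily the repo's actual ones):

```python
import cv2
import h5py
from pathlib import Path

video_dir = Path('datasets/videos')          # hypothetical path
with h5py.File('datasets/labels.h5', 'r') as f:
    for video_path in sorted(video_dir.glob('*.mp4')):
        cap = cv2.VideoCapture(str(video_path))
        n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        cap.release()
        n_labels = len(f[video_path.stem]['label'])  # assumed key layout
        if n_frames != n_labels:
            print(f'{video_path.name}: {n_frames} frames vs {n_labels} labels')
```

For Bus_in_Rock_Tunnel.mp4 this should report 5133 frames vs 5131 labels.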
No, I have not done that before.
But the code still runs... weird.
Maybe you could check the order of the video files in `video_list = list(video_dir.glob('*.mp4')).sort()`; it may not be consistent with the order in the h5 file.
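On top of the ordering question, note that `list(...).sort()` sorts in place and returns `None`, so `video_list` ends up as `None` on that line. A sketch of the fix plus an order check against the h5 keys, assuming the keys are the video file stems:

```python
import h5py
from pathlib import Path

video_dir = Path('datasets/videos')          # hypothetical path

# list.sort() returns None; sorted() returns the sorted list itself.
video_list = sorted(video_dir.glob('*.mp4'))

# Sanity check that the video order matches the key order in the h5 file.
with h5py.File('datasets/labels.h5', 'r') as f:
    h5_keys = sorted(f.keys())
stems = [p.stem for p in video_list]
assert stems == h5_keys, 'video order and h5 key order differ'
```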
I have updated this repo by changing the evaluation method.
I have reviewed the code, and I think there may be something wrong: the validation set of split 2 may appear in the training set of split 1, which can lead to evaluating on data the model was trained on.
Suppose there are 25 videos in the dataset (1.mp4, 2.mp4, ..., 25.mp4), and I split 80% for training (20 videos) and 20% for testing (5 videos).
I got your point, so I need to reset the model before processing each split.
I have updated this repo by resetting the model before processing each split.
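A minimal sketch of that reset, with `make_model`, `train_one_split`, and `evaluate` as hypothetical stand-ins for the repo's actual training code:

```python
import torch

def cross_validate(splits, make_model):
    """splits: list of (train_keys, test_keys) pairs."""
    scores = []
    for train_keys, test_keys in splits:
        # Build a fresh model and optimizer for every split, so weights
        # trained on one split never leak into the evaluation of another.
        model = make_model()
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        train_one_split(model, optimizer, train_keys)  # hypothetical helper
        scores.append(evaluate(model, test_keys))      # hypothetical helper
    return sum(scores) / len(scores)                   # mean score over splits
```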
Yes :)
Do you have any idea about the unsupervised part: “We first select Y frames (i.e. keyframes) based on the prediction scores from the decoder.”
The decoder gives a [2, 320] output, and we want to get a 0/1 keyframe mask from that output.
This is where I am confused. I think this step chooses the frames with the top-Y scores, forming an output of shape [2, Y]. But that output is too sparse to be used for reconstructing the input features, so I have no idea about this part; further experiments are needed to search for the right structure.
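For what it's worth, here is a sketch of the top-Y selection as I read it, producing a full-length 0/1 mask rather than a sparse [2, Y] tensor; treating channel 1 as the keyframe score and using a 15% summary ratio are both assumptions:

```python
import torch

def select_keyframes(scores, ratio=0.15):
    """scores: [T] per-frame keyframe scores from the decoder."""
    T = scores.shape[0]
    Y = max(1, int(T * ratio))      # number of keyframes to select
    _, top_idx = scores.topk(Y)     # indices of the top-Y scores
    mask = torch.zeros(T)
    mask[top_idx] = 1.0             # 0/1 keyframe mask of length T
    return mask

decoder_out = torch.randn(2, 320)            # [2, 320] decoder output
mask = select_keyframes(decoder_out[1])      # channel 1 = keyframe score (assumed)
print(int(mask.sum()))                       # 48 selected frames for T = 320
```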
How would you write the code for "the decoder will give [2, 320] output and we want to get a 0/1 keyframe mask from the decoder output"?
```python
import torch

h = torch.randn(1, 2, 320, requires_grad=True)
print(h)
# max over the class dimension (dim 1, size 2) gives the keyframe index per frame
val, idx = h.max(1, keepdim=True)
```
but idx is not differentiable...
Yes... I haven't figured it out so far.
I have found gumbel_softmax BUT...
```python
import torch

h = torch.randn(1, 2, 5, requires_grad=True)
print(h)
val, idx = h.max(1, keepdim=True)
print(idx)
# soft samples: differentiable but not one-hot
z = torch.nn.functional.gumbel_softmax(h, tau=2, hard=False, dim=1)
print(z)
# hard samples: one-hot in the forward pass, straight-through in the backward pass
z = torch.nn.functional.gumbel_softmax(h, tau=2, hard=True, dim=1)
print(z)
```
outputs are:
```
tensor([[[-0.0259, -0.9393, -0.2825,  0.6466, -1.0658],
         [-0.6078,  1.2127,  0.1509, -0.9749, -1.4952]]], requires_grad=True)
tensor([[[0, 1, 1, 0, 0]]])
tensor([[[0.7145, 0.1478, 0.8365, 0.5046, 0.4407],
         [0.2855, 0.8522, 0.1635, 0.4954, 0.5593]]], grad_fn=<SoftmaxBackward>)
tensor([[[1., 0., 0., 1., 0.],
         [0., 1., 1., 0., 1.]]], grad_fn=<AddBackward0>)
```
The softmax procedure looks weird... the hard output does not always match the argmax of the logits (e.g. the last column).
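For what it's worth, `hard=True` is a straight-through estimator, and the hard one-hot need not match the argmax of the raw logits because Gumbel noise is added before the argmax is taken. A hand-written sketch of the same trick:

```python
import torch
import torch.nn.functional as F

h = torch.randn(1, 2, 5, requires_grad=True)
tau = 2.0

# Perturb the logits with Gumbel(0, 1) noise, then soften with a temperature.
gumbels = -torch.empty_like(h).exponential_().log()
y_soft = F.softmax((h + gumbels) / tau, dim=1)

# Straight-through: one-hot in the forward pass,
# gradients of the soft sample in the backward pass.
index = y_soft.max(1, keepdim=True)[1]
y_hard = torch.zeros_like(h).scatter_(1, index, 1.0)
z = y_hard - y_soft.detach() + y_soft

z.sum().backward()   # gradients still flow into h via y_soft
print(h.grad)
```

The disagreement with `h.max(1)` comes entirely from the added noise; `tau` only controls how soft `y_soft` is, not which class the hard sample picks.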
Yes... I haven't used this function before. By the way, do you have an online chat tool like LINE? We could chat there for convenience.
Yes, I have. Please send your LINE ID to ZX78986@gmail.com.
Sorry, LINE may be blocked in my country. Do you have other tools like WeChat?
OK, then send your ID to the email above.
Yes, I have sent my WeChat ID to your email :)
I have added you.
Would you please send me a sticker or something? I cannot find you.
I have sent a QR code of my WeChat account to your email, please check whether it works :)
After reading the unsupervised part of the paper, I cannot figure out what exactly the structure of the unsupervised part is.
By the way, thanks very much for your reimplementations of FCSN and VSULD; I have forked them and will update my progress in time :)