Closed Usernamezhx closed 4 years ago
I did not understand the question. Can you explain?
For example:
for image_segments_cv2 in videos:
    if len(image_segments_cv2) >= 10:
        image_segments_cv2 = image_segments_cv2[1:9]
        image_segments = [Image.fromarray(img) for img in image_segments_cv2]
        image_segments = transform(image_segments)
        process_data_final = [image_segments, image_segments]  # <-- here: double input
        process_data_final = torch.stack(process_data_final, 0)
        input_var = process_data_final.view(-1, 3, process_data_final.size(2), process_data_final.size(3))
        rst = net(input_var)
I am sorry, where is this code snippet from?
Sorry for the late reply. I just wanted to make sure. The code snippet is taken from here
This is equivalent to extracting two different sequences of frames from a video. The network predicts the action category from each of the two sequences separately, and the scores are averaged to obtain the final prediction. Since the two sequences contain different information (frames), the chance of predicting the correct category increases.
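To make the averaging step concrete, here is a minimal sketch in plain Python. It is not the repository's code: the helper names `softmax` and `average_clip_scores` and the logit values are hypothetical, chosen only to show how averaging per-clip scores can change the final prediction.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_clip_scores(per_clip_logits):
    """Average class probabilities over clips; return (avg_probs, predicted_class)."""
    probs = [softmax(logits) for logits in per_clip_logits]
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    pred = max(range(n_classes), key=lambda c: avg[c])
    return avg, pred


# Hypothetical logits for two different clips of the same video (3 classes).
clip_a = [2.0, 1.9, 0.1]  # on its own, narrowly favours class 0
clip_b = [0.5, 3.0, 0.2]  # on its own, clearly favours class 1

avg, pred = average_clip_scores([clip_a, clip_b])
print(pred)  # class chosen after averaging both clips
```

In this made-up case the first clip alone would pick class 0, while the averaged scores pick class 1. Averaging only helps when the clips actually differ: if the same clip is passed twice through a deterministic network, the averaged scores equal the single-clip scores.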
Thanks for your reply. I get it.
Thanks for your work. I would like to know whether there is a theoretical basis for why the double input can improve accuracy.