Closed: ProPythoner67 closed this issue 3 years ago
Hi, the input of model.predict() must be a batch of data. Therefore, you have to reshape your single video clip to the shape [1, 64, 224, 224, 5]: 1 means the current batch size is 1, 64 is the number of frames expected by this pre-trained model, 224 is the height and width of a single frame, and 5 means 3 RGB channels + 2 optical-flow channels.
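To illustrate the reshape, here is a minimal sketch. It assumes the clip has already been preprocessed to 64 frames of 224×224 with the 3 RGB + 2 optical-flow channels stacked on the last axis; only the leading batch dimension is missing:

```python
import numpy as np

# Hypothetical preprocessed clip: 64 frames, 224x224 pixels,
# 3 RGB channels + 2 optical-flow channels on the last axis.
clip = np.zeros((64, 224, 224, 5), dtype=np.float32)

# Add the leading batch dimension so model.predict() accepts it.
batch = np.expand_dims(clip, axis=0)
print(batch.shape)  # (1, 64, 224, 224, 5)
```

The same result can be obtained with `clip.reshape((1,) + clip.shape)`; `expand_dims` just makes the intent explicit.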
Thanks for the fast response. I've tried to reshape my single video clip, but after all my attempts I didn't manage to get the right shape. Do you have a code example of the full preprocessing I should run on a single video clip in order to predict with the pre-trained model?
Sorry to say that I have also lost the code I used before. You mentioned that you used the function Video2Npy to preprocess a video; note that the length of the returned array equals the original length of the input video, not a fixed 64 frames. Our pre-trained model only accepts a fixed length (64), so you must make your input the shape [1, 64, 224, 224, 5].
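One way to get from a variable-length clip to exactly 64 frames is uniform (sparse) sampling. This is a sketch, not the repo's own code; the helper name `sample_frames` and the input shape are assumptions based on what Video2Npy is described as returning:

```python
import numpy as np

def sample_frames(frames, target_len=64):
    """Uniformly sample a fixed number of frames from a clip.

    `frames` is assumed to be an array of shape (n, 224, 224, 5),
    as produced by a preprocessing step such as Video2Npy.
    """
    n = len(frames)
    # Pick target_len indices spread evenly across the whole clip.
    idx = np.linspace(0, n - 1, num=target_len).astype(int)
    return frames[idx]

clip = np.zeros((150, 224, 224, 5), dtype=np.float32)  # e.g. a 150-frame video
fixed = sample_frames(clip)
print(fixed.shape)  # (64, 224, 224, 5)
```

After sampling, `np.expand_dims(fixed, axis=0)` gives the final [1, 64, 224, 224, 5] batch.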
Also, you could try to put the data in our data generator and then feed it to the network. (ref to here)
So if I understood right, the pre-trained model gives one prediction per 64 frames?
Correct. For a longer video (>64 frames), you can sample 64 frames sparsely or use a sliding-window algorithm to process each sliced clip.
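A minimal sliding-window sketch, assuming the long clip has shape (n, 224, 224, 5); the window size of 64 matches the model, while the stride of 32 (50% overlap) is an illustrative choice, not something the repo prescribes:

```python
import numpy as np

def sliding_windows(frames, window=64, stride=32):
    """Yield overlapping fixed-length windows from a long clip,
    each with a leading batch dimension ready for model.predict().
    """
    n = len(frames)
    for start in range(0, max(n - window, 0) + 1, stride):
        yield np.expand_dims(frames[start:start + window], axis=0)

clip = np.zeros((160, 224, 224, 5), dtype=np.float32)
batches = list(sliding_windows(clip))
print(len(batches), batches[0].shape)  # 4 (1, 64, 224, 224, 5)
```

Each yielded batch can be passed to model.predict() in turn, and the per-window scores aggregated (e.g. averaged or max-pooled) into a video-level decision.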
Ok great, I've grouped my frames into 64-frame groups and now I'm getting a prediction result like this: [[9.9999881e-01 1.2446051e-06]]. What does it represent? Does the first value represent the probability of violent or of non-violent? (I'm using the pre-trained model.)
Hi, you could check the information printed after initializing the data generator. If "violent" is assigned to class 0, then pred[0] means "violent"; otherwise, pred[0] means "non-violent".
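Turning the softmax vector into a label is then a one-line argmax. The class order below is an assumption for illustration; verify it against the indices the data generator prints, since the mapping may be reversed:

```python
import numpy as np

# Example prediction from model.predict() on one 64-frame clip.
pred = np.array([[9.9999881e-01, 1.2446051e-06]])

# Hypothetical class order; confirm against the generator's printout.
classes = ["violent", "non-violent"]
label = classes[int(np.argmax(pred[0]))]
print(label)  # "violent" under the assumed class order
```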
I've succeeded in using the DataGenerator, but it seems like this class returns (from __getitem__) one batch of 64 frames for the whole video, so I do get the prediction, but only for 64 frames. How can I get the other 64-frame groups? Shouldn't the DataGenerator class return them as well? Or maybe I'm not using the class correctly?
The data generator sparsely samples 64 frames from an input video (no matter how long it is), so for real-case inference you need to implement a sliding-window algorithm yourself.
Hey, first of all thanks for sharing this repo! I am trying to predict a video using the pre-trained model. Here is what I've done:
import numpy as np
from keras.models import load_model
from keras.optimizers import SGD

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model = load_model(model_path)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])
frames = Video2Npy(file_path='some_file_path.mp4')
for frame in frames:
    preds = model.predict(np.expand_dims(frame, axis=0))
And from the prediction line I got this error:
ValueError: Input 0 is incompatible with layer model_1: expected shape=(None, 64, 224, 224, 5), found shape=(None, 224, 224, 5)
It looks like the preprocessing of the frames doesn't give the expected shape. What am I doing wrong? Also, I've searched this repository for an example of how to predict a video using the pre-trained model, and I couldn't find one.
Thanks!