It would be nice to average the features from multiple frames. In the most basic case, this would entail:
Storing the outputs from the last N model queries into a matrix (each row is a frame, and the columns are the class outputs)
1.1 It is assumed the model outputs the class probabilities.
1.2 This should be implemented as a "ring buffer" where we overwrite the oldest row.
Compute the average output by taking an average of each column
2.1 This should be an efficient operation for a DSP library
Use the average output to do taxonomic predictions.
3.1 The prediction code should be agnostic to whether it is receiving a direct output from a model or an averaged vector.
Some psuedo code for the different steps:
var buffer_pointers : [UnsafeMutablePointer<Double>] = []
let features = classifications[task].featureValue.multiArrayValue!
let num_features = features.count
let buffer_pointer = buffer_pointers[task]
let num_features = classifications[task].featureValue.multiArrayValue!.count
let buffer_pointer = buffer_pointers[task]
let stride : vDSP_Stride = num_features
let avg = UnsafeMutablePointer.allocate(capacity:num_features)
let length : vDSP_Length = UInt(buffer_size)
for i in 0..<num_features{
vDSP_meanvD(buffer_pointer.advanced(by: i), stride, avg.advanced(by: i), length)
}
3.
self.drawVisionRequestResults2(avg)
More sophisticated methods can be conceived where we keep a few different ring buffers, each buffer is responsible for a different time length: ~1 sec, ~3 sec, ~5 sec. And results are compared between each.
Perhaps an app should be able to request these buffers to be cleared. For example when a user does rapid movements (perhaps signifying they are done recognizing something or they are trying to recognize something different).
It would be nice to average the features from multiple frames. In the most basic case, this would entail:
3.1 The prediction code should be agnostic to whether it is receiving a direct output from a model or an averaged vector.
Some psuedo code for the different steps:
let features = classifications[task].featureValue.multiArrayValue! let num_features = features.count let buffer_pointer = buffer_pointers[task]
buffer_pointer.advanced(by: self.buffer_write_index * num_features).assign(from: UnsafePointer(OpaquePointer(features.dataPointer)), count: num_features)
let num_features = classifications[task].featureValue.multiArrayValue!.count let buffer_pointer = buffer_pointers[task]
let stride : vDSP_Stride = num_features let avg = UnsafeMutablePointer.allocate(capacity:num_features)
let length : vDSP_Length = UInt(buffer_size)
for i in 0..<num_features{ vDSP_meanvD(buffer_pointer.advanced(by: i), stride, avg.advanced(by: i), length) }
self.drawVisionRequestResults2(avg)