inaturalist / SeekReactNative

Seek v2, built with React Native for Android and iOS
https://www.inaturalist.org/pages/seek_app
MIT License

Average Multiple Frames #615

Open gvanhorn38 opened 5 years ago

gvanhorn38 commented 5 years ago

It would be nice to average the features from multiple frames. In the most basic case, this would entail:

  1. Store the outputs from the last N model queries in a matrix (each row is a frame, and the columns are the class outputs).
    1.1 It is assumed the model outputs the class probabilities.
    1.2 This should be implemented as a "ring buffer" where we overwrite the oldest row.
  2. Compute the average output by taking the average of each column.
    2.1 This should be an efficient operation for a DSP library.
  3. Use the average output to do taxonomic predictions.
    3.1 The prediction code should be agnostic to whether it is receiving a direct output from a model or an averaged vector.
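The three steps above can be sketched in plain Swift, without any DSP library, to make the data layout concrete. This is only a minimal sketch: `FrameAverager` and all of its members are hypothetical names, not anything that exists in Seek. It also makes one assumption the list does not state: until the buffer has filled up, the average is taken only over the rows written so far, so early results are not diluted by empty rows.

```swift
/// Hypothetical helper (not part of Seek): keeps the last `capacity`
/// model outputs in a ring buffer and averages them per class.
struct FrameAverager {
    let capacity: Int               // N, the number of frames to keep
    let numClasses: Int             // length of each model output
    private var storage: [Double]   // row-major matrix: capacity x numClasses
    private var writeIndex = 0      // next row to overwrite (the oldest one)
    private(set) var filled = 0     // rows written so far, capped at capacity

    init(capacity: Int, numClasses: Int) {
        precondition(capacity > 0 && numClasses > 0)
        self.capacity = capacity
        self.numClasses = numClasses
        self.storage = [Double](repeating: 0, count: capacity * numClasses)
    }

    /// Step 1: overwrite the oldest row with the newest model output.
    mutating func push(_ output: [Double]) {
        precondition(output.count == numClasses)
        let start = writeIndex * numClasses
        storage.replaceSubrange(start..<(start + numClasses), with: output)
        writeIndex = (writeIndex + 1) % capacity
        filled = min(filled + 1, capacity)
    }

    /// Step 2: column-wise mean over the rows written so far (nil if empty).
    func average() -> [Double]? {
        guard filled > 0 else { return nil }
        var sums = [Double](repeating: 0, count: numClasses)
        for row in 0..<filled {
            let start = row * numClasses
            for col in 0..<numClasses {
                sums[col] += storage[start + col]
            }
        }
        return sums.map { $0 / Double(filled) }
    }

    /// Forget all buffered frames.
    mutating func reset() {
        writeIndex = 0
        filled = 0
    }
}

// Usage: two frames of a made-up 2-class model.
var averager = FrameAverager(capacity: 3, numClasses: 2)
averager.push([1.0, 0.0])
averager.push([0.0, 1.0])
print(averager.average() as Any)
```

The averaged vector has the same shape as a single model output, which is what lets step 3 stay agnostic about where its input came from.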

Some pseudocode for the different steps:

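  1. 
    // One buffer per task, each holding buffer_size rows of num_features values.
    var buffer_pointers : [UnsafeMutablePointer<Double>] = []

    let features = classifications[task].featureValue.multiArrayValue!
    let num_features = features.count
    let buffer_pointer = buffer_pointers[task]

    // Overwrite the oldest row with the newest output (assumes the multi-array holds Doubles).
    let source = features.dataPointer.assumingMemoryBound(to: Double.self)
    buffer_pointer.advanced(by: self.buffer_write_index * num_features).assign(from: source, count: num_features)
    self.buffer_write_index = (self.buffer_write_index + 1) % buffer_size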


  2. 
    let num_features = classifications[task].featureValue.multiArrayValue!.count
    let buffer_pointer = buffer_pointers[task]

    // Rows are frames, so consecutive values of one class are num_features apart.
    let stride : vDSP_Stride = num_features
    let avg = UnsafeMutablePointer<Double>.allocate(capacity: num_features)
    let length : vDSP_Length = UInt(buffer_size)

    // Mean of each column (class) across all buffered frames.
    for i in 0..<num_features {
        vDSP_meanvD(buffer_pointer.advanced(by: i), stride, avg.advanced(by: i), length)
    }


  3. 
    self.drawVisionRequestResults2(avg)
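To illustrate point 3.1, here is a rough sketch of a prediction step that only sees a probability vector and so cannot tell a single-frame output from an averaged one. Everything in it is an assumption made for illustration: `topPrediction`, the `minScore` threshold and the class labels are invented, and the real taxonomic logic (walking the taxonomy tree) is not modeled at all.

```swift
/// Hypothetical prediction step: accepts any probability vector,
/// whether it came straight from the model or from averaging.
func topPrediction(_ probabilities: [Double],
                   labels: [String],
                   minScore: Double = 0.5) -> (label: String, score: Double)? {
    precondition(probabilities.count == labels.count)
    // Index of the highest-scoring class, kept only if it clears the threshold.
    guard let best = probabilities.indices.max(by: { probabilities[$0] < probabilities[$1] }),
          probabilities[best] >= minScore else { return nil }
    return (labels[best], probabilities[best])
}

let labels = ["Aves", "Insecta", "Plantae"]   // made-up class names
let singleFrame = [0.40, 0.35, 0.25]          // stands in for a direct model output
let averaged = [0.70, 0.20, 0.10]             // stands in for a mean over several frames

// Same call for both inputs; the function has no idea which is which.
print(topPrediction(singleFrame, labels: labels) as Any)
print(topPrediction(averaged, labels: labels) as Any)
```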



More sophisticated methods can be conceived where we keep a few different ring buffers, each responsible for a different time length (~1 sec, ~3 sec, ~5 sec), and the results are compared between them.
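A rough sketch of the multi-window idea follows. It deliberately swaps in a simpler structure: instead of several separate ring buffers it keeps one shared history sized for the longest window and averages over its most recent frames, which gives the same trailing averages. `MultiWindowAverager` and its members are hypothetical names, the window sizes are given in frames (converting ~1, ~3 and ~5 seconds to frames depends on a frame rate that is not specified here), and "agreeing on the top class" is just one possible way to compare the results.

```swift
/// Hypothetical: averages over several trailing windows of one frame history.
struct MultiWindowAverager {
    let windowSizes: [Int]                  // frames per window, e.g. short, medium, long
    let numClasses: Int
    private var history: [[Double]] = []    // newest output last

    init(windowSizes: [Int], numClasses: Int) {
        precondition(!windowSizes.isEmpty && windowSizes.allSatisfy { $0 > 0 })
        self.windowSizes = windowSizes
        self.numClasses = numClasses
    }

    mutating func push(_ output: [Double]) {
        precondition(output.count == numClasses)
        history.append(output)
        // Keep only as many frames as the longest window needs.
        // removeFirst is O(n); acceptable for a sketch, not for production.
        if history.count > windowSizes.max()! { history.removeFirst() }
    }

    /// One averaged vector per window, over the most recent frames available.
    func averages() -> [[Double]] {
        return windowSizes.map { size -> [Double] in
            let recent = history.suffix(size)
            guard !recent.isEmpty else { return [] }
            var sums = [Double](repeating: 0, count: numClasses)
            for frame in recent {
                for (i, p) in frame.enumerated() { sums[i] += p }
            }
            return sums.map { $0 / Double(recent.count) }
        }
    }

    /// True when every window picks the same highest-scoring class.
    func windowsAgree() -> Bool {
        let tops = averages().map { v in v.indices.max(by: { v[$0] < v[$1] }) }
        guard let first = tops.first, first != nil else { return false }
        return tops.allSatisfy { $0 == first }
    }
}

// Usage: a 1-frame and a 3-frame window over a made-up 2-class model.
var windows = MultiWindowAverager(windowSizes: [1, 3], numClasses: 2)
windows.push([1.0, 0.0])
windows.push([1.0, 0.0])
windows.push([0.0, 1.0])   // the scene changes on the last frame
print(windows.windowsAgree())
```

Disagreement between the short and long windows, as in the usage above, could be read as a hint that the subject has just changed.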

Perhaps an app should be able to request that these buffers be cleared, for example when a user makes rapid movements (perhaps signifying they are done recognizing something or are trying to recognize something different).
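One way the clearing request could look, as a sketch only: a small gate that turns a motion reading into a "clear now" decision. `MotionGate`, the threshold value and the choice of rotation rate (assumed to be in rad/s, for example from the device's motion sensors) are all assumptions for illustration; how rapid movement should actually be detected is left open here.

```swift
/// Hypothetical gate: decides when the averaging buffers should be cleared.
struct MotionGate {
    let threshold: Double   // assumed units: rad/s of device rotation

    /// True when the magnitude of the rotation rate exceeds the threshold.
    func shouldClear(rotationRate r: (x: Double, y: Double, z: Double)) -> Bool {
        let magnitude = (r.x * r.x + r.y * r.y + r.z * r.z).squareRoot()
        return magnitude > threshold
    }
}

// Usage with a stand-in buffer of recent model outputs.
var bufferedFrames: [[Double]] = [[0.9, 0.1], [0.8, 0.2]]
let gate = MotionGate(threshold: 1.0)   // made-up threshold

let reading = (x: 1.5, y: 0.2, z: 0.1)  // made-up sensor reading: a fast pan
if gate.shouldClear(rotationRate: reading) {
    bufferedFrames.removeAll()          // start averaging from scratch
}
print(bufferedFrames.count)
```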

jtklein commented 1 year ago

This is about ARCamera frames