Right now we just process one image at a time, sequentially, but I think we'd benefit from making the architecture asynchronous. Here's my initial idea:
images come in at whatever FPS the camera can run at
we have some algorithm that decides which image tiles we actually run object detection on. At our current inference speed I'd expect most tiles to be discarded. It could take into account where we expect targets to be when prioritizing tiles, but that couples front-end detection to back-end geolocation and object tracking, which would totally spaghettify our software architecture (rough sketch of one way around this below)
after object detection, all bounding-box crops get fed into classification, or if there are too many, maybe we again use tracking info to prioritize which crops to classify (second sketch below)
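To make the front half concrete, here's a rough asyncio sketch. Everything in it is hypothetical: the camera/detector APIs, the `tracker.expected_target_tiles()` hook, and `split_into_tiles()` are placeholder names, not our actual interfaces. The point is that the back-end coupling can hide behind one read-only tracker hook instead of leaking into the detection code:

```python
# Rough sketch only -- camera/detector/tracker APIs and split_into_tiles()
# are hypothetical stand-ins, not our actual interfaces.
import asyncio
import itertools

frame_queue = asyncio.Queue(maxsize=2)   # tiny on purpose: stale frames are worthless
tile_queue = asyncio.PriorityQueue()
_tiebreak = itertools.count()            # so equal-priority tuples still compare

async def camera_loop(camera):
    """Ingest at whatever FPS the camera runs at; if detection falls
    behind, drop the oldest frame rather than blocking the camera."""
    while True:
        frame = await camera.next_frame()        # hypothetical camera API
        if frame_queue.full():
            frame_queue.get_nowait()             # discard the stale frame
        frame_queue.put_nowait(frame)

async def tile_scheduler(tracker):
    """Score tiles and enqueue them for detection. All coupling to the
    back end lives behind this one read-only tracker hook."""
    while True:
        frame = await frame_queue.get()
        hot = tracker.expected_target_tiles(frame)     # hypothetical hook
        for tile in split_into_tiles(frame):           # hypothetical helper
            priority = 0 if tile.index in hot else 1   # lower runs sooner
            tile_queue.put_nowait((priority, next(_tiebreak), tile))

async def detection_worker(detector):
    """Always pull the best available tile; low-priority tiles from old
    frames starve and would need pruning eventually (not shown)."""
    while True:
        _, _, tile = await tile_queue.get()
        boxes = await asyncio.to_thread(detector.detect, tile.pixels)
        # hand boxes off to the classification stage (next sketch)
```

So "most tiles get discarded" falls out implicitly: dropping the oldest frame at ingest plus letting cold tiles starve in the priority queue does the discarding for us.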
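And the classification half, same caveats: the classifier API and `tracker.track_priority()` are made up, and the load shedding here is crude (it drops whatever arrives while the queue is full, not the lowest-priority item):

```python
# Same caveats as above: classifier API and tracker.track_priority()
# are hypothetical.
crop_queue = asyncio.PriorityQueue(maxsize=64)   # bounded so we shed load

async def enqueue_crops(frame, boxes, tracker):
    """Crops on tracks we already care about jump the line; the rest are
    best-effort and get dropped when the classifier can't keep up."""
    for box in boxes:
        score = tracker.track_priority(box)      # hypothetical hook
        try:
            crop_queue.put_nowait((-score, next(_tiebreak), frame.crop(box)))
        except asyncio.QueueFull:
            pass   # crude shedding: drops the newcomer, not the worst item

async def classification_worker(classifier):
    while True:
        _, _, crop = await crop_queue.get()
        label = await asyncio.to_thread(classifier.classify, crop)
        # feed label + box downstream to geolocation / tracking
```

If the classifier usually keeps up, the priority queue is a no-op and every crop gets classified, so we only pay the prioritization complexity when we're actually overloaded.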