rfcx / rfcx-api

Core, Media/Assets and MQTT APIs
https://api.rfcx.org/docs/
Apache License 2.0

Implement CNN detection aggregation "Best per site" and "Best per site, per day" #582

Closed veckatimest closed 5 months ago

veckatimest commented 5 months ago

Is your feature request related to a problem? Please describe.

Pattern matching already lets us find the best detections per site or per site and date. We would like to have the same in CNN job results. (image)

Describe the solution you'd like

Update the existing /detections endpoint so that it has a new query option, aggregate, that takes parameters in the following format:

This endpoint will still return the same data format as it did before (a flat array of detections).

New parameter in documentation: (image)

Response: (image)

Caveats

The response is not grouped into "sites", so the frontend will need to group the results visually. (image)

We have stream_id in the response, but it's not human-readable, so the frontend will need to map these stream_id values to location names.
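As a rough illustration of the second caveat, the mapping from stream_id to a location name could look something like the sketch below. Only stream_id comes from the discussion; the `streams` list with `id` and `name` fields, the `site_name` field, and the `attachSiteNames` helper are assumptions made up for this example.

```javascript
// Hypothetical shapes: only `stream_id` is confirmed by the proposal above.
// `streams` stands in for whatever list of sites the frontend already has.
function attachSiteNames (detections, streams) {
  // Build the lookup once so each detection resolves in constant time.
  const nameById = new Map(streams.map((s) => [s.id, s.name]))
  return detections.map((d) => ({
    ...d,
    // Fall back to the raw id when the stream is unknown to the client.
    site_name: nameById.get(d.stream_id) ?? d.stream_id
  }))
}

const exampleStreams = [{ id: 'abc123', name: 'North ridge' }]
const exampleDetections = [
  { stream_id: 'abc123', start: '2024-03-29T01:00:00.000Z' },
  { stream_id: 'zzz999', start: '2024-03-29T02:00:00.000Z' }
]
const namedDetections = attachSiteNames(exampleDetections, exampleStreams)
```

Falling back to the raw id keeps the row visible even when the site list is stale, which seems preferable to dropping the detection.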

Describe alternatives you've considered

To create a separate endpoint (it feels like the endpoints are not different enough to need an additional one).

To update the existing /clustered-detections endpoint (does it actually work?).

grindarius commented 5 months ago

Hello @veckatimest, I have read your proposal. Looking at the UI, the two new group-by-site options still take almost all the same parameters, so I think adding a new route would be a bit redundant.

I agree with you on the new aggregate query parameter that takes the aggregate-by value and then the interval. Please don't forget the "count" in "best x per aggregate method": either add another query parameter or extend aggregate, for example aggregate=stream,1d,5. How to design this is up to you. Below is the UI where the "best x per aggregate" will be configured.

Screenshot 2567-03-29 at 14 12 18
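To make the aggregate=stream,1d,5 idea concrete, here is a minimal parsing sketch. The three-part layout (aggregate by, interval, count) follows the suggestion above, but the accepted values, the interval pattern, and the `parseAggregate` name are assumptions for illustration, not the endpoint's actual contract.

```javascript
// Hypothetical parser for a value such as "stream,1d,5".
// Assumed layout: <aggregate by>,<interval>,<count>.
function parseAggregate (raw) {
  const parts = String(raw).split(',').map((p) => p.trim())
  if (parts.length !== 3) {
    throw new Error('aggregate must look like "stream,1d,5"')
  }
  const [by, interval, countText] = parts
  // Assumption: only aggregation by stream (site) is supported.
  if (by !== 'stream') {
    throw new Error(`unsupported aggregate target: ${by}`)
  }
  // Assumption: intervals are a positive number followed by h (hours) or d (days).
  if (!/^[1-9]\d*[hd]$/.test(interval)) {
    throw new Error(`invalid interval: ${interval}`)
  }
  if (!/^[1-9]\d*$/.test(countText)) {
    throw new Error(`invalid count: ${countText}`)
  }
  return { by, interval, count: Number.parseInt(countText, 10) }
}

const parsedAggregate = parseAggregate('stream,1d,5')
```

A separate count query parameter would work just as well; packing everything into one value mainly keeps the three settings from drifting apart.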

Now on the frontend: for the best per site, we can run a groupBy using streamId to show the data. For the best per site per day, we can also run a groupBy, but with the key being dayjs(response.start).format('YYYY-MM-DD') instead. What are your thoughts, @naluinui? Thank you @veckatimest for the insights.
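A small sketch of the two groupings described above, written without dependencies so it runs as-is. The `groupBy` helper and the sample detections are made up for illustration. The day key here uses the built-in Date in UTC as a stand-in for dayjs(response.start).format('YYYY-MM-DD'), which formats in local time by default, so the two can disagree near midnight.

```javascript
// Generic groupBy returning a Map of key -> array of items.
function groupBy (items, keyFn) {
  const groups = new Map()
  for (const item of items) {
    const key = keyFn(item)
    if (!groups.has(key)) groups.set(key, [])
    groups.get(key).push(item)
  }
  return groups
}

// Stand-in for dayjs(d.start).format('YYYY-MM-DD'); note this is the UTC date.
const utcDay = (d) => new Date(d.start).toISOString().slice(0, 10)

// Made-up flat response, using the stream_id field from the proposal.
const flatDetections = [
  { stream_id: 'a', start: '2024-03-29T01:00:00.000Z' },
  { stream_id: 'a', start: '2024-03-30T05:00:00.000Z' },
  { stream_id: 'b', start: '2024-03-29T23:30:00.000Z' }
]

const bySite = groupBy(flatDetections, (d) => d.stream_id)
const byDay = groupBy(flatDetections, utcDay)
```

If the UI needs sites nested inside days, the same helper can be applied a second time to each day's array.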

antonyharfield commented 5 months ago

For best x per site per day, you are going to need to think about performance when there are 1 million results in a job. I was expecting that you would need to precompute the top 10 per site per day and then only allow the client to request a top value from 1 to 10. You are never going to request more than the top 10; it's more likely to be 1-5.
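The precompute-then-slice idea could be sketched roughly as follows. This is an in-memory illustration only: a real job with 1 million results would more likely do this in the database. The `confidence` score field and the function names are assumptions, and the day is taken in UTC.

```javascript
const MAX_TOP = 10

// Precompute: keep only the best `limit` detections per site per day.
// Assumption: `confidence` is the score by which detections are ranked.
function precomputeTop (detections, limit = MAX_TOP) {
  const groups = new Map()
  for (const d of detections) {
    const day = new Date(d.start).toISOString().slice(0, 10) // UTC day
    const key = `${d.stream_id}|${day}`
    const group = groups.get(key) ?? []
    group.push(d)
    groups.set(key, group)
  }
  for (const [key, group] of groups) {
    group.sort((a, b) => b.confidence - a.confidence)
    groups.set(key, group.slice(0, limit))
  }
  return groups
}

// Serve a request: only a prefix of each precomputed group is needed.
function selectTop (precomputed, top) {
  if (!Number.isInteger(top) || top < 1 || top > MAX_TOP) {
    throw new RangeError(`top must be an integer from 1 to ${MAX_TOP}`)
  }
  // Returns a flat array, matching the flat response format.
  return [...precomputed.values()].flatMap((group) => group.slice(0, top))
}

const jobResults = [
  { stream_id: 'a', start: '2024-03-29T01:00:00.000Z', confidence: 0.9 },
  { stream_id: 'a', start: '2024-03-29T02:00:00.000Z', confidence: 0.5 },
  { stream_id: 'a', start: '2024-03-29T03:00:00.000Z', confidence: 0.7 },
  { stream_id: 'b', start: '2024-03-29T04:00:00.000Z', confidence: 0.6 }
]
const precomputed = precomputeTop(jobResults)
const bestOne = selectTop(precomputed, 1)
```

Because each group is stored already sorted and capped at 10, any request for a top value from 1 to 10 is a simple slice rather than a scan over the whole job.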