Implement CNN detection aggregation "Best per site" and "Best per site, per day"

veckatimest commented 5 months ago

Is your feature request related to a problem? Please describe. Just as we can find the best detections per site or per site and date in pattern matching, we would like to have the same in CNN job results.

Describe the solution you'd like Update the existing /detections endpoint so that it has a new query option, aggregate, that accepts values in the following format:

stream = 'best per stream' (just like Best per Site)
stream,1d = best per site per day (just like Best per Site, per Day)
stream,7d = best per site per week (we don't have it yet, but we could)

This endpoint will still return the same data format as before (a flat array of detections).
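To make the proposed format concrete, here is a minimal sketch of how the aggregate value could be parsed. The function name parseAggregate and the AggregateOption type are hypothetical, not taken from rfcx-api, and the sketch assumes only the stream key and day-based intervals listed above are accepted.

```typescript
// Hypothetical type: the parsed form of the proposed `aggregate` query option.
interface AggregateOption {
  by: 'stream'
  // Undefined means "best per stream" over the whole job (no time bucket).
  intervalDays?: number
}

// Parses values such as 'stream', 'stream,1d' or 'stream,7d'.
// Anything else is rejected so that typos fail loudly.
function parseAggregate (value: string): AggregateOption {
  const [by, interval, ...rest] = value.split(',').map(part => part.trim())
  if (by !== 'stream' || rest.length > 0) {
    throw new Error(`Unsupported aggregate value: ${value}`)
  }
  if (interval === undefined) {
    return { by: 'stream' }
  }
  // Assumption: intervals are a positive whole number of days, e.g. '1d', '7d'.
  const match = /^([1-9]\d*)d$/.exec(interval)
  if (match === null) {
    throw new Error(`Unsupported aggregate interval: ${interval}`)
  }
  return { by: 'stream', intervalDays: Number(match[1]) }
}
```

For example, parseAggregate('stream,7d') yields { by: 'stream', intervalDays: 7 }, while a misspelled key is rejected with an error.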

New parameter in documentation: Response:

Caveats Response is not grouped into "sites", so the frontend will need to group the results visually. We have stream_id in the response, but it's not human-readable, so the frontend will need to map these stream_ids to location names.
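The mapping from stream_id to a location name could look roughly like the sketch below. It assumes the frontend already holds a list of streams with id and name fields; the Stream shape, the site_name field and the labelDetections helper are all hypothetical.

```typescript
// Assumed shapes: only stream_id is confirmed by the proposal above.
interface Detection {
  stream_id: string
  start: string
}

interface Stream {
  id: string
  name: string
}

type LabelledDetection = Detection & { site_name: string }

// Attaches a human-readable site name to each detection.
// Falls back to the raw stream_id when no matching stream is known.
function labelDetections (detections: Detection[], streams: Stream[]): LabelledDetection[] {
  const nameById = new Map<string, string>()
  for (const stream of streams) {
    nameById.set(stream.id, stream.name)
  }
  return detections.map(d => ({
    ...d,
    site_name: nameById.get(d.stream_id) ?? d.stream_id
  }))
}
```

Building the lookup once keeps the mapping linear in the number of detections rather than scanning the stream list for every row.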

Describe alternatives you've considered
Create a separate endpoint (the two endpoints don't feel different enough to need an additional one).
Update the existing /clustered-detections endpoint (does it actually work?).

grindarius commented 5 months ago

Hello @veckatimest, I have read your proposal. Looking at the UI, the two new site groupings still take almost all the same parameters, so I think adding a new route would be redundant.

I agree with you on the new aggregate query parameter, which takes the field to aggregate by and then the interval. Please don't forget the "count" in "best x per aggregate method". It could be either another query parameter or part of aggregate, like aggregate=stream,1d,5; how to design it is up to you. Below is the UI where the "best x per aggregate" will be configured.

Now on the frontend: for best per site, we can run a groupBy using streamId to show the data. For best per site per day, we can also run a groupBy, but with the key being dayjs(response.start).format('YYYY-MM-DD') instead. What are your thoughts, @naluinui? Thank you @veckatimest for the insights.
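A rough sketch of the two frontend groupings follows. To stay dependency-free it uses a hand-written groupBy and derives the day key from Date.toISOString() instead of dayjs, so the day is taken in UTC; the Detection shape and the helper names are assumptions for illustration.

```typescript
// Assumed shape of one item in the flat response array.
interface Detection {
  streamId: string
  start: string // ISO 8601 timestamp
}

// Groups items by a string key, preserving the order within each group.
function groupBy<T> (items: T[], keyOf: (item: T) => string): Map<string, T[]> {
  const groups = new Map<string, T[]>()
  for (const item of items) {
    const key = keyOf(item)
    const group = groups.get(key)
    if (group === undefined) {
      groups.set(key, [item])
    } else {
      group.push(item)
    }
  }
  return groups
}

// Key for "best per site".
const siteKey = (d: Detection): string => d.streamId

// Key for "best per site, per day": a YYYY-MM-DD string in UTC.
// This stands in for dayjs(d.start).format('YYYY-MM-DD').
const dayKey = (d: Detection): string => new Date(d.start).toISOString().slice(0, 10)
```

Calling groupBy(detections, siteKey) gives one bucket per site and groupBy(detections, dayKey) gives one bucket per day. Note that dayjs formats in local time by default, so the two approaches can disagree near midnight unless the time zone is handled explicitly.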

antonyharfield commented 5 months ago

For best x per site per day, you are going to need to think about performance when there are 1 million results in a job. I was expecting that you would need to precompute the top 10 per site per day and then only allow the client to request a top value from 1 to 10. You are never going to request more than the top 10; 1-5 is more likely.
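One way to picture the precompute-then-serve idea is the in-memory sketch below. It is only an illustration under stated assumptions: detections are ranked by a confidence field (an assumed name), start is a valid ISO 8601 timestamp, days are bucketed in UTC, and in practice the precomputed rows would presumably live in the database rather than in a Map.

```typescript
// Assumed shape; `confidence` as the ranking field is an assumption.
interface Detection {
  streamId: string
  start: string // ISO 8601 timestamp
  confidence: number
}

const MAX_TOP = 10

// Clamps the client's requested "top" value into the range 1..10.
function clampTop (requested: number): number {
  if (!isFinite(requested)) return 1
  return Math.min(MAX_TOP, Math.max(1, Math.floor(requested)))
}

// Builds a "site|YYYY-MM-DD" key, with the day taken in UTC.
function siteDayKey (d: Detection): string {
  return `${d.streamId}|${new Date(d.start).toISOString().slice(0, 10)}`
}

// Single pass over all detections, keeping at most MAX_TOP per site and day,
// so memory per group stays bounded however large the job is.
function precomputeTop (detections: Detection[]): Map<string, Detection[]> {
  const best = new Map<string, Detection[]>()
  for (const d of detections) {
    const key = siteDayKey(d)
    const kept = best.get(key) ?? []
    kept.push(d)
    kept.sort((a, b) => b.confidence - a.confidence)
    if (kept.length > MAX_TOP) kept.pop() // drop the weakest
    best.set(key, kept)
  }
  return best
}

// Serves the requested top count per group as a flat array of detections.
function serveTop (precomputed: Map<string, Detection[]>, requested: number): Detection[] {
  const top = clampTop(requested)
  const result: Detection[] = []
  precomputed.forEach(group => { result.push(...group.slice(0, top)) })
  return result
}
```

Because each group never holds more than eleven items while it is being updated, the per-detection sort stays cheap, and any request for more than ten results per group is simply capped at ten.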

rfcx / rfcx-api

Implement CNN detection aggregation "Best per site" and "Best per site, per day" #582