riparias / gbif-alert

GBIF Alert is a GBIF occurrence based alert system.
https://gbif-alert-demo.thebinaryforest.net/
MIT License
7 stars 2 forks source link

Seen / unseen observations: performance improvements #273

Open niconoe opened 7 months ago

niconoe commented 7 months ago

As noted by @damianooldoni and others, some parts of the application (where seen/unseen observations must be counted, for example the "my alerts" page), are getting slower and slower over time.

A big part of the issue is that we add a new entry in the database each time a user mark an observation as seen. So 500.000 observation * 70 users (roughly what we have now on the RIPARIAS early alert website) leads to a table with 35.000.000 rows, which has a performance impact (also for the nightly import, the "mark all has seen" feature, ...). The implementation is similar to what's described at https://dba.stackexchange.com/questions/52355/database-design-for-holding-read-unread-content-state

I think it would be better to reverse the implementation and only create entries for unseen observations.

Pros:

Cons:

Some smaller mitigation approaches can be tried in the meantime, such as:

damianooldoni commented 3 months ago

Hi 👋 Some users are complaining about the performance of the LIFE RIPARIAS instance of gbif-alert (alert.riparias.be). They need to use our tool with externals and I can understand it's not easy to show a tool which takes tens of seconds to show the selected locations and the observations falling in it.

@niconoe: no idea how much you can work on this in the next days/weeks/months... And do you think that reducing the number of observations would help on the short term? Now we load data of the last 10 years. But again, an alert system doesn't need such historical data. If we set year >= 2020 in the data download query, we will move from 625k occurrences to 370k. What do you think about it?

niconoe commented 2 months ago

@damianooldoni: sorry I only notice your message now! Indeed, the performance is getting slower for the reason stated above, in the initial ticket description.

I'd like to solve that cleanly, but it's not a small task and I'm not sure if I could tackle it before July.

In the meantime, having less observations in the system as you suggest should do the trick!