medic / cht-app-monitoring-data-ingest

GNU General Public License v3.0

Delete scraped monitoring docs to prevent data bloat #77

Closed: eljhkrr closed this issue 1 year ago

eljhkrr commented 1 year ago

The proposed retention period for records is 6 months, as we're unlikely to act on older data.

1yuv commented 1 year ago

If we delete the data, we can't reuse the information already available for other purposes. If the goal is to ensure that reporting isn't slowed down by a huge dataset, we can keep the old data in historical tables and keep only 6 months of data in the reporting tables. Here's how we are doing that for impact monitoring.

This way, we can keep the scraped data for the longer term while keeping the current reporting data small and fast.
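A minimal Python sketch of the split described above, under stated assumptions: every scraped doc stays in a historical set, and only docs newer than a 6-month cutoff are copied into the reporting set. The `MonitoringDoc` type, its `scraped_at` field, and the `months_ago` / `split_for_reporting` helpers are hypothetical illustrations, not the project's actual schema or the impact-monitoring implementation.

```python
import calendar
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MonitoringDoc:
    # Hypothetical shape of a scraped monitoring doc; field names are assumptions.
    doc_id: str
    scraped_at: datetime


def months_ago(now: datetime, months: int) -> datetime:
    """Return `now` shifted back by whole calendar months, clamping the day
    to the target month's length (e.g. 31 Aug minus 6 months -> 28/29 Feb)."""
    total = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(now.day, calendar.monthrange(year, month)[1])
    return now.replace(year=year, month=month, day=day)


def split_for_reporting(docs, now, retention_months=6):
    """Keep every doc in the historical set; copy only docs scraped on or after
    the cutoff into the reporting set, so reporting queries stay small."""
    cutoff = months_ago(now, retention_months)
    historical = list(docs)
    reporting = [d for d in historical if d.scraped_at >= cutoff]
    return historical, reporting


if __name__ == "__main__":
    now = datetime(2024, 1, 15, tzinfo=timezone.utc)
    docs = [
        MonitoringDoc("old", datetime(2023, 3, 1, tzinfo=timezone.utc)),
        MonitoringDoc("recent", datetime(2023, 12, 1, tzinfo=timezone.utc)),
    ]
    historical, reporting = split_for_reporting(docs, now)
    print(len(historical), [d.doc_id for d in reporting])
```

Nothing is deleted here: the historical set keeps the full record for reuse, and the reporting set is rebuilt from it with whatever retention window is chosen.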

eljhkrr commented 1 year ago

Thanks @yrimal, I've added this as an agenda item for the upcoming meeting.

derickl commented 1 year ago

If the target table is well formed, a proper indexing strategy should keep the data volume from becoming an issue.

kennsippell commented 1 year ago

This is how many docs we have. We have been monitoring for about a year, and it comes to about 50MB of data. Should we just ignore this issue for now?

doctype                 count
klipfolio_datasource    8,944
monitoring             12,127
error                     113
couch2pg                  ~5k

1yuv commented 1 year ago

> Should we just ignore this issue for now?

I agree; this sounds like a nominal amount of data.