Closed gordan-bobic closed 1 year ago
Now that you memtioned the QUERY_RETENTION env (it's QUERIES_RETENTION actually), I found that the auto-remove qan data feature is actually not working, do we still need this feature?
Yes, pretty sure we need that feature to work.
Or maybe there is a way to "cheat", by deleting query_classes where last_seen is older than the data retention period (QUERY_RETENTION environment variable, if I remember correctly), plus a day just to make sure.
OK, I think I'm gonna add this to the "auto-purge qan data feature", and for node deletion edge case, we just leave it to the auto-purge process, so we don't need 2 separate processes to do this, although the query_classes table data will not be purged immediately after the node deletion in this case, is that OK?
Sounds good. It doesn't have to be deleted immediately, regular scheduled daily purge cycle is absolutely fine.
It looks like query_classes table keeps query classes forever - even after no corresponding entry in query_examples table remains. Under those conditions, as far as I can tell, queries that have no examples will never actually show up in QAN, so there could be millions of query_class rows that serve no purpose.
Objective: 1) Verify that we don't use this orphaned query_class data anywhere 2) Add this to cleanup process so that the data is purged.
Watch out for the edge case of purges through node deletion. It looks like instance_id comes from query_examples. If we delete a node, we probably delete query_examples based on this, but it looks like we don't also clean up the orphaned queries from query classes (because the same query class could come from multiple nodes). This means that query_classes cleanup will have to be based on something like:
Get the list of qcic, then purge them by primary key.
Or maybe there is a way to "cheat", by deleting query_classes where
last_seen
is older than the data retention period (QUERY_RETENTION environment variable, if I remember correctly), plus a day just to make sure.Either way, this should be added to the data pruning process, I don't think there is any usefulness to this stale query_classes data because I don't see how it could show up in QAN, and I cannot think of any other use for it.