treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0
1.3k stars 221 forks source link

Deleting sessions command / operator. #809

Open hiroyuki-sato opened 6 years ago

hiroyuki-sato commented 6 years ago

Digdag server uses large database area when running the server for a long time. In this situation, Digdag user wants to delete past sessions. But currently, those feature doesn't implement yet.

An Idea

Ref: Twitter

muga commented 6 years ago

This is good because similar system commands or internal workflow might be required for this feature https://github.com/treasure-data/digdag/pull/845. To mitigate this issue, we already have option that we could use Postgresql as Digdag backend instead of h2db. What do you think?

hiroyuki-sato commented 6 years ago

@muga Thank you for replying this issue.

I think It is a good mitigation solution to use PostgreSQL instead of H2. Until introduce a new command, a document about deleting session statement may help for PostgreSQL users. (I think nobody knows how to delete Digdag sessions in PostgreSQL)

muga commented 6 years ago

@hiroyuki-sato Thank you for suggesting. Got it. sounds reasonable. To make such document, we'd better to investigate 1) what tables we want to reduce 2) what order to reduce size of tables. In my case, sessions, session_attempts, task_archives tables are top 3.

hiroyuki-sato commented 6 years ago

@muga I asked digdag users about database usage in Japanese. Twitter

kamikaseda commented 5 years ago

According to the record I can confirm, the task_archives table is 8 to 9 times larger than the other tables, the next large table is session_attempts (2 times larger than other tables).   So, it's enough to make a delete statement for task_archives and session_attempts table.

cosmok commented 5 years ago

I ended scheduling this sql to delete records older than 1 mont:

DELETE FROM task_archives WHERE id IN (SELECT id FROM session_attempts WHERE created_at < (NOW() - INTERVAL '1 MONTH'));
DELETE FROM session_attempts WHERE created_at < (NOW() - INTERVAL '1 MONTH');
DELETE FROM sessions WHERE last_attempt_created_at < (NOW() - INTERVAL '1 MONTH');
ehaupt commented 4 years ago

Hello @muga,

may I suggest a digdag command-line option where you can specify a max age for cleaning up old sessions?

For instance:

# delete sessions older than 30 days
$ digdag sessions --cleanup 30d

This way the cleanup would work regardless of the DB backend and would be suitable for cronjobs.