runabol / tork

A distributed workflow engine
https://tork.run
MIT License
599 stars 40 forks

request: disable logging #349

Open ppcololo opened 8 months ago

ppcololo commented 8 months ago

As I understand it, the current flow is:

  1. workers publish logs to the logs queue in RabbitMQ
  2. the coordinator pulls logs from this logs queue
  3. the coordinator saves the logs into the Postgres DB
  4. after I press the Logs button, I can see them (logs from the DB)

This causes problems: with many workers and heavy logging, I can see millions of messages piling up in the logs queue. The coordinator doesn't pull and save logs to the DB fast enough, so when I press the Logs button it shows nothing. From time to time I have to purge this queue just to see logs at all.
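The flow described above can be sketched as a single-consumer pipeline. This is a minimal in-memory analogue, not Tork's actual code (the real system uses RabbitMQ and Postgres); it only illustrates why a lone consumer becomes the bottleneck:

```python
import queue

log_queue = queue.Queue()   # stand-in for the RabbitMQ "logs" queue
db = []                     # stand-in for the tasks_log_parts table

def worker_publish(line):
    """Step 1: a worker publishes a log line to the queue."""
    log_queue.put(line)

def coordinator_drain():
    """Steps 2-3: the coordinator pulls messages and persists them.
    With a single consumer, messages are processed one at a time,
    which is why the queue backs up under heavy logging."""
    while not log_queue.empty():
        db.append(log_queue.get())

for i in range(5):
    worker_publish(f"task log line {i}")
coordinator_drain()
print(len(db))  # all 5 lines persisted
```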

Possible options:

  1. disable logging (use an alternative like an ELK stack)
  2. improve coordinator/logging performance
runabol commented 8 months ago

Did you try setting TORK_COORDINATOR_QUEUES_LOGS to a value greater than 1 (default) to have multiple subscribers processing the logs queue?

ppcololo commented 8 months ago

Thanks for pointing me to this. I can see some values here: https://github.com/runabol/tork/blob/main/configs/sample.config.toml#L39-L45 But could you share a link to documentation describing in more detail what those values mean? If I set logs=x, what does that mean?

runabol commented 8 months ago

It's the number of subscribers/goroutines that will process the logs queue in parallel.
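For reference, the corresponding section of the sample config linked above would look roughly like this (the value 4 is a hypothetical example, and the key name is assumed to mirror the TORK_COORDINATOR_QUEUES_LOGS environment variable):

```toml
[coordinator.queues]
# number of goroutines consuming the logs queue in parallel (default: 1)
logs = 4
```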

ppcololo commented 8 months ago

Thanks @runabol, that helped a lot. Please add more info to the documentation - it will save you a lot of questions in the future.

runabol commented 8 months ago

That's fair

ppcololo commented 6 months ago

I have to reopen this: we really need an option to disable logging. Take a look - we set this option in the config:

[datastore.postgres]
dsn = ""
task.logs.interval = "168h"

It means a log retention of 7 days. Today is 03.05.2024, but in the DB I can see:

tork=# select min(created_at) from tasks_log_parts limit 1;
            min
----------------------------
 2024-04-22 12:38:07.650203
(1 row)

And our DB grows indefinitely:

tork=# select
  table_name,
  pg_size_pretty(pg_total_relation_size(quote_ident(table_name))),
  pg_total_relation_size(quote_ident(table_name))
from information_schema.tables
where table_schema = 'public'
order by 3 desc;
   table_name    | pg_size_pretty | pg_total_relation_size
-----------------+----------------+------------------------
 tasks_log_parts | 137 GB         |           146990120960
 tasks           | 221 MB         |              231546880
 jobs            | 5168 kB        |                5292032
 nodes           | 1744 kB        |                1785856
(4 rows)

So I can say the option in the config doesn't work, or works so slowly that it can't keep up with the new logs. We want to disable logging completely and use other software for this, like an ELK stack.

runabol commented 6 months ago

Try this config option: https://github.com/runabol/tork/blob/4319d73113230a79fb110c49abf1c87ff0324b17/engine/datastore.go#L35

ppcololo commented 6 months ago

If you check my message above - that option doesn't work. Today I checked the logs in the DB and I see:

tork=# select min(created_at) from tasks_log_parts;
            min
----------------------------
 2024-04-22 12:42:13.105112
(1 row)

and

tork=# select
  table_name,
  pg_size_pretty(pg_total_relation_size(quote_ident(table_name))),
  pg_total_relation_size(quote_ident(table_name))
from information_schema.tables
where table_schema = 'public'
order by 3 desc;
   table_name    | pg_size_pretty | pg_total_relation_size
-----------------+----------------+------------------------
 tasks_log_parts | 162 GB         |           173817077760
 tasks           | 221 MB         |              231948288
 jobs            | 5312 kB        |                5439488
 nodes           | 1832 kB        |                1875968
(4 rows)

Plus ~25 GB of logs since yesterday. As you can see, tork deleted only 6 minutes' worth of logs: was 2024-04-22 12:38:07.650203, now 2024-04-22 12:42:13.105112.

runabol commented 6 months ago

Sounds like the pruning process isn't keeping up with the amount of logs you're generating per day. I can make the number of records it deletes per cleaning cycle configurable. Right now it's hard-coded to 1000, I believe.

ppcololo commented 6 months ago

If I'm not mistaken, we have about 20 million rows per day in the DB.
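Some back-of-the-envelope arithmetic shows why the hard-coded deletion limit can't keep up. The insert rate comes from the thread (~20M rows/day, 1000 deletions per cycle); the cleanup frequency here is purely an assumption for illustration:

```python
# Pruning shortfall: rows inserted per day vs. rows the pruner can delete.
rows_per_day = 20_000_000     # ~20M log rows/day (from the thread)
deleted_per_cycle = 1_000     # hard-coded deletion batch size (from the thread)
cycles_per_day = 24           # ASSUMPTION: one cleanup cycle per hour

deleted_per_day = deleted_per_cycle * cycles_per_day
backlog_growth = rows_per_day - deleted_per_day

print(deleted_per_day)   # 24,000 rows deleted per day
print(backlog_growth)    # ~19.98M rows of net growth per day
```

Even with a far more frequent cleanup cycle, a fixed 1000-row batch is orders of magnitude short of the insert rate, which matches the observation that only minutes of logs were pruned.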

runabol commented 6 months ago

Can you try release 0.1.73? It adds improvements to log shipping -- buffering log messages (for up to one second) rather than sending each log line separately.
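The buffering idea can be sketched as follows. This is a minimal illustration of size/age-based batching, not Tork's actual implementation; the class name, parameters, and thresholds are all hypothetical:

```python
import time

class LogBuffer:
    """Collect log lines and ship them as one batch when either
    max_age seconds have passed or max_size lines have accumulated."""

    def __init__(self, send, max_size=100, max_age=1.0):
        self.send = send          # callable that ships a list of lines
        self.max_size = max_size
        self.max_age = max_age
        self.buf = []
        self.first_at = None      # when the oldest buffered line arrived

    def add(self, line):
        if not self.buf:
            self.first_at = time.monotonic()
        self.buf.append(line)
        if (len(self.buf) >= self.max_size
                or time.monotonic() - self.first_at >= self.max_age):
            self.flush()

    def flush(self):
        if self.buf:
            self.send(self.buf)   # one shipment instead of one per line
            self.buf = []

# Usage: 7 lines with a batch size of 3 produce 3 shipments, not 7.
batches = []
buf = LogBuffer(batches.append, max_size=3, max_age=60.0)
for i in range(7):
    buf.add(f"line {i}")
buf.flush()                       # ship the remainder on shutdown
print([len(b) for b in batches])  # → [3, 3, 1]
```

Batching like this reduces broker and DB round-trips roughly by the batch size, which is how it relieves pressure on the logs queue.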