treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0

state_flags in session_attempts table #603

Closed cosmok closed 7 years ago

cosmok commented 7 years ago

Hi,

The state_flags column holds integer codes. What do these codes mean?

For instance, the query below returns the codes. I would like to collect metrics for a project/workflow, and labels for the codes would be handy for graphing those metrics.

SELECT DISTINCT(state_flags) FROM session_attempts;
+---------------+
|   state_flags |
|---------------|
|             6 |
|             2 |
|             3 |
+---------------+

hiroyuki-sato commented 7 years ago

@cosmok

NAME CODE
CANCEL_REQUESTED_CODE 1
DONE_CODE 2
SUCCESS_CODE 4

6 = DONE_CODE + SUCCESS_CODE

https://github.com/treasure-data/digdag/blob/master/digdag-core/src/main/java/io/digdag/core/session/AttemptStateFlags.java

    public static final int CANCEL_REQUESTED_CODE = 1;
    public static final int DONE_CODE = 2;
    public static final int SUCCESS_CODE = 4;

https://github.com/treasure-data/digdag/blob/master/digdag-core/src/main/java/io/digdag/core/database/migrate/Migration_20151204221156_CreateTables.java#L115

        handle.update(
                context.newCreateTableBuilder("session_attempts")
                .addLongId("id")
                .addLong("session_id", "not null references sessions (id)")
                .addInt("site_id", "not null")  // denormalized for performance
                .addInt("project_id", "not null references projects (id)")  // denormalized for performance
                .addString("attempt_name", "not null")
                .addLong("workflow_definition_id", "references workflow_definitions (id)")
                .addShort("state_flags", "not null")  // 0=running or blocked, 1=cancel_requested, 2=done, 4=success
                .addString("timezone", "not null")
                .addMediumText("params", "")
                .addTimestamp("created_at", "not null")
                .build());
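The constants and the column comment above suggest that state_flags is a bit field, so a stored value is the sum of the flags that are set. The following is a minimal sketch of decoding such a value. The constant values are copied from the snippet above, while `FLAG_NAMES` and `decode_state_flags` are hypothetical names introduced only for illustration and are not part of digdag.

```python
# Flag values copied from AttemptStateFlags.java as quoted above.
CANCEL_REQUESTED_CODE = 1
DONE_CODE = 2
SUCCESS_CODE = 4

# Hypothetical lookup table, not part of digdag.
FLAG_NAMES = {
    CANCEL_REQUESTED_CODE: "CANCEL_REQUESTED",
    DONE_CODE: "DONE",
    SUCCESS_CODE: "SUCCESS",
}


def decode_state_flags(value):
    """Return the names of the flag bits set in a state_flags value."""
    return [name for bit, name in FLAG_NAMES.items() if value & bit]


# The three values from the query result, plus 0 (no flag set).
for value in (0, 2, 3, 6):
    print(value, decode_state_flags(value))
```

Under this reading, 6 decodes to DONE plus SUCCESS and 3 to CANCEL_REQUESTED plus DONE. What each combination means operationally still has to be confirmed by experiment.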

cosmok commented 7 years ago

Thanks @hiroyuki-sato. What does 3 stand for? Also, what does 6 (DONE + SUCCESS) mean? How is it different from just SUCCESS?

hiroyuki-sato commented 7 years ago

@cosmok Why do you want to know that status?

It's an internal implementation detail, and it may change in the future. It's better to fetch the attempt status with the CLI.

Anyway, have you ever seen the value 4 (only the SUCCESS_CODE flag)? Your SQL result shows only 2, 3, and 6.

I'm not a core developer, so I have to read the source code before answering. I suppose status 4 is not used, and status 3 means "did not succeed", i.e. failed?

cosmok commented 7 years ago

@hiroyuki-sato state_flags=3 is for a failed one. Also, state_flags=0 is for pending/running.

I am trying to collect metrics on the different states of workflows, so that they can be used for alerting as well as monitoring. I am not aware of any API that I can use to effectively get data that looks like the output of this SQL:

SELECT s.workflow_name, st.state_flags, COUNT(st.id) AS count FROM sessions s JOIN session_attempts st ON (st.session_id = s.id) GROUP BY 1,2;

hiroyuki-sato commented 7 years ago

@cosmok I ran the following workflow, and the resulting status value was 2, so my previous post was wrong. Status 3 = CANCEL_REQUESTED_CODE + DONE_CODE, so it probably means "canceled". Have you ever tried canceling an attempt?

I'm not sure whether you can get those metrics with an API call. I'll let you know if I find a way.

timezone: UTC

+setup:
  sh>: echo start ${session_time}

+false:
  sh>: /usr/bin/false

FLAG

NAME CODE
CANCEL_REQUESTED_CODE 1
DONE_CODE 2
SUCCESS_CODE 4

VALUE MEANING

VALUE MEANING
0 PENDING/RUNNING
1 NOT USED?
2 FAILED
3 KILLED(digdag kill xxx)
4 NOT USED?
5 NOT USED?
6 COMPLETE(SUCCEED)

UPDATE: I ran digdag kill, and the resulting value was 3.
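With the values observed so far, the metrics query from earlier in the thread can emit labels instead of raw codes. The sketch below assumes the mapping in the table above (0, 2, 3, 6) and runs against a toy in-memory SQLite schema that contains only the columns the query touches. The sample rows, the workflow names, and the `attempt_counts` helper are made up for illustration, and the query has not been run against a real digdag database. Since state_flags is internal, the mapping may change between digdag versions.

```python
import sqlite3

# Labels as observed in this thread; unlisted values are reported as unknown.
STATE_LABEL_SQL = """
CASE st.state_flags
    WHEN 0 THEN 'pending/running'
    WHEN 2 THEN 'failed'
    WHEN 3 THEN 'killed'
    WHEN 6 THEN 'success'
    ELSE 'unknown (' || st.state_flags || ')'
END
"""

QUERY = f"""
SELECT s.workflow_name, {STATE_LABEL_SQL} AS state, COUNT(st.id) AS count
FROM sessions s
JOIN session_attempts st ON st.session_id = s.id
GROUP BY 1, 2
ORDER BY 1, 2
"""


def attempt_counts(conn):
    """Return (workflow_name, state label, count) rows."""
    return conn.execute(QUERY).fetchall()


# Toy schema and made-up rows, standing in for the real digdag tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (id INTEGER PRIMARY KEY, workflow_name TEXT NOT NULL);
CREATE TABLE session_attempts (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions (id),
    state_flags INTEGER NOT NULL
);
INSERT INTO sessions VALUES (1, 'daily_etl'), (2, 'hourly_sync');
INSERT INTO session_attempts VALUES
    (1, 1, 6), (2, 1, 6), (3, 1, 2), (4, 2, 3), (5, 2, 0);
""")

for row in attempt_counts(conn):
    print(row)
```

Keeping the labels in a single CASE expression means only one place needs updating if the flag values change.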

cosmok commented 7 years ago

@hiroyuki-sato Yes, state_flags: 3 appears to mean killed (I did kill the few attempts that now have 3 as their state_flags).

Thanks!

hiroyuki-sato commented 7 years ago

@cosmok Thank you for reporting! Please close this issue once your question is resolved.

I think it is better to create a separate issue for the metrics feature request.

cosmok commented 7 years ago

Thanks @hiroyuki-sato