treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0

[Q] digdag server and scheduler #150

Closed hiroyuki-sato closed 8 years ago

hiroyuki-sato commented 8 years ago

Hello

We (Twitter) would like to know the common use cases for the server and the scheduler.

Best regards.

frsyuki commented 8 years ago

Is it possible to run the server and the scheduler in the same environment?

What does "same environment" mean? It's possible for them to share the same database.

Is it possible to persist execution results after stopping the scheduler?

You should be able to use the --database DIR option to persist status on disk. The --task-log DIR option is also recommended.

An early version had another feature that stored data as plain YAML files, which might be easier to understand than a database. It was removed in cf655ce611a4fa7225cb17d3cd2d797119226ea2 and replaced by a database (H2 in-memory, H2 on-disk, or PostgreSQL).

The idea was:

Do you have any suggestions for further improvements?
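To illustrate why persisting to disk matters, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for H2 (digdag itself uses H2 or PostgreSQL, not SQLite). An in-memory database, roughly what digdag server -m uses, loses everything once the process stops, while a file-backed database, roughly what --database DIR gives you, keeps finished attempts across a restart. The attempts table and the helper functions are hypothetical and do not mirror digdag's real schema.

```python
import os
import sqlite3
import tempfile

# Hypothetical one-table schema; digdag's real tables look different.
SCHEMA = "CREATE TABLE IF NOT EXISTS attempts (session_time TEXT PRIMARY KEY, status TEXT)"

def record_attempt(conn: sqlite3.Connection, session_time: str) -> None:
    """Store one finished attempt."""
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO attempts VALUES (?, 'success')", (session_time,))
    conn.commit()

def count_attempts(conn: sqlite3.Connection) -> int:
    """Count stored attempts; creates the table if this database is brand new."""
    conn.execute(SCHEMA)
    return conn.execute("SELECT COUNT(*) FROM attempts").fetchone()[0]

def run_and_restart(db_path: str) -> int:
    """Record one attempt, 'stop' by closing the connection, reopen, count survivors."""
    conn = sqlite3.connect(db_path)
    record_attempt(conn, "2016-06-30T19:54:00+09:00")
    conn.close()  # analogous to stopping the scheduler
    conn = sqlite3.connect(db_path)  # analogous to starting it again
    try:
        return count_attempts(conn)
    finally:
        conn.close()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        on_disk = run_and_restart(os.path.join(tmp, "digdag.db"))
    in_memory = run_and_restart(":memory:")
    print(f"on-disk: {on_disk}, in-memory: {in_memory}")  # on-disk: 1, in-memory: 0
```

Each sqlite3 connection to ":memory:" gets a fresh, empty database, which is why the in-memory count drops back to zero after the simulated restart.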

hiroyuki-sato commented 8 years ago

By "same environment" I mean using digdag server and digdag scheduler at the same time.

If I push the following sample project to the server, it runs echo `date` and echo "hello" every minute. So it seems that digdag server already includes the scheduler.

That's why I'm confused about how to use the scheduler and the server at the same time.

I'll answer your second question later, because I have no idea yet.

Procedure

digdag server -m

sample.dig

timezone: "Asia/Tokyo"

schedule:
  minutes_interval>: 1

+current_date:
  sh>: echo `date`

+echo_hello:
  sh>: echo "hello"

digdag push sample

Output

2016-06-30 19:14:00 +0900 [INFO] (scheduler-0): Starting a new session project id=1 workflow name=hoge2 session_time=2016-06-30T19:14:00+09:00
2016-06-30 19:14:01 +0900 [INFO] (0036@+hoge2+current_date): sh>: echo `date`
2016年 6月30日 木曜日 19時14分01秒 JST
2016-06-30 19:14:01 +0900 [INFO] (0036@+hoge2+echo_hello): sh>: echo "hello"
hello
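The session_time values in these logs (19:14:00 here, 19:54:00 further down) fall exactly on minute boundaries. The Python sketch below computes upcoming session times for a minutes_interval> schedule, assuming sessions are aligned to multiples of the interval counted from local midnight. That alignment rule is an assumption, not confirmed digdag behavior (with an interval of 1 minute it makes no difference), and next_session_times is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Asia/Tokyo is UTC+9 with no daylight saving time, so a fixed offset suffices here.
JST = timezone(timedelta(hours=9))

def next_session_times(now: datetime, minutes_interval: int, count: int) -> list[datetime]:
    """Hypothetical helper: the next `count` session times strictly after `now`.

    Assumes sessions fall on multiples of `minutes_interval` minutes counted
    from local midnight (an assumption, not confirmed digdag behavior).
    """
    if minutes_interval <= 0:
        raise ValueError("minutes_interval must be positive")
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    step = timedelta(minutes=minutes_interval)
    # Whole intervals already elapsed today, plus one to land strictly after `now`.
    k = (now - midnight) // step + 1
    return [midnight + step * (k + i) for i in range(count)]

if __name__ == "__main__":
    now = datetime(2016, 6, 30, 19, 13, 42, tzinfo=JST)
    for t in next_session_times(now, 1, 3):
        print(t.isoformat())
```

For now = 19:13:42 JST the first result is 2016-06-30T19:14:00+09:00, the same form digdag prints as session_time in the log above.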

hiroyuki-sato commented 8 years ago

And one more thing.

You should be able to use the --database DIR option to persist status on disk. The --task-log DIR option is also recommended.

Is the behavior of this second digdag scheduler command expected?

digdag scheduler -o db --task-log log
2016-06-30 19:49:06 +0900: Digdag v0.8.3
error: java.lang.RuntimeException: io.digdag.core.repository.ResourceConflictException: Resource already exists: revision=1 in project id=1

Procedure

timezone: "Asia/Tokyo"

schedule:
  minutes_interval>: 1

+current_date:
  sh>: echo `date`

+echo_hello:
  sh>: echo "hello"

First execution.

digdag scheduler -o db --task-log log
2016-06-30 19:53:47 +0900: Digdag v0.8.3
2016-06-30 19:53:49 +0900 [INFO] (main): Added new revision 1
2016-06-30 19:53:49 +0900 [INFO] (main): Starting server on 127.0.0.1:65432
2016-06-30 19:53:49 +0900 [INFO] (main): XNIO version 3.3.3.Final
2016-06-30 19:53:49 +0900 [INFO] (main): XNIO NIO Implementation Version 3.3.3.Final
2016-06-30 19:54:00 +0900 [INFO] (scheduler-0): Starting a new session project id=1 workflow name=hoge2 session_time=2016-06-30T19:54:00+09:00
2016-06-30 19:54:00 +0900 [INFO] (0029@+hoge2+current_date): sh>: echo `date`
Thu Jun 30 19:54:00 JST 2016
2016-06-30 19:54:01 +0900 [INFO] (0029@+hoge2+echo_hello): sh>: echo "hello"
hello

Stop the scheduler.

Second execution.

digdag scheduler -o db --task-log log
2016-06-30 19:54:49 +0900: Digdag v0.8.3
error: java.lang.RuntimeException: io.digdag.core.repository.ResourceConflictException: Resource already exists: revision=1 in project id=1
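One way to read this error: the first run logs "Added new revision 1", so if the scheduler registers the autoloaded project under the fixed revision name 1 on every start (an assumption), the second start finds revision 1 already persisted in db and refuses to add it again. The toy Python model below reproduces that sequence. RevisionStore and start_scheduler are hypothetical and are not digdag's classes.

```python
class ResourceConflict(Exception):
    """Stands in for io.digdag.core.repository.ResourceConflictException."""

class RevisionStore:
    """Toy stand-in for a persisted project/revision table (not digdag's code)."""

    def __init__(self) -> None:
        self.revisions: set[tuple[int, str]] = set()

    def add_revision(self, project_id: int, revision: str) -> None:
        key = (project_id, revision)
        if key in self.revisions:
            raise ResourceConflict(
                f"Resource already exists: revision={revision} in project id={project_id}"
            )
        self.revisions.add(key)

def start_scheduler(store: RevisionStore) -> str:
    """Assumption: every start autoloads the local project as revision "1" of project 1."""
    store.add_revision(1, "1")
    return "Added new revision 1"

if __name__ == "__main__":
    persisted = RevisionStore()  # plays the role of the on-disk db directory
    print(start_scheduler(persisted))  # first execution succeeds
    try:
        start_scheduler(persisted)  # second execution reuses the same store
    except ResourceConflict as e:
        print("error:", e)
```

A fresh RevisionStore per start, which is what an in-memory database amounts to, never hits the conflict.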

hiroyuki-sato commented 8 years ago

I asked a developer this question at TDTech in Japan.

As I understand it, the answer is as follows.

| Item | Local mode | Local mode with scheduler | Server mode |
|---|---|---|---|
| Workflow storage | File | RDBMS*1 | RDBMS |
| Server & scheduler command | N/A | digdag scheduler | digdag server |
| Workflow registration | N/A | Autoload | Push |
| Scheduling | N/A | Possible | Possible |
| Separate agent | N/A | Possible? | Possible |
| RDBMS storage | N/A | H2, PostgreSQL | H2, PostgreSQL |
| Execution result storage | .digdag directory | RDBMS, S3 | RDBMS, S3 |
| Log output plugin | N/A | Possible? | Possible*2 |
| Scaling | N/A | N/A | Possible |
| Purpose | Development, small environments without a scheduler | Small environments | Large scale |
| Web UI | N/A | Possible? | Under development |
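The table boils down to a simple decision rule. As an illustrative sketch only, with hypothetical names, the Python function below picks a mode from two requirements the table distinguishes: whether you need scheduling and whether you need to scale out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Requirements:
    scheduling: bool = False  # run workflows on a schedule
    scaling: bool = False     # scale out beyond a small environment

def choose_mode(req: Requirements) -> str:
    """Map requirements to a mode following the comparison table (illustrative only)."""
    if req.scaling:
        # Only server mode lists "Possible" for scaling.
        return "server mode (digdag server)"
    if req.scheduling:
        # Scheduling is possible in both modes; the scheduler mode targets small environments.
        return "local mode with scheduler (digdag scheduler)"
    # No scheduler needed: development or small environments.
    return "local mode"
```

For example, choose_mode(Requirements(scheduling=True)) returns "local mode with scheduler (digdag scheduler)".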