[Q] digdag server and scheduler

hiroyuki-sato commented 8 years ago

Hello

We (Twitter) want to know about the common use cases for the server and the scheduler.

Is it possible to run the server and the scheduler in the same environment?
If not, when should I use the scheduler?
Is it possible to persist execution results after stopping the scheduler?
- I would like to check execution results after stopping the scheduler.
- The digdag scheduler runs in memory mode, so I can't access execution results (e.g. with digdag sessions) after stopping the scheduler.

Best regards.

frsyuki commented 8 years ago

Is it possible to run the server and the scheduler in the same environment?

What does "same environment" mean? It's possible for them to share the same database.

Is it possible to persist execution results after stopping the scheduler?

You should be able to use the --database DIR option to persist status on disk. The --task-log DIR option is also recommended.
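As a rough sketch of the suggestion above, the following builds a scheduler command line that uses both options. The directory names `db` and `log` and the helper `scheduler_command` are hypothetical, chosen only for illustration; the `--database` and `--task-log` options are the ones named in this reply.

```shell
# Hypothetical directory names; any writable paths should work.
DB_DIR="db"     # where the scheduler persists its status (--database)
LOG_DIR="log"   # where task logs are written (--task-log)

# Print the scheduler command line so it can be reviewed before running.
scheduler_command() {
  printf 'digdag scheduler --database %s --task-log %s\n' "$DB_DIR" "$LOG_DIR"
}

scheduler_command
```

Running the printed command starts the scheduler in the foreground, and status should then be kept under `db` instead of only in memory.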

In an early version, there was another feature that stored data as plain YAML files. It might have been easier to understand than a database. It was removed in cf655ce611a4fa7225cb17d3cd2d797119226ea2 and replaced by a database (H2 in-memory, H2 on-disk, or PostgreSQL).

The idea was:

When a session starts, the scheduler stores a parameter file on the local disk.
Users can use the REST API to retry the session, or
users can use the parameter file to retry the session manually from the command line.

Do you have any suggestions for further improvements?

hiroyuki-sato commented 8 years ago

By "same environment" I mean using the digdag server and the scheduler at the same time.

If I push the following sample project to the server, it executes echo `date` and echo "hello" every minute. So it seems that the Digdag server contains the scheduler.

That's why I'm confused about how to use the scheduler and the server at the same time.

When and why should I run the scheduler independently?
If I run the scheduler, must it connect to the same database?
What does "you shouldn't use PostgreSQL with scheduler mode" mean? Should I use H2 in this case?

I'll answer your second question later, because I have no ideas yet.

Procedure

digdag server -m

sample.dig

timezone: "Asia/Tokyo"

schedule:
  minutes_interval>: 1

+current_date:
  sh>: echo `date`

+echo_hello:
  sh>: echo "hello"

digdag push sample

Output

2016-06-30 19:14:00 +0900 [INFO] (scheduler-0): Starting a new session project id=1 workflow name=hoge2 session_time=2016-06-30T19:14:00+09:00
2016-06-30 19:14:01 +0900 [INFO] (0036@+hoge2+current_date): sh>: echo `date`
2016年 6月30日 木曜日 19時14分01秒 JST
2016-06-30 19:14:01 +0900 [INFO] (0036@+hoge2+echo_hello): sh>: echo "hello"
hello

hiroyuki-sato commented 8 years ago

And one more thing.

You should be able to use the --database DIR option to persist status on disk. The --task-log DIR option is also recommended.

Is the behavior of the second digdag scheduler run below expected?

digdag scheduler -o db --task-log log
2016-06-30 19:49:06 +0900: Digdag v0.8.3
error: java.lang.RuntimeException: io.digdag.core.repository.ResourceConflictException: Resource already exists: revision=1 in project id=1

Procedure

timezone: "Asia/Tokyo"

schedule:
  minutes_interval>: 1

+current_date:
  sh>: echo `date`

+echo_hello:
  sh>: echo "hello"

First execution.

digdag scheduler -o db --task-log log
2016-06-30 19:53:47 +0900: Digdag v0.8.3
2016-06-30 19:53:49 +0900 [INFO] (main): Added new revision 1
2016-06-30 19:53:49 +0900 [INFO] (main): Starting server on 127.0.0.1:65432
2016-06-30 19:53:49 +0900 [INFO] (main): XNIO version 3.3.3.Final
2016-06-30 19:53:49 +0900 [INFO] (main): XNIO NIO Implementation Version 3.3.3.Final
2016-06-30 19:54:00 +0900 [INFO] (scheduler-0): Starting a new session project id=1 workflow name=hoge2 session_time=2016-06-30T19:54:00+09:00
2016-06-30 19:54:00 +0900 [INFO] (0029@+hoge2+current_date): sh>: echo `date`
Thu Jun 30 19:54:00 JST 2016
2016-06-30 19:54:01 +0900 [INFO] (0029@+hoge2+echo_hello): sh>: echo "hello"
hello

Stop the scheduler.

Second execution.

digdag scheduler -o db --task-log log
2016-06-30 19:54:49 +0900: Digdag v0.8.3
error: java.lang.RuntimeException: io.digdag.core.repository.ResourceConflictException: Resource already exists: revision=1 in project id=1

hiroyuki-sato commented 8 years ago

I asked a developer this question at TDTech in Japan.

As I understand it, the answer is as follows.

Item	Local mode	Local mode with scheduler	Server mode
Workflow storage	File	RDBMS*1	RDBMS
Server & scheduler command	N/A	digdag scheduler	digdag server
Workflow registration	N/A	Autoload	Push
Scheduling	N/A	Possible	Possible
Separate agent	N/A	Possible?	Possible
RDBMS storage	N/A	H2, PostgreSQL	H2, PostgreSQL
Execution result storage	.digdag directory	RDBMS, S3	RDBMS, S3
Log output plugin	N/A	Possible?	Possible*2
Scaling	N/A	N/A	Possible
Purpose	Development, small environment without scheduler	Small environment	Large scale
Web UI	N/A	Possible?	In development

*1 Buggy: https://github.com/treasure-data/digdag/issues/132
*2 Still in development
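
To make the three modes in the table above more concrete, here is a minimal sketch that maps each mode to a command line. The helper name `mode_command`, the file name `sample.dig`, and the directories `db` and `log` are hypothetical. `digdag run` is taken from digdag's general documentation rather than from this thread, and passing `--database` and `--task-log` to `digdag server` is an assumption based on the options mentioned earlier for the scheduler.

```shell
# Print the command that corresponds to each mode in the table.
# All paths and file names here are hypothetical placeholders.
mode_command() {
  case "$1" in
    local)
      # Local mode: run a workflow file directly, no server or scheduler.
      echo "digdag run sample.dig"
      ;;
    scheduler)
      # Local mode with scheduler: workflows are autoloaded.
      echo "digdag scheduler --database db --task-log log"
      ;;
    server)
      # Server mode: workflows are registered separately with "digdag push".
      echo "digdag server --database db --task-log log"
      ;;
    *)
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}

mode_command scheduler
```

For server mode, the workflow still has to be registered with `digdag push`, as in the procedure earlier in this thread.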
