riga / law

Build large-scale task workflows: luigi + job submission + remote targets + environment sandboxing using Docker/Singularity
http://law.readthedocs.io
BSD 3-Clause "New" or "Revised" License

CLI command for default config #164

Closed StFroese closed 11 months ago

StFroese commented 11 months ago

Description

Would be nice to have a CLI command which creates a default config (or a basic structure) in the current directory, e.g. law quickstart

riga commented 11 months ago

Thanks for opening the request @StFroese !

Any suggestions on what the default config should include? I was thinking about something like this:

law quickstart
    --directory / -r DIR  ->  directory where template is created
    --no-tasks            ->  do not write dummy tasks
    --no-config           ->  do not write a dummy config
    --no-setup            ->  do not write a setup.sh file

DIR/
├─ law.cfg
├─ setup.sh
└─ tasks/
   ├─ __init__.py
   └─ tasks.py

; law configuration example
; for more info, see https://law.readthedocs.io/en/latest/config.html

[modules]
; the task modules that should be scanned by "law index"

[logging]
; log levels mapped to python modules
law: INFO
luigi-interface: INFO
gfal2: WARNING

[luigi_core]
; luigi core settings
local_scheduler: True
scheduler_host: 127.0.0.1
scheduler_port: 8080
parallel_scheduling: False
no_lock: True
log_level: INFO

[luigi_scheduler]
; luigi scheduler settings
record_task_history: False
remove_delay: 86400
retry_delay: 30
worker_disconnect_delay: 30

[luigi_worker]
; luigi worker settings
ping_interval: 20
wait_interval: 20
check_unfulfilled_deps: False
cache_task_completion: True
keep_alive: True
force_multiprocessing: False
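
A minimal sketch of how such a command could lay out the template, assuming plain `argparse` and `pathlib` rather than law's actual CLI machinery. The function names `quickstart` and `main`, the program name, and the contents of `setup.sh` and `tasks.py` are hypothetical placeholders; only the options and file layout come from the proposal above.

```python
import argparse
from pathlib import Path

# Template contents are placeholders: the config is a shortened copy of the
# example above, the other two files are assumptions made for illustration.
CONFIG_TEMPLATE = """\
; law configuration example
; for more info, see https://law.readthedocs.io/en/latest/config.html

[modules]
; the task modules that should be scanned by "law index"

[logging]
; log levels mapped to python modules
law: INFO
luigi-interface: INFO
"""
SETUP_TEMPLATE = "#!/usr/bin/env bash\n# placeholder: environment setup goes here\n"
TASKS_TEMPLATE = "# placeholder: dummy tasks go here\n"


def quickstart(directory=".", tasks=True, config=True, setup=True):
    """Write the requested template files and return the created paths.

    Existing files are overwritten in this sketch.
    """
    root = Path(directory)

    # Map each target path to its content, honoring the opt-out switches.
    files = {}
    if config:
        files[root / "law.cfg"] = CONFIG_TEMPLATE
    if setup:
        files[root / "setup.sh"] = SETUP_TEMPLATE
    if tasks:
        files[root / "tasks" / "__init__.py"] = ""
        files[root / "tasks" / "tasks.py"] = TASKS_TEMPLATE

    for path, content in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
    return list(files)


def main(argv=None):
    """Parse the options from the proposal and create the template."""
    parser = argparse.ArgumentParser(prog="quickstart_sketch")
    parser.add_argument("--directory", "-r", default=".",
                        help="directory where template is created")
    parser.add_argument("--no-tasks", action="store_true",
                        help="do not write dummy tasks")
    parser.add_argument("--no-config", action="store_true",
                        help="do not write a dummy config")
    parser.add_argument("--no-setup", action="store_true",
                        help="do not write a setup.sh file")
    args = parser.parse_args(argv)
    return quickstart(
        args.directory,
        tasks=not args.no_tasks,
        config=not args.no_config,
        setup=not args.no_setup,
    )
```

Calling `main(["--directory", "my_analysis", "--no-setup"])` would write `law.cfg` and the `tasks/` package into `my_analysis/` and skip `setup.sh`.
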

StFroese commented 11 months ago

Hi @riga, lgtm, but maybe include the dummy task module in the config by default. Another question: I haven't used law yet and am just getting started. Why are the tasks in a folder called tasks? In luigi, the combination of Tasks and Targets is called a Workflow, but here workflows are something else, right?

StFroese commented 11 months ago

So what I mean is basically that it would make sense to me to have different tasks and targets, which depend on each other and build a workflow, placed in a file called my_workflow.py inside a folder called workflows.

StFroese commented 11 months ago

I'd rather have the Workflow classes called TaskTree :)

riga commented 11 months ago

> I'd rather have the Workflow classes called TaskTree :)

I see the appeal of that, but I fear that, at this point, a renaming would disrupt many existing setups. Also, the term "workflow" in luigi is defined rather loosely, so the overlap is perhaps smaller than you might think :)

> So what I mean is basically that it would make sense to me to have a file with different tasks and targets which depend on each other and build a workflow placed in a file called my_workflow.py inside a folder called workflows.

Maybe the confusion resides here. The directory where tasks are located does not need to be called "tasks", nor do tasks have to be defined in just one directory or in just one repository. In the layout above, it was just an example, but we should perhaps call it "my_package" to convey that this directory should most likely be used like a normal python package that happens to define tasks.

In this sense, with either bare luigi or law, one never starts a "workflow" (in your definition of "workflow" above), but just a single task that, through its recursive dependencies, creates a task tree (yep, luigi also calls it "task tree" sometimes 👀) represented by a DAG, which is then processed by luigi. What this tree looks like is fully determined by how your tasks are defined, by runtime parameters or environment variables, and - of course - by what needs to be processed, i.e., which tasks were already complete in the first place. Therefore, at least to me, it doesn't make too much sense to call tasks in a directory a "workflow". For instance, depending on a parameter, the task tree / DAG might look completely different. But just because the tasks for that are defined in the same place, one still wouldn't consider both trees to be the same "workflow".
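
To illustrate the point about parameters shaping the tree, here is a toy model in plain Python. It is not luigi's or law's API: the `Task` base class, the `FetchData`, `Calibrate` and `Analyze` tasks, the `calibrated` parameter, and the `task_tree` helper are all hypothetical, and only mimic the idea that a single requested task pulls in its dependencies recursively.

```python
class Task:
    """Toy stand-in for a task (hypothetical, not the luigi or law class)."""

    def __init__(self, **params):
        self.params = params

    def requires(self):
        # Tasks without dependencies return an empty list.
        return []

    def complete(self):
        # In this sketch nothing has been produced yet.
        return False


class FetchData(Task):
    pass


class Calibrate(Task):
    def requires(self):
        return [FetchData()]


class Analyze(Task):
    def requires(self):
        # A runtime parameter decides which branch the tree takes.
        if self.params.get("calibrated", False):
            return [Calibrate()]
        return [FetchData()]


def task_tree(task, depth=0):
    """Return the dependency tree of a single task as indented lines."""
    lines = ["  " * depth + type(task).__name__]
    # Dependencies of an already complete task are not followed.
    if not task.complete():
        for dep in task.requires():
            lines.extend(task_tree(dep, depth + 1))
    return lines


# The same three task definitions yield two differently shaped trees.
print("\n".join(task_tree(Analyze(calibrated=False))))
print("\n".join(task_tree(Analyze(calibrated=True))))
```

Both calls start from the same `Analyze` task defined in the same place, yet the first tree has two levels and the second has three, which is why the location of the task definitions says little about the "workflow" that ends up being processed.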

pfackeldey commented 11 months ago

Regarding the law quickstart option, it may be more appropriate to have a cookie/copier-template (see: https://github.com/copier-org/copier). This would allow for a wider variety of setups (e.g. similar to the different law-examples), in case that is needed at some point...

Apart from that, I like the simplicity of law quickstart :)

StFroese commented 11 months ago

@riga thanks for the explanation, I think it's alright for me then. I guess no one stops me from renaming my folder anyways haha :)