Closed: moltob closed this issue 5 years ago.
clean won't be able to access getargs, because clean does not follow the normal execution of a task's dependencies... but I understand your problem. You just need to save the data in the task that will use it.
Can you provide a simple example? I guess this would be a nice addition to the (planned) HOW-TO section in the docs.
import doit.tools

DOIT_CONFIG = dict(
    verbosity=2,
    default_tasks=['create_object'],
)

def task_determine_identifier():
    """Determine an identifier to be used by other tasks."""
    def do_complex_stuff():
        # Do some very complex or lengthy operation, like talking to an external service.
        ...
        return dict(identifier='0815')
    return dict(
        actions=[do_complex_stuff],
    )

def task_create_object():
    """Create object with ID."""
    # We don't know the identifier here (during task creation).
    def action(ident):
        print('Created object with ID', ident)
    def clean():
        # How to get identifier here?
        print('Deleted object with ID ???')
    return dict(
        actions=[action],
        clean=[clean],
        getargs=dict(ident=('determine_identifier', 'identifier')),
    )
The second task needs values from the first task to define details of the action and also details of the corresponding clean.
Your example is pretty much what I expected. :smile:
Running your example I found a bug in result_dep (used implicitly by getargs): fixed in https://github.com/pydoit/doit/commit/3873759424a5a9559d7b6c8660bd72e0091bb0fc
I thought it was possible, but actually it is currently not possible to support your use-case without changes to doit. The problem is that dep_manager needs to be somehow exposed to the task's action execution.
We need to define an API... For uptodate functions, a callable can receive a magic values parameter into which doit will inject the saved values. We could use the same solution here, or maybe something more generic where the whole dep_manager is exposed. What do you think? (Not sure you are familiar enough with doit internals.)
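The "magic parameter" idea can be sketched in plain Python: the framework inspects a callable's signature and injects known data for specially named parameters. This is only a simplified illustration of the mechanism, not doit's actual implementation; call_with_magic and saved_values are made-up names.

```python
import inspect

def call_with_magic(func, saved_values):
    """Call func, injecting `values` only if its signature asks for it."""
    params = inspect.signature(func).parameters
    kwargs = {}
    if 'values' in params:
        # inject the persisted values under the magic parameter name
        kwargs['values'] = saved_values
    return func(**kwargs)

def my_uptodate(values):
    # `values` holds what previous runs saved for this task
    return values.get('ran', False)

print(call_with_magic(my_uptodate, {'ran': True}))    # True
print(call_with_magic(lambda: False, {'ran': True}))  # False
```

A callable that does not declare the magic parameter is simply called without it, so opting in is purely a matter of the function's signature.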
I got it working with a quick and dirty patch...
diff --git a/doit/cmd_clean.py b/doit/cmd_clean.py
index 00d8d93..82845eb 100644
--- a/doit/cmd_clean.py
+++ b/doit/cmd_clean.py
@@ -57,6 +57,7 @@ class Clean(DoitCmdBase):
         for task in tasks:
             if task.name not in cleaned:
                 cleaned.add(task.name)
+                task.dep_manager = self.dep_manager
                 task.clean(self.outstream, dryrun)
                 if forget_tasks:
                     self.dep_manager.remove(task.name)
DOIT_CONFIG = dict(
    verbosity=2,
    default_tasks=['create_object'],
)

def task_determine_identifier():
    """Determine an identifier to be used by other tasks."""
    def do_complex_stuff():
        # Do some very complex or lengthy operation, like talking to an external service.
        ...
        return dict(identifier='0815')
    return dict(
        actions=[do_complex_stuff],
    )

def task_create_object():
    """Create object with ID."""
    # We don't know the identifier here (during task creation).
    def action(ident):
        print('Created object with ID', ident)
        # save value here
        return {'ident': ident}
    def clean(task):
        # How to get identifier here?
        # FIXME: this line requires a patch on the `clean` command
        # to expose dep_manager
        values = task.dep_manager.get_values(task.name)
        # Solution: get the saved value
        print('Deleted object with ID {}'.format(values.get('ident')))
        # With getargs, you would not need to save the value again:
        dep_result = values.get('_result:determine_identifier', {})
        print('Deleted object with ID {}'.format(dep_result.get('identifier')))
    return dict(
        actions=[action],
        getargs=dict(ident=('determine_identifier', 'identifier')),
        clean=[clean],
    )
I do like the approach where the dep_manager is available on a task. Do you think it would make sense to do this not only for clean, but also for actions, just to be a bit less clean-specific? Every now and then I run into situations where I'd like to get some data from the doit database in actions as well, and having access to it would be great.
What is kind of dangerous is to expose a mutable version of it, as users might start messing around and update doit's state data on their own. There are various ways to prevent that (pass a copy, pass a frozen object, ...).
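The "pass a frozen object" option mentioned above can be sketched with the standard library's types.MappingProxyType, which wraps a dict in a read-only view. FakeDepManager and its contents are made up for illustration; this is not doit's DepManager API.

```python
from types import MappingProxyType

class FakeDepManager:
    """Stand-in for a dependency manager holding per-task saved values."""
    def __init__(self):
        self._values = {'create_object': {'ident': '0815'}}

    def get_values(self, task_name):
        # hand out an immutable view instead of the internal dict
        return MappingProxyType(self._values.get(task_name, {}))

dm = FakeDepManager()
values = dm.get_values('create_object')
print(values['ident'])  # 0815
try:
    values['ident'] = 'hacked'
except TypeError:
    print('read-only view: mutation rejected')
```

The proxy reflects later changes to the underlying dict but rejects any write through the view, so user code can read state without being able to corrupt it.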
If you don't like adding the dep_manager to the task, an alternative API could be to pass it into the action/clean call as an optional argument, just like task or dependencies.
Thinking over the subject... I agree that making dep_manager generally available would be a good idea. Like it or not, the truth is that it is a global, so let's treat it like that. I would suggest adding doit.get_dep_manager(). That would be available everywhere... even when loading tasks, and would make it possible to save data that is not specific to tasks...
I don't care about it being dangerous; following Python philosophy, I hope you don't shoot yourself in the foot.
But while giving the user the power to do advanced stuff anywhere, I would like to keep doit's (mostly) no-API design. Giving clean a magic values parameter, the same as is available on uptodate(), would solve your original use-case and would be simpler for a user (no need to learn the DepManager API).
Actually, I guess having a magic values parameter for all actions would solve another problem: a task can use computed values from another task, but not values saved by another action of its own.
So in conclusion I propose 2 changes:
1) adding a global get_dep_manager()
2) supporting a magic values parameter in actions (including clean)
It would be great if you could work on a patch for either of them. Both changes should be straightforward to implement.
In fact your first suggestion is even more attractive, as I also have a use case where I already need to know some previously computed values during task generation. I'll try to submit a PR for that over the next couple of days.
I looked at the source, and providing a global doit.get_dep_manager seems to require some refactoring to prevent code duplication, so I want to run this by @schettino72 to get some guidance first.
The idea would be to lazily call code like this from the new free function:
But here I need access to various elements to be able to call the class:
- params (the doit configuration, plus command line args, ...)
- db_class, basically looked up with DoitCmdBase.get_backends()
- checker_cls, basically looked up with DoitCmdBase.get_checker_cls()
The methods above then require more configuration parameters and infrastructure elements, like the plugin dictionary (do they?).
It looks like much of the construction logic required to instantiate the Dependency class is currently woven into the command classes. So I am wondering about your preferred way of reusing that logic. I see some options:
- Derive the new code from DoitCmdBase so it inherits all the required goodies.
- Move the construction logic out of DoitCmdBase into a new module of free functions.
The cleanest option (to me) seems to be the second, but it also has the biggest impact and risk of things going wrong with this change. The first approach looks OK, and code duplication is prevented as well.
Please let me know how you would like it done.
Thanks. It is always good to discuss design first...
When I mention the dependency manager being a global, I mean that it would not be instantiated in any other place than on Command execution. So the implementation is just like a registry...
Quick hack:
--- a/doit/__init__.py
+++ b/doit/__init__.py
@@ -40,4 +41,8 @@ def get_initial_workdir():
     """working-directory from where the doit command was invoked on shell"""
     return loader.initial_workdir
+
+GLOBALS = {}
+def get_dep_manager():
+    return GLOBALS.get('dep_manager')
+
--- a/doit/cmd_base.py
+++ b/doit/cmd_base.py
@@ -469,6 +469,8 @@ class DoitCmdBase(Command):
         # dep_manager might have been already set (used on unit-test)
         self.dep_manager = Dependency(
             db_class, params['dep_file'], checker_cls)
+        import doit
+        doit.GLOBALS['dep_manager'] = self.dep_manager
This would be enough for your original use-case. But the DepManager would not be available when executing task creators.
As we discussed, it would be nice if this were also available when loading tasks.
This will require changes in TaskLoader, as it currently has a single function load_tasks() that returns both config values and task instances. We need to modify it so that it first only gets the config values. Then in DoitCmdBase we would create an instance of DepManager. Then, finally, actually load the tasks...
So my suggestion of what exactly needs to be done is:
- cmd_base imports __init__.py; on my projects I forbid this, as it would create a circular reference.
Hope it is clear.
Sorry for not responding for a while, I was offline due to German vacation time. 😃
I think I understood your suggestion. Essentially you want to split these lines:
- Load the dodo_config.
- Extract the params dict from it.
- Create the dep_manager.
- Load the tasks, while the dep_manager is available in the registry.
Please let me know in particular about adding the registry concept to doit. I'll do it in a clean way without circular dependencies, but I just want to make sure you agree to this new way of storing stuff for user tasks.
Looking into the split, I have yet another question: TaskLoader.load_tasks() might get some cleanup on the way.
Since the TaskLoader._load_from() method loads both tasks and configuration, the resulting call sequence has slightly confusing method names: SomeConcreteTaskLoader.load_tasks() -> TaskLoader._load_from() -> loader.load_tasks(), loader.load_doit_config().
TaskLoader._load_from() must switch on the type of namespace, even though that namespace is passed in by the derived classes, which already know precisely what it is. IMHO the code is not taking full advantage of the class hierarchy.
I suggest an initial refactoring like this:
- TaskLoader gets the two functions you wanted to have anyway: load_doit_config and load_tasks.
- The namespace is exposed through a property like member_dict or similar. The derived classes then implement that property appropriately. Type disambiguation is no longer required.
- TaskLoader.load_tasks() then does only exactly what it sounds like. I have seen in many places in code & tests that the returned configuration dict was thrown away anyway.
This will pave the road for the updated call sequence in the command base class. If you agree, I'd do the refactoring and test adaptation first in a separate PR.
Please let me know.
Looking further at the docs and sample code, it seems that you might want to keep the original signature:
class SomeLoader(TaskLoader):
    def load_tasks(...) -> Tuple[List[Task], Dict[str, Any]]:
        ...
If that is the case, the change is much smaller. I'll stand by for your comments now.
Final comment: I think your suggestion to split config and task loading is not quite consistent with keeping the TaskLoader.load_tasks() API in place with the aforementioned return values. I believe we can't have both...
Open-source is volunteer work, no worries about time.
Store it in a new (correct?) global registry dictionary. This is a new doit concept.
Just to be clear: I mentioned "registry" as a concept... there is no need to create a class for that; a plain dict can do the job, as in my quick patch.
Actually, doit_cmd.get_var() has a similar implementation.
Note the special care taken there for Windows multi-processing.
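A minimal sketch of the plain-dict registry idea, following the names in the quick hack above (GLOBALS, get_dep_manager); the set_dep_manager helper is hypothetical, added here only for symmetry:

```python
# registry sketch: a module-level dict, populated once by the command machinery
GLOBALS = {}

def set_dep_manager(dep_manager):
    """Hypothetical helper: called after the Command creates the manager."""
    GLOBALS['dep_manager'] = dep_manager

def get_dep_manager():
    """Return the registered dependency manager, or None if not set yet."""
    return GLOBALS.get('dep_manager')

# usage: the command machinery sets it, task code later reads it
set_dep_manager(object())
print(get_dep_manager() is not None)  # True
```

Because it is just a module-level dict, no class or instance plumbing is needed, at the cost of the usual caveats of global state (e.g. the multi-processing note above).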
_load_from won't be used anymore; maybe add a get_members().
To keep backward compatibility:

if hasattr(self.loader, 'load_doit_config'):
    # new api
    members = self.loader.get_members()
    config = self.loader.load_doit_config()
    ...  # other stuff
    dep_manager = Dependency()
    tasks = self.loader.load_tasks()
else:
    ...  # like current code
    # Of course, in this backward-compatible branch dep_manager won't be available
    # while loading tasks. I just don't want to break code when someone upgrades doit.
Maybe we could be more explicit about how to trigger the old/new API, especially because load_tasks is present on both but with different signatures... Or maybe use a new name instead of load_tasks? load_tasks_only()?
You can decide what you think works better.
Ok, thanks for the feedback. Since you are open to smaller API changes, how do you like the following design?
class TaskLoader:
    """Interface of task loaders."""
    cmd_options = ()

    def __init__(self):
        # list of command names, used to detect clash of task names and commands
        self.cmd_names = []
        self.config = None  # reference to config object taken from Command

    def load_doit_config(self, opt_values):
        """Load doit configuration."""
        raise NotImplementedError()

    def load_tasks(self, cmd, opt_values, pos_args):
        """Load tasks.

        :return: (List[Task])
        :param cmd: (doit.cmd_base.Command) current command being executed
        :param opt_values: (dict) with values for cmd_options
        :param pos_args: (list str) positional arguments from command line
        """
        raise NotImplementedError()


class NamespaceLoader(TaskLoader):
    """Load tasks from a namespace, mapping identifiers to items.

    ``namespace`` must be given at creation time or set before ``load_tasks`` or
    ``load_doit_config`` are called.
    """
    def __init__(self, namespace=None):
        super().__init__()
        self.namespace = namespace

    def load_doit_config(self, opt_values):
        return loader.load_doit_config(self.namespace)

    def load_tasks(self, cmd, opt_values, pos_args):
        return loader.load_tasks(self.namespace, self.cmd_names, cmd.execute_tasks)


class ModuleLoader(NamespaceLoader):
    """Load tasks from a Python module."""
    def __init__(self, module):
        super().__init__(dict(inspect.getmembers(module)))


class DodoFileLoader(NamespaceLoader):
    """Delayed loading of tasks from a dodo file."""
    def _ensure_namespace(self, opt_values):
        """Lazily load namespace from dodo file per config parameters."""
        if self.namespace is None:
            module = loader.get_module(
                opt_values['dodoFile'],
                opt_values['cwdPath'],
                opt_values['seek_file'],
            )
            self.namespace = dict(inspect.getmembers(module))

    def load_doit_config(self, opt_values):
        self._ensure_namespace(opt_values)
        return super().load_doit_config(opt_values)

    def load_tasks(self, cmd, opt_values, pos_args):
        self._ensure_namespace(opt_values)
        return super().load_tasks(cmd, opt_values, pos_args)
The fact that opt_values is an argument to the API functions is not super-nice, as here the dodo-file-specific implementation is leaking into the abstraction. However, I am not sure it is worth the extra effort right now.
I haven't run this yet, so consider it pseudo-code, but would you agree to this design?
The fact that opt_values is an argument to the API functions is not super-nice, as here the dodo-file specific implementation is leaking into the abstraction.
opt_values is not specific to dodo-files. Any TaskLoader (implemented as a plugin) can take extra parameters from command-line arguments...
By the way, pos_args is there because you could potentially implement a more efficient loading mechanism, but I never got around to implementing that.
Looking back, maybe I would pass opt_values in __init__ or some other initialization method. But I guess it's not really worth changing it.
I guess NamespaceTaskLoader is not necessary, as its methods are one line each; it just adds one more layer for no benefit.
Can you rename the new TaskLoader implementation to BaseTaskLoader and keep the old one for backward compatibility?
I guess I was not clear enough. If opt_values could be passed in some kind of initialization method, then NamespaceTaskLoader would make sense, but right now its use is too limited. If you want to add a third method for initialization I would be OK with that.
Changing __init__ would be harder, as it is called by the plugin system...
Regarding the necessity of NamespaceTaskLoader: where do you suggest integrating its code? Folding it into TaskLoader makes assumptions about task loaders that, for instance, the ones used in the tests do not fulfill, since they are not at all namespace based. If I were to implement a task loader reading from a database, I might not want the namespace code and members cluttering my class.
But of course, it's possible and no big flaw. Let me know if you want to do it that way.
I guess this is an implementation detail. You can make the call...
My point was that the reason to distinguish between TaskLoader and NamespaceLoader is already shown in your test code: not all task loaders are namespace based.
But this is probably not worth struggling over. I'll collapse them and get rid of the NamespaceTaskLoader.
Thanks for the comment regarding plugins: this means any base class meant to be reused by user code must not add additional arguments to __init__. I was slightly misled by your existing specializations, which do have additional arguments like mod_dict. But of course those are rather internal implementations, which are not meant to be used as base classes for more plugin-based loaders. In Java speak they would be final...
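The plugin constraint can be illustrated with a toy registry: a plugin system typically only knows the class name, so it must instantiate the class with no arguments, and any run-time data has to arrive through a later method such as setup(). PLUGINS, register, and make_loader are made-up names for this sketch, not doit internals.

```python
PLUGINS = {}

def register(name):
    """Decorator registering a loader class under a plugin name."""
    def deco(cls):
        PLUGINS[name] = cls
        return cls
    return deco

def make_loader(name):
    # the plugin system only knows the class, so it must call cls()
    # with no arguments -- extra required __init__ parameters would break here
    return PLUGINS[name]()

@register('my_loader')
class MyLoader:
    def __init__(self):            # no required extra args allowed
        self.namespace = None

    def setup(self, opt_values):   # run-time data arrives here instead
        self.namespace = opt_values

loader = make_loader('my_loader')
loader.setup({'dodoFile': 'dodo.py'})
print(loader.namespace)  # {'dodoFile': 'dodo.py'}
```

This is why moving opt_values out of __init__ and into a separate initialization method keeps loader classes usable as plugins.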
How about this:
- _load_from is still being called by ModuleTaskLoader and DodoTaskLoader. IMHO they should not... 😈
- The original TaskLoader is not shown; it's your unchanged class.
class TaskLoader2(TaskLoader):
    """Interface of task loaders with new-style API.

    The default implementation assumes tasks are loaded from a namespace, mapping identifiers to
    elements like functions (like task generators) or constants (like configuration values).

    This API update separates the loading of the configuration and the loading of the actual tasks,
    which enables additional elements to be available during task creation.
    """
    def __init__(self):
        super().__init__()
        # namespace to be searched for loadable objects, must be set by derived classes before the
        # load methods are called:
        self.namespace = None

    def load_doit_config(self, opt_values):
        """Load doit configuration."""
        return loader.load_doit_config(self.namespace)

    def load_tasks(self, cmd, opt_values, pos_args):
        """Load tasks.

        :return: (List[Task])
        :param cmd: (doit.cmd_base.Command) current command being executed
        :param opt_values: (dict) with values for cmd_options
        :param pos_args: (list str) positional arguments from command line
        """
        return loader.load_tasks(self.namespace, self.cmd_names, cmd.execute_tasks)


class ModuleTaskLoader(TaskLoader2):
    """load tasks from a module/dictionary containing task generators

    Usage: `ModuleTaskLoader(my_module)` or `ModuleTaskLoader(globals())`
    """
    def __init__(self, mod_dict):
        super(ModuleTaskLoader, self).__init__()
        if inspect.ismodule(mod_dict):
            self.namespace = dict(inspect.getmembers(mod_dict))
        else:
            self.namespace = mod_dict


class DodoTaskLoader(TaskLoader2):
    """default task-loader create tasks from a dodo.py file"""
    cmd_options = (opt_dodo, opt_cwd, opt_seek_file)

    def _ensure_namespace(self, opt_values):
        """Lazily load namespace from dodo file per config parameters."""
        if self.namespace is None:
            module = loader.get_module(
                opt_values['dodoFile'],
                opt_values['cwdPath'],
                opt_values['seek_file'],
            )
            self.namespace = dict(inspect.getmembers(module))

    def load_doit_config(self, opt_values):
        self._ensure_namespace(opt_values)
        return super().load_doit_config(opt_values)

    def load_tasks(self, cmd, opt_values, pos_args):
        self._ensure_namespace(opt_values)
        return super().load_tasks(cmd, opt_values, pos_args)
Sorry for querying your feedback so frequently, but since this is not a straightforward bugfix, I don't want to create surprises by adding personally preferred patterns that might not fit well. Once we agree on this, the implementation is almost trivial (to write and to review).
Adding an extra setup() method. What do you think?
class TaskLoader2():
    """Interface of task loaders with new-style API.

    This API update separates the loading of the configuration and the loading of the actual tasks,
    which enables additional elements to be available during task creation.
    """
    API = 2

    def setup(self, opt_values):
        pass

    def load_doit_config(self):
        """Load doit configuration."""

    def load_tasks(self, cmd, pos_args):
        """Load tasks.

        :return: (List[Task])
        :param cmd: (doit.cmd_base.Command) current command being executed
        :param pos_args: (list str) positional arguments from command line
        """


class NamespaceTaskLoader(TaskLoader2):
    """
    The default implementation assumes tasks are loaded from a namespace, mapping identifiers to
    elements like functions (like task generators) or constants (like configuration values).
    """
    def __init__(self):
        super().__init__()
        self.namespace = None

    def setup(self, opt_values):
        raise NotImplementedError()

    def load_doit_config(self):
        """Load doit configuration."""
        return loader.load_doit_config(self.namespace)

    def load_tasks(self, cmd, pos_args):
        """Load tasks.

        :return: (List[Task])
        :param cmd: (doit.cmd_base.Command) current command being executed
        :param pos_args: (list str) positional arguments from command line
        """
        return loader.load_tasks(self.namespace, self.cmd_names, cmd.execute_tasks)


class ModuleTaskLoader(NamespaceTaskLoader):
    """load tasks from a module/dictionary containing task generators

    Usage: `ModuleTaskLoader(my_module)` or `ModuleTaskLoader(globals())`
    """
    def __init__(self, mod_dict):
        super(ModuleTaskLoader, self).__init__()
        self.mod_dict = mod_dict

    def setup(self, opt_values):
        if inspect.ismodule(self.mod_dict):
            self.namespace = dict(inspect.getmembers(self.mod_dict))
        else:
            self.namespace = self.mod_dict


class DodoTaskLoader(NamespaceTaskLoader):
    """default task-loader create tasks from a dodo.py file"""
    cmd_options = (opt_dodo, opt_cwd, opt_seek_file)

    def setup(self, opt_values):
        """Load the namespace from the dodo file per config parameters."""
        module = loader.get_module(
            opt_values['dodoFile'],
            opt_values['cwdPath'],
            opt_values['seek_file'],
        )
        self.namespace = dict(inspect.getmembers(module))
Works for me. I would not add the API attribute though. I can simply test for isinstance(loader, TaskLoader2). This way I don't need to add API = 1 to the original class or check hasattr first.
I still have more suggested changes to your last version, but I think we're OK now with the general design, so I'll turn to code now. :-)
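The isinstance-based dispatch discussed above could look roughly like this in the command base class. TaskLoader and TaskLoader2 below are simplified stubs standing in for the real doit classes, and the returned values are placeholders:

```python
class TaskLoader:
    """Old-style stub: load_tasks returns (tasks, config) together."""
    def load_tasks(self, cmd, opt_values, pos_args):
        return ['task_a'], {'verbosity': 2}

class TaskLoader2:
    """New-style stub: config and tasks are loaded in separate steps."""
    def setup(self, opt_values):
        pass
    def load_doit_config(self):
        return {'verbosity': 2}
    def load_tasks(self, cmd, pos_args):
        return ['task_a']

def run_loader(loader):
    """Dispatch on the loader API version via isinstance, as proposed."""
    if isinstance(loader, TaskLoader2):
        loader.setup({})
        config = loader.load_doit_config()
        # ...the real code would create the dep_manager here, so it is
        # already available while the tasks are being loaded...
        tasks = loader.load_tasks(cmd=None, pos_args=[])
    else:
        tasks, config = loader.load_tasks(cmd=None, opt_values={}, pos_args=[])
    return tasks, config

print(run_loader(TaskLoader()))   # (['task_a'], {'verbosity': 2})
print(run_loader(TaskLoader2()))  # (['task_a'], {'verbosity': 2})
```

Since TaskLoader2 is a new class, isinstance cleanly separates the two call sequences without adding an API attribute to the original class.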
Actions are able to access doit's database, e.g. the return values of other tasks via getargs. Since such configuration affects the action and what is done and produced, I see a frequent need to also provide this to clean. But it does not seem to be possible. Is it? Thx!