Closed sapetnioc closed 2 years ago
I did not noticed that this issue was closed.
A few thoughts / proposals for the configuration in Capsul v2 / v3:
In a general way we need to handle:
In the current implementation (v2):
CapsulEngine
/ Settings
rely on a database (populse_db) but apart from storage considerations, this is an "implementation detail" and the database API is not really exposed to the user / programmer.engine = capsul_engine()
with engine.settings() as session:
config = session.config('spm', 'global', 'spm12-standalone')
if config is None:
session.new_config('spm', 'global', {'config_id': 'spm12-standalone',
'directory': something})
else:
config.directory = something
6 lines of code just to set one parameter.
A dict / json representation would be more handy, but using "just" a dictionary would be "too free", not allowing to define "official" variables, with types and documentation, and forbid mistakes such as setting a wrong key in the dict.
The Controller
API seems quite appropriate for it: it allows to declare, type, and document variables, it can be converted from / to dicts and thus saved in json format, has a generic GUI which allows to edit all parameters (even if dedicated GUIs would sometimes look better for end users but this is not forbidden). The only problem is the levels where several configs can exist: here the keys are not known in advance and should be open. But we can use the OpenKeyController
at this level. In addition we may use the notification mechanisms to react when a config parameter is set or changed.
Thus the manipulation API would look like this:
settings.global.spm.spm12_standalone.executable = '/usr/local/spm12-standalone'
print(settings.export_to_dict())
>>> {'global': {'capsul.engine.module.spm': {
'spm12_standalone': {'config_id': 'spm12_standalone',
'config_environment': 'global',
'directory': '/usr/local/spm12-standalone'}}}
del settings.spm.spm12_standalone
print(settings.export_to_dict())
>>> {"global': {'capsul.engine.module.spm': {}}}
This looks more similar to what did exist in the former StudyConfig
, but with the missing levels (environment, several configurations...). In the example above, both global
(environment) and spm12_standalone
(config ID in the SPM module) are free variable names.
We may make this API work both in capsul v2 and v3.
I totally agree that controllers are very good candidates for internal implementation. But we must define the whole configuration ecosystem, not only the configuration itself. For instance decide how and where modules are written, how to list available modules, how to combine several configurations, how to verify the validity of a configuration, etc. To list all the topic we must address, we must keep in mind the different potential user profiles of the configuration system. I am thinking of 4 profiles :
Using the YAQL selection language can work with controllers transformed in dict
. In the following code, I copied the data and query from the first example found in YAQL doc.
This may be an option to express a kind of selection of configuration. But this is not yet clear to me how to do it.
# -*- coding: utf-8 -*-
import yaql
from soma.controller import Controller
from soma.undefined import undefined
# Without dict iheritance, YAQL raises an error
class DictController(Controller, dict):
def get(self, value, default=None):
return getattr(self, value, default)
def __getitem__(self, name):
value = getattr(self, name, undefined)
if value is undefined:
raise KeyError(name)
return value
def items(self):
for field in self.fields():
yield (field.name, getattr(self, field.name))
def values(self):
for field in self.fields():
yield getattr(self, field.name)
def keys(self):
for field in self.fields():
yield field.name
def __contains__(self, key):
return self.field(key) is not None
def __len__(self):
return len(self.fields())
def __iter__(self):
for field in self.fields():
yield field.name
class CustomerCity(DictController):
city: str
customer_id: int
class Order(DictController):
order_id: int
item: str
quantity: int
class Customer(DictController):
customer_id: int
name: str
orders: list[Order]
class DB(DictController):
customers_city: list[CustomerCity]
customers: list[Customer]
json_db = {'customers_city': [{'city': 'New York', 'customer_id': 1},
{'city': 'Saint Louis', 'customer_id': 2},
{'city': 'Mountain View', 'customer_id': 3}],
'customers': [{'customer_id': 1,
'name': 'John',
'orders': [{'order_id': 1, 'item': 'Guitar', 'quantity': 1}]},
{'customer_id': 2,
'name': 'Paul',
'orders': [{'order_id': 2, 'item': 'Banjo', 'quantity': 2},
{'order_id': 3, 'item': 'Piano', 'quantity': 1}]},
{'customer_id': 3,
'name': 'Diana',
'orders': [{'order_id': 4, 'item': 'Drums', 'quantity': 1}]}]}
db = DB()
db.import_json(json_db)
engine = yaql.factory.YaqlFactory().create()
expression = engine(
'$.customers.orders.selectMany($.where($.order_id = 4))')
order = expression.evaluate(data=db)
print(order)
In the model we discussed offline, we would base the configuration on Contoller
, and dict[str, Controller]
. However we face a validation problem: pydantic typechecking is too weak, and basically inoperant for dicts (or lists):
from soma.controller import Controller
c = Controller()
c.add_field('toto', dict[str, Controller], default_factory=dict)
c.toto[12] = 'babar'
does not complain in any manner, whereas neither the dict key nor the value have the required types. Pydantric only validates direct affectations, when the full dictionary is set:
c.toto = {12: 'babar'}
actually raises a ValidationError
.
But in our multiple level configuration, we never assign the full dict at once. Thus we have no validation at all, and we can set any garbage in the config.
Otherwise it's possible to use OpenKeyController
instead of dicts, which offers a stronger validation.
I have a question about configuration API. First, let's assess I understand the current API. Configuration is a hierarchy:
ConfigurationLayer
class, corresponds to the storage location of the configuration (user, application and site). These configurations are merged before use, therefore this level may be ignored by user.EngineConfiguration
class) corresponds to a specific computing resource. To date, all fields of an EngineConfiguration
must correspond to a module name (e.g. spm
, fsl
, etc.)EngineConfiguration
attributes) can contain several configurations for a module (via OpenKeyController
class). Each configuration must be given a name to access it internally.ModuleConfiguration
derived class specific to the corresponding module. This third level allows to configure several versions of the same module (e.g. spm 8 and spm 12) Now I need to store configuration for datasets. This should be a kind of dict[str, DatasetConfig]
where DatasetConfig
would contain, at least, a directory name and a organization schema name (i.e. 'bids-1.1'
or 'brainvisa-5.0'
). A dataset configuration is tied to a computing resource (validity of the path) but cannot have several version such as modules. Is EngineConfiguration
the right place to put this configutation ? If yes, could it be implemented by simply adding a field dataset
of type OpenKeyController[DatasetConfig]
?
Yes, that's it.
I this context, wouldn't it be just a matter of implementing DatasetConfig
as a config module (ModuleConfiguration
subclass) ? I think this would just match the requirements you mention: the EngineConfiguration, tied to a computing resource, would have a field dataset
, which would automatically be exactly an OpenKeyController[DatasetConfig]
.
You can now look "visually" at the config structure using this code (in ipython --gui=qt
):
from capsul.qt_gui.widgets import config_gui
from soma.qt_gui.controller import ControllerWidget
from capsul.config import ApplicationConfiguration
c = ApplicationConfiguration('blop')
cw = ControllerWidget(c)
cw.show()
Yes, you are right. I missed this possibility because it put in the same bag what seemed to me to be different concepts:
But this is only a matter of internal documentation, user documentation and GUI. What is important is to fix the configuration access API and using a dataset
configuration module is the easiest way to provide the right API.
Yes sure, but we have not considered separating different aspects of configuration, and I'm not sure it is worth the complication: even for users, they would need to look at several places to enter their configuration, if we do, thus it is simpler as it is done now. Moreover there are likely more than 2 "concepts": software, datasets, but also possibly computing resource configuration, completion, etc. Right now I don't feel a strong need to separate them, but if we do we will have to add a config level, or categories.
Project configuration must take into account several things that make it necessary to completely rethink Capsul configuration. One important feature of the future new system is the ability to define requirements in processes (a requirement such as "I need SPM in version 12"). Another feature that we would like to be possible (at least in the future) would be to make it possible to configure several versions of a software (e.g. SPM 8 and 12) and to allow pipelines to combine processes using different versions. This will become possible in v3.0 because each process will run in a separate job (with potentially a different configuration). But the configuration system and the process dependencies system must be designed specifically to allow such features.