Open rhfogh opened 3 months ago
The biggest problem we have, in my opinion, is to come up with a way to achieve a smooth transition between the two formats. I think what you outline above seems reasonable; if I understand it correctly, it would mean that we need to:
It would be quite nice if we could use something like Pydantic to specify/model the configuration, so that each HardwareObject has a configuration schema (a Pydantic object). The configuration could then be parsed and validated through this schema. The configuration data could then either be available through, for instance, self._config, or be added to the HardwareObject in some other way through convention.
Your understanding is the same as mine. Just one additional point is that some XML-object functions cannot really be supported in YAML, and (many) others would likely be slated for deprecation.
The Pydantic idea makes sense. But is there a way to translate it to Python attributes? I really like the idea that my linter can tell me whether someHWO.some_property is actually defined for the (possibly mock) HWO I am coding to. It is not only the configuration files, but also the code that needs validation. The way things are now, the top layer of properties, the direct attributes of the HWO, are checked on load.
> Your understanding is the same as mine. Just one additional point is that some XML-object functions cannot really be supported in YAML, and (many) others would likely be slated for deprecation.
Sure, we need to find some reasonable common ground. The aim is to work towards YAML and, hopefully, also to improve on the current situation. I agree that we need to deprecate most XML features, perhaps the more the better :)
> The Pydantic idea makes sense. But is there a way to translate it to Python attributes? ...
I'm not sure there is a good way to perform such a mapping or translation without it being awkward. The Pydantic object, though, is "well defined", and your editor/linter would know how to deal with it. One could imagine, perhaps, some sort of linking so that all roles defined in the configuration object get assigned to the roles defined on the HardwareObject. I'm not sure that would be very convenient, but I'm open to the idea. That being said, I think we should keep it as simple as possible :)
If we have Pydantic schemas to define the configuration, how would I go about checking that a given attribute that I wanted to access was actually defined in the scope I needed? I.e. globally, if in a mock object or general-level code (like GPhL uses)? Or specifically for a given site or beamline (if I was reviewing a PR that contained a new attribute)? It is a lot easier if the linter can show it up, than if you need to cross-check by hand.
Just as an example (it's probably not the best one), imagine that you have something like this:
    from typing import Optional

    from pydantic import BaseModel, Field

    # Role and AbstractDetector are assumed to be defined elsewhere.

    class DetectorConfigModel(BaseModel):
        name: str = Field("", description="Detector name")
        detector_distance: Role = Field("detector_distance", description="Detector distance")

    class SomeDetector(AbstractDetector):
        CONFIGURATION_SCHEMA = DetectorConfigModel

        def __init__(self, name):
            self._config: Optional[DetectorConfigModel] = None
            ...
Like this, both you and the linter would know exactly what the configuration is.
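To make the parse-and-validate step a little more concrete, here is a minimal sketch. It uses a standard-library dataclass as a stand-in for the Pydantic model so that it runs without extra dependencies; a real Pydantic BaseModel would give the same kind of checks with less code. The load_config helper, the width field and the error messages are assumptions made for illustration and are not taken from mxcubecore.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class DetectorConfig:
    """Hypothetical configuration schema (stand-in for a Pydantic model)."""

    name: str = ""
    width: float = 0.0


def load_config(raw: Dict[str, Any]) -> DetectorConfig:
    """Validate a raw mapping, e.g. one parsed from a YAML or XML file."""
    # Each field's annotated type doubles as a simple converter.
    converters = {f.name: f.type for f in fields(DetectorConfig)}
    unknown = sorted(set(raw) - set(converters))
    if unknown:
        raise ValueError(f"Unknown configuration keys: {unknown}")
    values = {}
    for key, value in raw.items():
        try:
            values[key] = converters[key](value)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"Bad value for {key!r}: {value!r}") from exc
    return DetectorConfig(**values)
```

With this in place, a misspelled key such as "widht" in a configuration file is rejected at load time instead of surfacing later as a missing attribute.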
Yes, but how would I code it? Currently I would do (in YAML configuration):
    detector_width = HWR.detector.widht
and my linter would give me an error because I spelled 'widht' wrong. How would I get hold of the width, and how would my linter know whether the attribute existed? Hopefully there is a simple answer; I am just not sure what it is.
Something like this, then, I guess.
    class DetectorConfigModel(BaseModel):
        name: str = Field("", description="Detector name")
        width: float = Field(....)
        detector_distance: Role = Field("detector_distance", description="Detector distance")

    # Access from calling code:
    HWR.detector._config.width
I guess that in case you need to have width as a public attribute, we need to add a property:
    @property
    def width(self) -> float:
        return self._config.width
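Putting the two pieces together, a self-contained sketch of the pattern might look as follows. A frozen dataclass stands in for the Pydantic model, and Detector is a hypothetical stand-in for a real HardwareObject; neither is taken from mxcubecore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorConfigModel:
    """Stand-in for the Pydantic model; its fields are what the linter sees."""

    name: str = ""
    width: float = 0.0


class Detector:
    """Hypothetical hardware object holding a typed configuration."""

    def __init__(self, config: DetectorConfigModel) -> None:
        self._config = config

    @property
    def width(self) -> float:
        # Public, read-only view of the configured value.
        return self._config.width


detector = Detector(DetectorConfigModel(name="demo", width=170.5))
```

Because the fields are declared on the model, a static checker such as mypy or pyright can flag detector._config.widht before the code runs, and at runtime the same access raises AttributeError. In this sketch the property has no setter and the dataclass is frozen, so both routes are read-only; whether the values should be settable is left open.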
That sounds quite reasonable, I have no problem with it. One might call it 'config' instead of '_config', so you can more honestly access it from the outside, and at worst (though one might want to avoid it) one could write the properties. We would have to see if this would be settable, and if it should be.
Anyway, problem solved. Let us do it that way. Sorry for the quibbling, but I did not know (and now I do).
It would mean doing more rewriting also for the YAML config class (which could take a bit longer), but it is worth that for getting it right.
Very happy to have the discussion :) I still don't know all the details myself; it's mostly ideas. I think it would be nice to have some sort of ambition for the May code camp, if it's decided that this will be one of the topics. For instance, it would in that case be nice if we could leave the code camp with a design and perhaps even some sort of early implementation.
I'm sure the others also have plenty of good ideas.
Not sure how much time I will have - there is the MXLIMS work, which also has top priority and which also needs presenting and discussing at MAX IV - but I shall try to get as far as I can with a proposal in advance of the meeting. Ideally we would have the draft all ready so we could start applying it, but that may well be too ambitious.
The MXCuBE meeting and the code camp are approaching, and it will soon be time to gather some thoughts on, and discuss, the XML-to-YAML configuration transition. Both @rhfogh and I (@marcus-oscarsson) have outlined some ideas above; perhaps one of the more difficult issues is how to move gradually from XML to YAML with as few problems as possible.
One idea expressed above is to introduce a config object that is a Pydantic model. The source of the data can be either XML or YAML. This object and its utility functions would have the same API for both formats and would live alongside the old configuration system (until that is replaced).
This at least implies (perhaps I'm omitting something):
- Develop the "config" object solution for YAML and XML formats (and make an abstraction for possible other future formats).
- Add a deprecation warning for all other means of accessing the current configuration.
This would be helped by adding documentation and tests for the configuration of current hardware objects.
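The idea of one config object with the same API for both formats can be sketched as below. This is only an illustration under assumptions: the DetectorConfig class, its two constructors and the flat XML layout are hypothetical, a frozen dataclass stands in for the Pydantic model, and the YAML side is represented by the kind of mapping a YAML loader would return.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class DetectorConfig:
    """Hypothetical config object shared by both file formats."""

    name: str
    width: float

    @classmethod
    def from_mapping(cls, data: Mapping[str, Any]) -> "DetectorConfig":
        # A YAML loader would typically hand over a mapping like this.
        return cls(name=str(data["name"]), width=float(data["width"]))

    @classmethod
    def from_xml(cls, xml_text: str) -> "DetectorConfig":
        # Flatten simple child elements into a mapping, then reuse from_mapping.
        root = ET.fromstring(xml_text)
        return cls.from_mapping({child.tag: child.text for child in root})


# Assumed, simplified XML layout; real configuration files may differ.
xml_source = "<object><name>demo</name><width>170.5</width></object>"
yaml_like_source = {"name": "demo", "width": 170.5}

from_xml = DetectorConfig.from_xml(xml_source)
from_yaml = DetectorConfig.from_mapping(yaml_like_source)
```

Code that consumes the config object then does not need to know which file format the data came from.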
A possible way to approach this would be to:
- Add the new "config" object infrastructure.
- Identify a set of HardwareObjects to begin with, and add configuration tests and documentation for them.
- Slowly move the identified HardwareObjects to YAML.
What do you think about the above? Could it work? Do you have other ideas?
Time is getting rather short. So this is what I would try to do before Lund:
- Change both the YAML-configured and the XML-configured base class so the actual data are stored in an attribute called .config. For now I shall just use a random empty object without bothering about Pydantic, initialisation, or validation.
- Change the Python code so that the functions used in either base class will work on both, as they stand, and put in deprecation warnings.
- Try to remove calls to functions that cannot be supported on both sides.
- If I have time, try to change the loading mechanisms so we can load either kind of configuration file, and merge the two superclasses into one.
We can then later look at adding in Pydantic, getting rid of unsupported and deprecated functions, automatically writing out config files that match the new way we want these to look, and putting them in instead of the preceding set.
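A minimal sketch of the first two steps, assuming that the "random empty object" is a types.SimpleNamespace. The class names and the mixin are hypothetical and only illustrate how the same accessor functions could work on both base classes once the data live in .config.

```python
from types import SimpleNamespace
from typing import Any, Dict


class ConfigAccessMixin:
    """Accessors written once, working on any class that has a .config."""

    config: SimpleNamespace

    def get_property(self, name: str, default: Any = None) -> Any:
        return getattr(self.config, name, default)

    def set_property(self, name: str, value: Any) -> None:
        setattr(self.config, name, value)


class XmlConfiguredObject(ConfigAccessMixin):
    """Hypothetical XML-configured base: properties arrive as a dictionary."""

    def __init__(self, properties: Dict[str, Any]) -> None:
        self.config = SimpleNamespace(**properties)


class YamlConfiguredObject(ConfigAccessMixin):
    """Hypothetical YAML-configured base: properties arrive as keywords."""

    def __init__(self, **properties: Any) -> None:
        self.config = SimpleNamespace(**properties)


xml_obj = XmlConfiguredObject({"width": 170.5})
yaml_obj = YamlConfiguredObject(width=170.5)
```

Once both classes share the accessors, merging the two superclasses into one is mostly a matter of unifying the loading code.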
The below would give one path from XML to YAML configuration. It would still require some major changes to move to YAML files and to change the system, and most likely we would have to do it with a code camp. Still, it should hopefully make the transition easy enough to actually get it done.
1. Agree on the goal
The HardwareObjectYaml currently has the following functionality:
1) Contained objects are accessed by obj.attr syntax.
2) 'On-the-fly' properties are prohibited: new properties cannot be added at runtime.
3) HardwareObjects count as contained objects; complex data structures like dictionaries or lists count as properties.
4) name is a property and is identical to the role name that points to the object. It should be made read-only.
5) all_objects_by_role (ordered dictionary), all_roles (list), and procedures (ordered dictionary).
6) The replace_object function allows you to replace a contained object at runtime.
7) init() and _init() functions.
Would we want more than that? We should agree on that before we start. Once we are all agreed on what functionality we want, we should put it into the HardwareObjectYaml class straight away.
2. Goal v. existing HardwareObject functionality
Below is a list of existing HardwareObject functionalities, with some (my) opinions on their suitability.
- The biggest question is whether we want to retain point 2), prohibiting 'on-the-fly' properties (I would vote that we should). One possible workaround here would be to retain point 2) but to have an extra_attributes dictionary property.
- get_property and get_properties (from XML-configured objects) do not really make sense if you can't add new properties at runtime.
- Personally I think that the current get_object_by_role is not needed. You can get the object by direct access or through all_objects_by_role, and the behaviour of get_object_by_role, searching recursively in contained objects to get an object with the right name, does not strike me as useful. But others might think differently.
- The ability to treat a HardwareObject as a list or dictionary (or both) of contained HardwareObjects again strikes me as redundant and confusing.
- The functions has_object and get_objects would no longer be needed. They are only relevant for objects that are not HardwareObjects but complex data structures, and those will be treated as properties.
- HardwareRepository.get_hardware_object should, I think, definitely be abolished. It works off configuration file names and allows you to hardcode file names into source code.
- Beamline.get_hardware_object gets a HardwareObject by 'dotted path', as in Beamline.get_hardware_object("detector.distance"). This is a harmless convenience function, but with YAML-configured objects you could also do this directly, as myBeamline.detector.distance. Possibly we might have a similar function based on the Python operator.attrgetter, that would return either HardwareObjects or properties, and return None if one of the intermediate objects is None.
- Functions set_property, add_object, add_reference, get_roles, get_xml_path, set_name, objects_names are easily replaced or not needed outside the loading machinery.
- Any machinery to track file paths or XML could not be ported.
- Do we need support for pickling? E.g. __getstate__ or __setstate__?
- What are user_file_directory and set_user_file_directory used for?
3. Adapting HardwareObject class for transition
We would need to implement YAML-class functionality 1, 3, 4, 5, and 7 in the HardwareObject class. This would not require any changes to loading or existing functionality, or class-by-class changes. The biggest change would be to allow contained objects to be accessed by obj.attr syntax, which could be done by changing the way of accessing the existing data structures (unless we find it simpler to change the structures themselves). The handling of complex non-HardwareObject objects (get_objects etc.) would require changing, also in the classes using them, but there are few examples of those.
4. Deprecating obsolete features
We can then add a deprecation warning to all those features we want to get rid of, and clean up the code bit by bit, without changing the configuration files (much).
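One way to add such warnings without touching each function body is a small decorator built on the standard warnings module. This is a sketch only: the deprecated decorator and the LegacyObject class are hypothetical names, not existing mxcubecore code.

```python
import functools
import warnings
from typing import Any, Callable


def deprecated(replacement: str) -> Callable:
    """Mark a function as deprecated and point callers at its replacement."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # report the caller's line, not the wrapper's
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


class LegacyObject:
    """Hypothetical object that still offers an old-style accessor."""

    def __init__(self) -> None:
        self.width = 170.5

    @deprecated("direct attribute access")
    def get_property(self, name: str) -> Any:
        return getattr(self, name)
```

The old call keeps working, so configuration files and calling code can be cleaned up gradually while the warnings show where the remaining uses are.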
5. Write out YAML config files
At this point it should be possible to write out the loaded configuration in YAML format. We would still be missing the comments, so that would have to be done by hand.
6. Replace XML files with YAML files
Now we just need to delete the old XML files, replace them with the new YAML files, and put back the comments. And we are done.
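As an illustration of step 3 and of the dotted-path accessor suggested under point 2, here is a minimal sketch of obj.attr access layered on top of an existing role dictionary. The Container class and the get_by_path helper are hypothetical names, and the sketch does not distinguish contained objects from properties; it only shows that the access syntax can be added without changing the underlying data structure.

```python
from operator import attrgetter
from typing import Any, Dict, List, Optional


class Container:
    """Hypothetical stand-in for a HardwareObject with contained objects."""

    def __init__(self, name: str, **objects: Any) -> None:
        self._name = name
        self._objects_by_role: Dict[str, Any] = dict(objects)

    @property
    def name(self) -> str:
        return self._name  # read-only: no setter is defined

    @property
    def all_roles(self) -> List[str]:
        return list(self._objects_by_role)

    def __getattr__(self, role: str) -> Any:
        # Called only when normal lookup fails: fall back to the role table.
        try:
            return self.__dict__["_objects_by_role"][role]
        except KeyError:
            raise AttributeError(role) from None

    def replace_object(self, role: str, new_object: Any) -> None:
        if role not in self._objects_by_role:
            raise ValueError(f"Unknown role {role!r}")
        self._objects_by_role[role] = new_object


def get_by_path(root: Any, dotted_path: str) -> Optional[Any]:
    """Resolve 'a.b.c' from root; None if an intermediate object is None."""
    obj = root
    for part in dotted_path.split("."):
        if obj is None:
            return None
        obj = attrgetter(part)(obj)  # a misspelled name still raises
    return obj


beamline = Container(
    "beamline",
    detector=Container("detector", distance=0.25),
    sample_changer=None,
)
```

Here beamline.detector.distance and get_by_path(beamline, "detector.distance") return the same value, a path through the unconfigured sample_changer role gives None, and a misspelled role still raises AttributeError rather than being silently hidden.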