rhfogh commented 3 months ago

The below would give one path from XML to YAML configuration. It would still require some major changes to move to YAML files and change system, and most likely we would have to do it with a code camp. Still, it should hopefully make the transition easy enough to actually get it done

1 Agree on the goal

The HardwareObjectYaml currently has the following functionality:

1- All properties and contained objects are accessed by Python obj.attr syntax
2- Requirement for all properties and roles to be pre-defined in the class code.
3- Only HardwareObjects count as contained objects; complex data structures like dictionaries or lists count as properties.
4- name is a property and is identical to the role name that points to the object. It should be made read-only.
5- We have the properties all_objects_by_role (ordered dictionary), all_roles (list), and procedures (ordered dictionary)
6- The replace_object function allows you to replace a contained object at runtime.
7- There are placeholders for init() and _init() functions

Would we want more than that? We should agree on that before we start. Once we are all agreed what functionality we want, we should put it into the HardwareObjectYaml class straight away.

2 Goal v. existing HardwareObject functionality

Below is a list of existing HardwareObject functionalities, with some (my) opinions on their suitability.

The biggest question is whether we want to retain point 2), prohibiting 'on-the-fly' properties (I would vote that we should). One possible workaround here would be to retain point 2) but to have an extra_attributes dictionary property.
get_property and get_properties (from XML-configured objects) do not really make sense if you can't add new properties at runtime.
Personally I think that the current get_object_by_role is not needed. You can get the object by direct access or through all_objects_by_role , and the behaviour of the get_object_by_role, of searching recursively in contained objects to get an object with the right name, does not strike me as useful. But others might think differently.
The ability to treat a HardwareObject as a list or dictionary (or both) of contained HardwareObjects again strikes me as redundant and confusing.
The functions has_object and get_objects would no longer be needed. They are only relevant for objects that are not HardwareObjects but complex data structures, and those will be treated as properties.
HardwareRepository.get_hardware_object I think should definitely be abolished. It works off configuration file names and allow you to hardcode file names into source code.
Beamline.get_hardware_object gets a HardwareObject by 'dotted path', as in Beamline.get_hardware_object("detector.distance") This is a harmless convenience function, but with YAML-configured objects you could also do this directly, as myBeamline.detector.distance. Possibly we might have a similar function based on the Python operator.attrgetter, that would return eitherHardwareObjects or properties and return None if one of the intermediate objects are None.
Functions set_property, add_object, add_reference, get_roles, get_xml_path, set_name, objects_names are easily replaced or not needed outside the loading machinery.
Any machinery to track file paths or XML could not be ported.
Do we need support for pickling? E.g. __get_state__or __set_state__?
What are user_file_directory and set_user_file_directory used for?

3. Adapting `HardwareObject` class for transition

We would need to implement YAML-class functionality 1, 3, 4, 5, and 7 in the HardwareObject class. This would not require any changes to loading or existing functionality, or class-by-class changes. The biggest change would be to allow contained objects to be accessed by obj.attr syntax, which could be done by changing the way of accessing the existing data structures (unless we find it simpler to change the structures themselves) The handling of complex non-HardwareObject objects (get_objects etc.) would require changing, also in the classes using them, but there are few examples of those.

4. Deprecating obsolete features

We can then add a Depreciation Warning to all those features we want to get rid of, and clean up the code bit by bit, without changing the configuration files (much)

5. Write out YAML config files

At this point is should be possible to write out the loaded configuration in YAML format. We would still be missing the comments, so that would have to be done by hand.

6. Replace XML files with YAML files

Now we just need to delete the old XML files, replace them with the new YAML files, and put back the comments. And we are done.

marcus-oscarsson commented 2 months ago

The biggest problem we have in my opinion is to come up with a way to achieve a smooth transition between the two formats. I think what you outline above seems reasonable, if I understand it correctly it would mean that we need to:

Set a time frame for the transition period.
Identify and add depreciation warnings to methods we no longer will support after the given time
Harmonize the API between YAML and XML configured hardware objects so that we become independent of the format used.
Find a way to transition from XML to YAML for one hardware object at a time without breaking the inheritance chain.

It would be quite nice if we could use something like Pydantic to specify/model the configuration so that each HardwareObject have a configuration schema (Pydantic object). The configuration could then be parsed and validated through this schema. The configuration data can then either be available through for instance self._config or in some other way through convention be added to the HardwareObject.

rhfogh commented 2 months ago

Your understanding is the same as mine. Just one additional point is that some XML-object functions cannot really be supported in YAML, and (many) others would likely be slated for deprecation.

The Pydantic idea makes sense. But is there a way to translate it to Python attributes? I really like the idea that my linter can tell me whether someHWO.some_property is actually defined for the (possibly mock) HWO I am coding to. It is not only the configuration files, but also the code that needs validation. The way things are now, the top layer of properties, the direct attributes of the HWO, are checked on load.

marcus-oscarsson commented 2 months ago

Your understanding is the same as mine. Just one additional point is that some XML-object functions cannot really be supported in YAML, and (many) others would likely be slated for deprecation.

Sure we need to find some reasonable common ground and the aim is to work towards YAML and hopefully also improve on the current situation. I agree that we need to deprecate most XML features, perhaps the more the better :)

The Pydantic idea makes sense. But is there a way to translate it to Python attributes? ...

Not sure there is a good way to perform such a mapping or translation without it being awkward. The Pydantic object though is "well defined" and your editor/linter would know how to deal with it. One could imagine, perhaps, some sort of linking so that all roles defined in the configuration object gets assigned to the roles defined on the HardwareObject. But I'm not sure that would be very convenient but I'm open to the idea. That being said, I think we should keep it as simple as possible :)

rhfogh commented 2 months ago

If we have Pydantic schemas to define the configuration, how would I go about checking that a given attribute that I wanted to access was actually defined in the scope I needed? I.e. globally, if in a mock object or general-level code (like GPhL uses)? Or specifically for a given site or beamline (if I was reviewing a PR that contained a new attribute)? It is a lot easier if the linter can show it up, than if you need to cross-check by hand.

marcus-oscarsson commented 2 months ago

Just as an example its probably not the best one, but imagine that you have something like this:

from pydantic import BaseModel, Field

class DetectorConfigModel(BaseModel):
    name: str = Field("", description="Detector name")
    detector_distance: Role = Field("detector_distance", description="Detector name")

class SomeDetector(AbstractDetector):
    CONFIGURATION_SCHEMA = DetectorConfigModel

    def __init__(self, name):
       self._config: Optional[DetectorConfigModel] = None
      .....

Like this both you and the linter would be able to know exactly what the configuration is

rhfogh commented 2 months ago

Yes, but how would I code it? Currently I would do (in Yaml configuration)

detector_width = HWR.detector.widht

and my linter would give me an error because I spelled 'widht' wrong. How would I get hold of the width, and how would my linter know whether the attribute existed? Hopefully there is a simple answer,- I am just not sure what it is.

marcus-oscarsson commented 2 months ago

Something like this then I guess.

class DetectorConfigModel(BaseModel):
    name: str = Field("", description="Detector name")
    width: float = Field(....)
    detector_distance: Role = Field("detector_distance", description="Detector name")

HWR.detector._config.width

marcus-oscarsson commented 2 months ago

I guess in the case you need to have width as a public attribute we need to add a property

@property
def width(self) -> float:
   return self._config.width

rhfogh commented 2 months ago

That sounds quite reasonable, I have no problem with it. One might call it 'config' instead of '_config', so you can more honestly access it from the outside, and at worst (though one might want to avoid it) one could write the properties. We would have to see if this would be settable, and if it should be.

Anyway, problem solved. Let us do it that way. Sorry for the quibbling, but I did not know (and now I do).

It would mean doing more rewriting also for the YAML config class (which could take a bit longer), but it is worth that for getting it right.

marcus-oscarsson commented 2 months ago

Very happy to have the discussion :), I still don't know all the details my self its mostly ideas. I think it would be nice to have some sort of ambition for the may code camp, if its decided that this will be one of the topics. For instance it would in that case be nice if we could leave the code camp with a design and perhaps even some sort of early implementation.

I'm sure the others also have plenty of good ideas

rhfogh commented 2 months ago

Not sure how much time I will have - there is the MXLIMS work which also has top priority and which also needs presenting and discussing at MAX IV - but I shall try to get as far as I can with a proposal in advance of the meeting. Ideally we would have the draft all ready so we could start applying it, but that may well be too ambitious

marcus-oscarsson commented 1 month ago

The MXCuBE meeting and the code camp is approaching and its soon time to have a some thoughts about and discuss this (XML to YAML configuration). Both @rhfogh and myself @marcus-oscarsson have outlined some ideas above and perhaps one of the more difficult issues would be how to gradually go from XML to YAML with as little issues as possible.

One idea expressed above is to introduce a config object that is a Pydantic model. The source of the data can be either XML or YAML. This object and its utility functions would have the same API for both formats and would live along side with the old configuration system (until replaced).

This at least implies (perhaps I'm omitting something):

Develop the "config" object solution for YAML and XML formats (and make an abstraction for possible other future formats.
Adding a deprecation warning for all other of means accessing the current configuration.
This would be helped by adding documentations and tests for the configuration of current hardware objects.

A possible way to approach this would be to:

Add the new "config" object infrastructure
Identify a set of HardwareObjects to begin with and and configuration tests and documentation
Slowly move the identified HardwareObjects to YAML

What do you think about the above, could it work, do you have other ideas ?

rhfogh commented 1 month ago

Time is getting rather short. So this is what I would try to do before Lund:

Change both the Yaml-configured and the XML-configured base class so the actual data are stored in an attribute called .config. For now I shall just use a random empty object without bothering about Pydantic, initialisation, or validation.
Change the Python code so that the functions used in either base class will work on both, as they stand, and put in deprecation warnings.
Try to remove calls to functions that cannot be supported on both sides.
If I have time try to change the loading mechanisms so we can load either kind of configuration file, and merge the two superclasses into one.

We can then later look at adding in Pydantic, getting rid of unsupported and deprecated functions, automatically writing out config files that match the new way we want these to look, and putting them in instead of the preceding set

mxcube / mxcubecore

Transfer from XML to YAML HardwareObject configuration #897

1 Agree on the goal

2 Goal v. existing HardwareObject functionality

3. Adapting `HardwareObject` class for transition

4. Deprecating obsolete features

5. Write out YAML config files

6. Replace XML files with YAML files

mxcube / mxcubecore

Transfer from XML to YAML HardwareObject configuration #897

1 Agree on the goal

2 Goal v. existing HardwareObject functionality

3. Adapting HardwareObject class for transition

4. Deprecating obsolete features

5. Write out YAML config files

6. Replace XML files with YAML files

3. Adapting `HardwareObject` class for transition