mxcube / mxcubecore

Backend used by MXCuBE
http://mxcube.github.io/mxcube/
GNU Lesser General Public License v3.0
11 stars 51 forks source link

Transfer from XML to YAML HardwareObject configuration #897

Open rhfogh opened 3 months ago

rhfogh commented 3 months ago

The below would give one path from XML to YAML configuration. It would still require some major changes to move to YAML files and change system, and most likely we would have to do it with a code camp. Still, it should hopefully make the transition easy enough to actually get it done

1 Agree on the goal

The HardwareObjectYaml currently has the following functionality:

Would we want more than that? We should agree on that before we start. Once we are all agreed what functionality we want, we should put it into the HardwareObjectYaml class straight away.

2 Goal v. existing HardwareObject functionality

Below is a list of existing HardwareObject functionalities, with some (my) opinions on their suitability.

3. Adapting HardwareObject class for transition

We would need to implement YAML-class functionality 1, 3, 4, 5, and 7 in the HardwareObject class. This would not require any changes to loading or existing functionality, or class-by-class changes. The biggest change would be to allow contained objects to be accessed by obj.attr syntax, which could be done by changing the way of accessing the existing data structures (unless we find it simpler to change the structures themselves) The handling of complex non-HardwareObject objects (get_objects etc.) would require changing, also in the classes using them, but there are few examples of those.

4. Deprecating obsolete features

We can then add a Depreciation Warning to all those features we want to get rid of, and clean up the code bit by bit, without changing the configuration files (much)

5. Write out YAML config files

At this point is should be possible to write out the loaded configuration in YAML format. We would still be missing the comments, so that would have to be done by hand.

6. Replace XML files with YAML files

Now we just need to delete the old XML files, replace them with the new YAML files, and put back the comments. And we are done.

marcus-oscarsson commented 2 months ago

The biggest problem we have in my opinion is to come up with a way to achieve a smooth transition between the two formats. I think what you outline above seems reasonable, if I understand it correctly it would mean that we need to:

  1. Set a time frame for the transition period.
  2. Identify and add depreciation warnings to methods we no longer will support after the given time
  3. Harmonize the API between YAML and XML configured hardware objects so that we become independent of the format used.
  4. Find a way to transition from XML to YAML for one hardware object at a time without breaking the inheritance chain.

It would be quite nice if we could use something like Pydantic to specify/model the configuration so that each HardwareObject have a configuration schema (Pydantic object). The configuration could then be parsed and validated through this schema. The configuration data can then either be available through for instance self._config or in some other way through convention be added to the HardwareObject.

rhfogh commented 2 months ago

Your understanding is the same as mine. Just one additional point is that some XML-object functions cannot really be supported in YAML, and (many) others would likely be slated for deprecation.

The Pydantic idea makes sense. But is there a way to translate it to Python attributes? I really like the idea that my linter can tell me whether someHWO.some_property is actually defined for the (possibly mock) HWO I am coding to. It is not only the configuration files, but also the code that needs validation. The way things are now, the top layer of properties, the direct attributes of the HWO, are checked on load.

marcus-oscarsson commented 2 months ago

Your understanding is the same as mine. Just one additional point is that some XML-object functions cannot really be supported in YAML, and (many) others would likely be slated for deprecation.

Sure we need to find some reasonable common ground and the aim is to work towards YAML and hopefully also improve on the current situation. I agree that we need to deprecate most XML features, perhaps the more the better :)

The Pydantic idea makes sense. But is there a way to translate it to Python attributes? ...

Not sure there is a good way to perform such a mapping or translation without it being awkward. The Pydantic object though is "well defined" and your editor/linter would know how to deal with it. One could imagine, perhaps, some sort of linking so that all roles defined in the configuration object gets assigned to the roles defined on the HardwareObject. But I'm not sure that would be very convenient but I'm open to the idea. That being said, I think we should keep it as simple as possible :)

rhfogh commented 2 months ago

If we have Pydantic schemas to define the configuration, how would I go about checking that a given attribute that I wanted to access was actually defined in the scope I needed? I.e. globally, if in a mock object or general-level code (like GPhL uses)? Or specifically for a given site or beamline (if I was reviewing a PR that contained a new attribute)? It is a lot easier if the linter can show it up, than if you need to cross-check by hand.

marcus-oscarsson commented 2 months ago

Just as an example its probably not the best one, but imagine that you have something like this:

from pydantic import BaseModel, Field

class DetectorConfigModel(BaseModel):
    name: str = Field("", description="Detector name")
    detector_distance: Role = Field("detector_distance", description="Detector name")

class SomeDetector(AbstractDetector):
    CONFIGURATION_SCHEMA = DetectorConfigModel

    def __init__(self, name):
       self._config: Optional[DetectorConfigModel] = None
      .....

Like this both you and the linter would be able to know exactly what the configuration is

rhfogh commented 2 months ago

Yes, but how would I code it? Currently I would do (in Yaml configuration)

detector_width = HWR.detector.widht

and my linter would give me an error because I spelled 'widht' wrong. How would I get hold of the width, and how would my linter know whether the attribute existed? Hopefully there is a simple answer,- I am just not sure what it is.

marcus-oscarsson commented 2 months ago

Something like this then I guess.

class DetectorConfigModel(BaseModel):
    name: str = Field("", description="Detector name")
    width: float = Field(....)
    detector_distance: Role = Field("detector_distance", description="Detector name")

HWR.detector._config.width
marcus-oscarsson commented 2 months ago

I guess in the case you need to have width as a public attribute we need to add a property

@property
def width(self) -> float:
   return self._config.width
rhfogh commented 2 months ago

That sounds quite reasonable, I have no problem with it. One might call it 'config' instead of '_config', so you can more honestly access it from the outside, and at worst (though one might want to avoid it) one could write the properties. We would have to see if this would be settable, and if it should be.

Anyway, problem solved. Let us do it that way. Sorry for the quibbling, but I did not know (and now I do).

It would mean doing more rewriting also for the YAML config class (which could take a bit longer), but it is worth that for getting it right.

marcus-oscarsson commented 2 months ago

Very happy to have the discussion :), I still don't know all the details my self its mostly ideas. I think it would be nice to have some sort of ambition for the may code camp, if its decided that this will be one of the topics. For instance it would in that case be nice if we could leave the code camp with a design and perhaps even some sort of early implementation.

I'm sure the others also have plenty of good ideas

rhfogh commented 2 months ago

Not sure how much time I will have - there is the MXLIMS work which also has top priority and which also needs presenting and discussing at MAX IV - but I shall try to get as far as I can with a proposal in advance of the meeting. Ideally we would have the draft all ready so we could start applying it, but that may well be too ambitious

marcus-oscarsson commented 1 month ago

The MXCuBE meeting and the code camp is approaching and its soon time to have a some thoughts about and discuss this (XML to YAML configuration). Both @rhfogh and myself @marcus-oscarsson have outlined some ideas above and perhaps one of the more difficult issues would be how to gradually go from XML to YAML with as little issues as possible.

One idea expressed above is to introduce a config object that is a Pydantic model. The source of the data can be either XML or YAML. This object and its utility functions would have the same API for both formats and would live along side with the old configuration system (until replaced).

This at least implies (perhaps I'm omitting something):

A possible way to approach this would be to:

What do you think about the above, could it work, do you have other ideas ?

rhfogh commented 1 month ago

Time is getting rather short. So this is what I would try to do before Lund:

We can then later look at adding in Pydantic, getting rid of unsupported and deprecated functions, automatically writing out config files that match the new way we want these to look, and putting them in instead of the preceding set