mxcube / mxcubecore

Backend used by MXCuBE
http://mxcube.github.io/mxcube/
GNU Lesser General Public License v3.0
11 stars 51 forks source link

Code Camp May 2024: Convert XML configutation to YAML #931

Open rhfogh opened 1 month ago

rhfogh commented 1 month ago

For the May 2024 Code Camp at MAX IV, we are planning to start converting XML configuration files to YAML, with the refactoring needed.

Starting point

There is a starting branch xml_yaml_conversion for both mxcubecore, mxcubeqt, and mxcubeweb. This branch has a single HardwareObject Superclass that works with both XML and YAML configuration files, so you can convert the config files one by one without changing the class structure (NB You cannot have Yaml-configured HWOBJ contained as roles within XML-configured HWOBJ). The key changes are in BaseHardwareObjects and HardwareRepository. For an example of new yaml-configured objects, see GphlWorkflow and GphlWorflowConnection.

The Qt mock version loads correctly in the xml_yaml_conversion branch, and the GPhL workflow runs correctly with emulated data collection. The mxcubeweb version has been partially converted, but not tested, so there will likely be changes still to do - for instance someone should check beamline.hardwareObjects. The code generates draft .yml files for all configuration files, which can then serve as starting points for making the new configuration files:

mock_config_dirs_tmp.tar.gz

Work plan

marcus-oscarsson commented 1 month ago

Thanks @rhfogh for this excellent start of the code camp, really well done !

It would be great if now when @rhfogh have invested all this time to prepare for the code camp that we also do the same and at least:

beteva commented 1 month ago

Thanks @rhfogh for the work done. This gives us good basis for thinking.

In this context I have few questions/comments

  • The mxcubeweb version needs to be debugged to work with the current xml_yaml_conversion branch

You are talking about web and Qt - don't they work with the same mock classes? Would not it be better to make first the tests work with the mock classes being converted to use yaml configuration and than this can than server as a model.

  • We need to get rid of duplicate attachment points for HWOBJ so all HWOBJ are attached to the Beamline object (directly or indirectly) only once.

It is all right to remove duplications, but I do not think it is reasonable to have all HWOBJ attached to the Beamline object. or maybe I misunderstood this?

Question of personal taste, but I think the _ (e.g. _objects) makes the file less readable.

Is there a technical reason to name the files like beam.aperture.yml (or similar). Does this mean that the aperture has to be accessed via the beam HWOBJ? This leads to a bigger question - do we need, for the HWOBJ not being defined in the Beamline object, fix via which HWOBJ they are accessible. My opinion is NOT, as this could become very rigid configuration.

rhfogh commented 1 month ago

@beteva

If you do not like having to write 'HWR.beamline.config.diffractometer.config.phiy' in a lot of places, you can still do self.phiy = HWR.beamline.config.diffractometer.config.phiy in one place and use self.phiy in the rest of the class (though I would recommend not using the obj.phiy from outside the class, for clarity).

A final comment: The configurable data were all put into obj.config because I thought that best reflected the idea of having the configuration defined by pydantic, and not having the attributes hardwired inside the classes. if we prefer we could put the attributes directly inside the classes instead, it would not be that hard to change, but I do not know how we would use Pydantic in that case.

marcus-oscarsson commented 1 month ago

Accessing objects

The draft implementation made by @rhfogh attaches all the configuration data to a config object as we discussed in #897. The idea is also as @rhfogh mentioned above that this object is defined by a schema and we have for the moment decided that it would be handled by pydatanic.

Reading the code the current suggestion produces make me think that we might want to refine the way objects (roles) are handled. Its completely technically feasible to use the current proposal that results in for instance:

detector_distance = HWR.beamline.config.detector.config.distance
# or
phiy = HWR.beamline.config.diffractometer.config.phiy

That would be correct according to the idea but its, in my opinion, semantically a bit awkward and it would perhaps be more elegant/easier to be able to write (like today).

detector_distance = HWR.beamline.detector.distance
# or
phiy = HWR.beamline.diffractometer.phiy

We could imagine a system where all roles are defined on the "parent" object and cross checked with the pydatanic model. For instance one could imagine something like the code below, where detector_distance only gets assigned to SomeDetector if the attribute exists and an error if it does not. I would also suggest that we build in some logic that checks for cycles to avoid initialization problems. I suggest that all other configuration data should be accessed via the .config attribute.

from pydantic import BaseModel, Field

class DetectorConfigModel(BaseModel):
    name: str = Field("", description="Detector name")
    detector_distance: Role = Field("detector_distance", description="Detector name")

class SomeDetector(AbstractDetector):
    CONFIGURATION_SCHEMA = DetectorConfigModel

    def __init__(self, name):
       self.detector_distance: AbstractMotor = None
       self._config: Optional[DetectorConfigModel] = None

Configuration files It would probably be nice with some general configuration file name convention. Its not the most important but at the moment there is for instance config-file.xyz and config_file.xyz and the name of the file should perhaps be the same as the role name (or something like that).

Looking at the following example:

_initialise_class:
    class: DetectorMockup.DetectorMockup
_objects:
    detector_distance: detector.detector_distance.yml
beam:
    ax: 0.0
    ay: 0.0402
    bx: 1565.715
    by: 1702.058
fileSuffix: h5
hasShutterless: true
height: 3269
humidityThreshold: 20.0
manufacturer: DECTRIS
model: 9M
px: 0.075
py: 0.075
roiModes: ("0", "C18", "C12", "C2")
tempThreshold: 33.5
tolerance: 0.2
type: Eiger2
width: 3110

I also think that the keywords _initialise_class and _objects are difficult to understand and the _ feels a bit strange. Perhaps simply python_class or object and roles or objects would relate more directly to mxcube, for instance:

object: DetectorMockup.DetectorMockup
roles:
    detector_distance: detector.detector_distance.yml
beam:
    ax: 0.0
    ay: 0.0402
    bx: 1565.715
    by: 1702.058
fileSuffix: h5
...
rhfogh commented 1 month ago

@marcus-oscarsson This all makes sense.

Anyway, I agree that all those '.config.' are rather clumsy and better avoided - and one way or another roles would have to be treated as a special case compared to the other configuration attributes. If there is a decent Pydantic way of doing this, then, please, let us do it.

The one thing we need to figure out is how to make it clear that the python_class and roles are not transferred to the object namespace like the other attributes. They are strings in the file, but whatever we do in the object they will be something else. Maybe some less nice name like 'howobj_roles'?

fabcor-maxiv commented 1 month ago
initialise_class:
    class: DetectorMockup.DetectorMockup
objects:
    detector_distance: detector.detector_distance.yml
configuration:
    beam:
        ax: 0.0
        ay: 0.0402
        bx: 1565.715
        by: 1702.058
    fileSuffix: h5
    hasShutterless: true
    height: 3269
    humidityThreshold: 20.0
    manufacturer: DECTRIS
    model: 9M
    px: 0.075
    py: 0.075
    roiModes: ("0", "C18", "C12", "C2")
    tempThreshold: 33.5
    tolerance: 0.2
    type: Eiger2
    width: 3110
rhfogh commented 1 month ago

Looks nice. I guess it would mean that the file validation would not be the same as the Pydantic, but as long as that is not a problem ...

On Mon, 27 May 2024 at 11:02, fabcor @.***> wrote:

initialise_class: class: DetectorMockup.DetectorMockupobjects: detector_distance: detector.detector_distance.ymlconfiguration: beam: ax: 0.0 ay: 0.0402 bx: 1565.715 by: 1702.058 fileSuffix: h5 hasShutterless: true height: 3269 humidityThreshold: 20.0 manufacturer: DECTRIS model: 9M px: 0.075 py: 0.075 roiModes: ("0", "C18", "C12", "C2") tempThreshold: 33.5 tolerance: 0.2 type: Eiger2 width: 3110

— Reply to this email directly, view it on GitHub https://github.com/mxcube/mxcubecore/issues/931#issuecomment-2133233272, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACK5QEATPMPTKTRG4RRG4DLZEMHFBAVCNFSM6AAAAABIGIEGHKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMZTGIZTGMRXGI . You are receiving this because you were mentioned.Message ID: @.***>

fabcor-maxiv commented 1 month ago

Is there more to initialise_class? Can there be more than one initialisation class? Can it be simplified to initialisation_class: DetectorMockup.DetectorMockup?

rhfogh commented 1 month ago

One thing about the example configuration files that I probably did not make clear is that they were geerated automatically from the configuration that was read in from the old configuration files. So if you want to upgrade your local site-specific configuration files, you can get your own drraft files in the same way by setting a switch. The naming is just a byproduct - I did not have the original file name available at the point where the new files were being written out, so this was an easy way of doing it.

Anyway, it should be easy enough to change the internal structure of those draft files and also the attribute access inside the WIP proposal - as soon we know what would be the right way of doing it in Pydantic.

rhfogh commented 1 month ago

@fabcor-maxiv There can be only one initialisation class, and it must be a fully qualified module name when the code runs. So if HardwareObjects/mockups in on the PythonPath, then DetectorMockup.DetectorMockup would be OK - what is there is what came out of object.__module__.

What would you use multiple initialisation classes for?

There is one more facility - you can have a dictionary of keyword:value parameters that are passed to the class __init__ when the class is generated. I do not think it has ever been used (so I was considering droppng it), but the code is there.

marcus-oscarsson commented 1 month ago

I like the idea with a configuration section it makes the file more readable and probably more practical to handle. I'm not so much in favor of the name initialise_class, I'd prefer hardware_object, hwobject, or simply object.

fabcor-maxiv commented 1 month ago
class: DetectorMockup.DetectorMockup
objects:
    detector_distance: detector.detector_distance.yml
configuration:
    beam:
        ax: 0.0
        ay: 0.0402
        bx: 1565.715
        by: 1702.058
    fileSuffix: h5
    hasShutterless: true
    height: 3269
    humidityThreshold: 20.0
    manufacturer: DECTRIS
    model: 9M
    px: 0.075
    py: 0.075
    roiModes: ["0", "C18", "C12", "C2"]
    tempThreshold: 33.5
    tolerance: 0.2
    type: Eiger2
    width: 3110
rhfogh commented 1 month ago

As of 20240530 there is a new version that should fit with the first agreements in the code camp.

NBNB As of 1324 (latest fix, Swedish time) the code should be fixed, pushed (and tested with the Qt version). The web code not yet. More after lunch The new set of test config files is here: mock_config_dirs_tmp.tar.gz