teemtee / tmt

Test Management Tool
MIT License

Plugin: Cannot add required fields for `StepData` #2766

Open LecrisUT opened 7 months ago

LecrisUT commented 7 months ago

I was trying to do

import dataclasses

from tmt.steps.prepare import PrepareStepData
from tmt.utils import field


@dataclasses.dataclass
class PrepareCMakeData(PrepareStepData):
    preset: str = field(
        option=("-p", "--preset"),
        metavar="PRESET",
        help="Configure preset",
    )

I would like this to be a single-value required field, but I cannot do that, because it follows the non-required fields of PrepareStepData.

Approach 1

Add kw_only=True

@dataclasses.dataclass(kw_only=True)
class PrepareCMakeData(PrepareStepData):

This kinda works, but the error handling is all messed up:

$ tmt plans show

    warn: /plan:prepare - {'how': 'cmake'} is not valid under any of the given schemas
/plan

Failed to load step data for PrepareCMakeData: PrepareCMakeData.__init__() missing 1 required keyword-only argument: 'preset'

A potential solution would be to try/except this source of error and re-format it appropriately.
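
A minimal sketch of that idea (the helper name and wiring are hypothetical, not tmt's actual loading code): catch the TypeError raised by the generated __init__ and re-raise it with a friendlier, plugin-aware message.

import tmt.utils


def load_cmake_data(raw_step_data: dict) -> "PrepareCMakeData":
    # PrepareCMakeData is the data class defined above; raw_step_data would
    # come from the plan's fmf data. The exception type used for re-raising
    # is an assumption.
    try:
        return PrepareCMakeData(**raw_step_data)
    except TypeError as error:
        raise tmt.utils.SpecificationError(
            f"Failed to load step data for PrepareCMakeData: {error}") from error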

Approach 2

Add an attrs-style validator. This is the most intuitive option for the user, but it requires quite some implementation on the backend. It would still produce such reporting errors, but those are easier to try/except and report.
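
A rough, standalone illustration of the attrs-style idea (plain attrs, not wired into PrepareStepData):

from typing import Optional

import attrs


def required(instance, attribute, value):
    # Reject fields left at their None placeholder default.
    if value is None:
        raise ValueError(f"'{attribute.name}' is required")


@attrs.define
class PrepareCMakeData:
    # The key still carries a placeholder default so it can follow the
    # inherited defaulted fields, but the validator turns a missing value
    # into a clear, catchable error at instantiation time.
    preset: Optional[str] = attrs.field(default=None, validator=required)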

Approach 3

Wrap the dataclass decorator with JSON schema validation. I think this is the preferred approach and the one that is planned to be implemented?
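
A very rough sketch of what wrapping could look like; the schema contents, the decorator name and the place where validation runs are all assumptions:

import dataclasses

import jsonschema

# Hypothetical per-plugin schema for the cmake prepare step.
CMAKE_SCHEMA = {
    "type": "object",
    "properties": {
        "how": {"const": "cmake"},
        "preset": {"type": "string"},
    },
    "required": ["preset"],
}


def validated(schema):
    # Wrap a dataclass so keyword arguments are validated against the JSON
    # schema before the generated __init__ runs.
    def wrap(cls):
        cls = dataclasses.dataclass(cls)
        original_init = cls.__init__

        def __init__(self, **kwargs):
            jsonschema.validate(instance=kwargs, schema=schema)
            original_init(self, **kwargs)

        cls.__init__ = __init__
        return cls

    return wrap

Usage would then be @validated(CMAKE_SCHEMA) in place of the plain @dataclasses.dataclass decorator, and a missing preset would surface as a readable jsonschema.ValidationError.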

One missing part here is how the plugin can add paths for the schema to check. Ideally this would be something to specify when subclassing a plugin or a plugin data class.

happz commented 7 months ago

Yeah, this is not supported by dataclasses at all; the order is important. I'm not sure whether any of the solutions would work because none addresses the order of fields or (the default) value of the field. The only advice I have now would be to make the field Optional and use None as the default value - I know, it's ugly, and it means more assert calls to make type-checking happy...
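
A sketch of that workaround, assuming tmt's field() accepts a default here the same way it does for the existing optional keys:

import dataclasses
from typing import Optional

from tmt.steps.prepare import PrepareStepData
from tmt.utils import field


@dataclasses.dataclass
class PrepareCMakeData(PrepareStepData):
    # Ugly but order-safe: the key gets a None default and must be checked
    # before use.
    preset: Optional[str] = field(
        default=None,
        option=("-p", "--preset"),
        metavar="PRESET",
        help="Configure preset",
    )

Before using the value, an assert keeps the type checker satisfied: assert data.preset is not None.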

The solution might be switching to attrs, where this should not be a problem, and that's my long-term plan. However, I have not started working on it, because I'm mostly happy with what we have now. Not fully, but it's Good Enough(TM). It works, brings enough safety and formalism, no more dictionaries, which means I was able to focus on other areas and put attrs on the back burner. Your case is valid, unfortunate, yet a corner one & affected by the core limitation of the dataclasses package :(

LecrisUT commented 7 months ago

I'm not sure whether any of the solutions would work because none addresses the order of fields or (the default) value of the field.

With either the attrs validator or JSON schema validation, it would work, but you would still have to specify it as Optional.

more assert calls to make type-checking happy...

Yeah, but this one would still break the error reporting. I am actually stuck at the next issue with the JSON schema validation:

One missing part here is how the plugin can add paths for the schema to check. Ideally this would be something to specify when subclassing a plugin or a plugin data class.

But the JSON schema validation is not performed for tmt run right? I am wondering how much I am blocked by this right now.


The solution might be switching to attrs

After you first introduced me to this, I can confirm attrs is the way to go. And actually, I don't think it would be that difficult a transition, since most of the API is compatible. We could at least open a PR to investigate a gradual transition. Do you have a suggestion for the simplest dataclass to start with?

happz commented 7 months ago

I'm not sure whether any of the solutions would work because none addresses the order of fields or (the default) value of the field.

With either the attrs validator or JSON schema validation, it would work, but you would still have to specify it as Optional.

Yeah, but that's the difference: "with attrs..." - and the rest does not matter. It's not possible with stdlib dataclasses, i.e. kw_only, an "attrs style validator" or wrapping dataclass with JSON validation won't let you skip defining a default value. It might be a dummy one, None, with a matching type, but as long as dataclasses control the class, the field shall have a value :)

more assert calls to make type-checking happy...

Yeah, but this one would still break the error reporting.

Hmm, that would be unexpected. Once the class is successfully defined, existing code should be able to work with it easily, Optional or not.

I am actually stuck at the next issue with the JSON schema validation:

One missing part here is how the plugin can add paths for the schema to check. Ideally this would be something to specify when subclassing a plugin or a plugin data class.

Right, I wanted to address this one but eventually missed it. No, tmt does not have a way to include a custom schema, but it should have one, e.g. an environment variable similar to TMT_PLUGINS. Definitely worth a patch.
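
A minimal sketch of what such a mechanism could look like, modelled on TMT_PLUGINS; the variable name and separator are pure assumptions, this is not an existing tmt setting:

import os
from pathlib import Path


def custom_schema_paths() -> list[Path]:
    # Hypothetical TMT_PLUGIN_SCHEMAS: extra directories with plugin
    # schemas, separated like PATH entries.
    raw = os.environ.get("TMT_PLUGIN_SCHEMAS", "")
    return [Path(entry) for entry in raw.split(os.pathsep) if entry]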

But the JSON schema validation is not performed for tmt run right? I am wondering how much I am blocked by this right now.

It is performed every time tmt loads fmf data: tmt asks the fmf library to load data from files, then asks the fmf library to run schema validation, and then tmt turns the tree of fmf structures into tests and plans. The report from JSON validation is just a warning; I plan to improve its reporting - until then, it can be way too cryptic to be of any use as a mandatory test.

The solution might be switching to attrs

After you first introduced me to this, I can confirm attrs is the way to go. And actually I don't think it would be that difficult of a transition, since most of the API is compatible. We can at least open a PR to investigate gradual transition. Do you have a suggestion on the simplest dataclass to start with?

Hard to tell. Maybe check tmt.hardware, those are pretty isolated, or tmt.queue, tmt.steps.provision.mrack. Basically, nothing that's derived from tmt.utils.DataContainer or uses field(), because there's a lot of code attached to that.

Which is also something to think about: those containers need to be written to and read from files, plus we save plenty of field metadata (tmt.utils.FieldMetadata), because fields define keys, keys can be set by CLI options, have help strings, metavars, all the stuff that both CLI and Sphinx need to access, so some structured way is needed for storing this kind of info. Something will not be needed - maybe normalization could be performed by attrs - but something will remain with us. On the other hand, if switching to attrs, tackling field() & co. would be inevitable. I'd expect we would end up with two sets of code, the legacy one and the new one, and convert plugin by plugin, to make things reviewable.
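
For the field metadata part, attrs fields offer a metadata mapping that could carry the same information tmt.utils.FieldMetadata stores today; a rough sketch only, not a proposed API:

import attrs


@attrs.define
class PrepareCMakeData:
    preset: str = attrs.field(
        default="",
        metadata={
            # Mirrors what tmt's field() records: CLI option names, metavar
            # and help text for both the CLI and Sphinx to consume.
            "option": ("-p", "--preset"),
            "metavar": "PRESET",
            "help": "Configure preset",
        },
    )

CLI and documentation generators could then read attrs.fields(PrepareCMakeData) to build options and docs from this metadata.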

LecrisUT commented 7 months ago

Hmm, that would be unexpected. Once the class is successfully defined, existing code should be able to work with it easily, Optional or not.

I think I was thinking of the schema validation error that I currently have. I will come back to this if the issue persists when using assert.

The report from JSON validation is just a warning

Ok, I can then move on and see what I can hack up further.

I'd expect we would end up with two sets of code, the legacy one and the new one, and convert plugin by plugin, to make things reviewable.

Yep, that's the plan I had in mind. First, figure out whether we will have any surprises with EPEL + Fedora Rawhide compatibility :D