Rework sidecar files to support testing rule profiles

iay commented 10 months ago

The initial design of the sidecar .yaml files was intended to be as flexible as possible; in particular, it allows for sending each test case (.xml file) to multiple validator pipelines, with the single validator named default as a default.

On reflection and after some discussion, I'd like to propose changing this to better support testing what I'd like to call "profiles", represented as validator pipelines. In production, for example, we apply one set of rules to our own inventory and another set of rules to entities imported from eduGAIN.

One aspect of this that we don't have a way of representing in the current testbed implementation is that we may have a rule for which the same test case should have different results in different profiles. For example, we impose certain additional constraints on UKF-registered entities that we don't impose on eduGAIN-sourced metadata.

To handle this, I think it would make sense to replace this schema:

expected:
   # what is expected for all validators
validators:
  - validator-1
  - validator-2

... with something like this:

expected:
   # what is expected for all validators
validators:
  - name: validator-1
    expected:
      # this overrides the more global value
  - name: validator-2
    # "expected" defaults from the more global value

I think we should also give some thought to what omitting validators means. At present, it means "run the single validator called default". It might make more sense to make it mean "run all the configured validators".

There might be a case for adding something under validators to permit skipping a particular validator if it's inapplicable:

validators:
  - name: strange-validator
    skip: true

All of the above details are "straw man" rather than final proposals.
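
To make the two semantic questions concrete, here is a rough Python sketch of one possible reading: omitting validators means "run all the configured validators", and a listed entry with skip: true is left out. The function name `validators_to_run` and the `configured` argument are hypothetical, and treating an explicit list as restricting the run is an assumption, not something settled above.

```python
def validators_to_run(sidecar, configured):
    """Return the names of the validators a test case should be sent to.

    - No "validators" key: every configured validator.
    - Otherwise: the listed validators, minus any marked skip: true.
    """
    entries = sidecar.get("validators")
    if entries is None:
        return list(configured)
    return [e["name"] for e in entries if not e.get("skip", False)]


# Hypothetical set of validators configured in the testbed.
configured = ["default", "strange-validator"]

# Sidecar with no "validators" key: everything runs.
print(validators_to_run({}, configured))

# Sidecar that lists two validators but skips one of them.
listed = {
    "validators": [
        {"name": "default"},
        {"name": "strange-validator", "skip": True},
    ]
}
print(validators_to_run(listed, configured))
```

Under the other possible reading, where skip removes a validator from an otherwise complete run, the second branch would instead start from `configured` and subtract the skipped names.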

iay commented 10 months ago

One reason it would be relatively easy to change this is that (if I recall correctly) we don't currently use the ability to select validators in our test cases... as we only have one validator, the default one. So changing semantics is best done now, before we have any usage of the old construct.

philsmart commented 10 months ago

Yeah, I do not think any of the new tests explicitly define the validators to use (as you say, just default atm).

Perhaps validators should be mandatory, so that you always have to specify which ones to use. Then you would not need to define which to skip, which would require you to know in advance (or find out by trial and error) which validators were inappropriate, rather than simply knowing which validators you want the test to work over when you create it.

alexstuart commented 10 months ago

> In production, for example, we apply one set of rules to our own inventory and another set of rules to entities imported from eduGAIN.

We do, although there is a significant overlap between those two. As Phil and I are finding out in https://github.com/ukf/ukf-testbed/issues/19, checks are applied in a couple of places:

- on import of entities from a particular channel (somewhere in bean="uk_registeredEntities" and bean="uk_int_edugain_importPipeline" respectively)
- on output to the production pipeline we have checkPublishable, consisting of beans in a composite stage. I don't think all output pipelines will need this because some of the more application-focussed output aggregates are not guaranteed to be schema-valid.

When we talk about a set of rules, are we talking about the checks applied as we import metadata from each source, or the totality of the checks from import to publish?

iay commented 5 months ago

The other source of variation (other than profiles) will be when we deliberately move forward and change the resulting errors. For example, when we replace the Xalan extension for URL validation with validators from MDA 0.10, both the component IDs and the text of messages will change. It's worth thinking about how we want to handle a transition like that.

One option that comes to mind is to allow regular expression matching in these locations, although how that would be controlled isn't obvious to me (you don't want it enabled on everything).
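
One way the control question might be answered is to make regular expression matching opt-in per value. The Python sketch below assumes a hypothetical convention in which a plain string is compared exactly and a mapping of the form {"regex": "..."} requests a pattern match; neither the function name `message_matches` nor that convention exists in the testbed, and the messages are made up.

```python
import re


def message_matches(expected, actual):
    """Compare an expected message against the actual one.

    A plain string must match exactly. A dict with a "regex" key opts
    in to regular expression matching against the whole message, so
    pattern matching is never enabled by accident.
    """
    if isinstance(expected, dict) and "regex" in expected:
        return re.fullmatch(expected["regex"], actual) is not None
    return expected == actual


# Made-up message text, for illustration only.
actual = "location is not a valid URL"

print(message_matches("location is not a valid URL", actual))      # exact
print(message_matches(".*not a valid URL", actual))                 # literal, no match
print(message_matches({"regex": r".*not a valid URL"}, actual))     # opted in
```

With this shape, existing sidecars keep exact matching, and only the values that need to survive a transition in message text would be rewritten as patterns.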

iay commented 4 months ago

Now that #78 has added option overrides as a general concept, it seems like it will be easy to implement the functionality we're looking for here by presenting it as a second override matcher:

expected:
   # what is expected for all validators
override:
  - validator: validator-1
    expected:
      # this overrides the more global value

Whether we need to be able to match both validator and endpoint in a single override needs some thought, but it should be relatively simple either way: it's still a question of iterating through the overrides and stopping when you find one which both matches and contains the option you're looking for.
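
The lookup described above could be sketched along these lines in Python. This is not the code from #78: the function name `lookup_option`, the `MATCHER_KEYS` tuple, the `context` argument and the sample values are all assumptions, and the sketch takes the view that an override may carry both a validator and an endpoint matcher, all of which must agree.

```python
# Keys treated as matchers rather than options (assumed set).
MATCHER_KEYS = ("validator", "endpoint")


def lookup_option(sidecar, option, context):
    """Find the value of an option for the current validator/endpoint.

    Walk the overrides in order and stop at the first one that both
    matches the context and contains the option; otherwise fall back
    to the top-level value (None if that is absent too).
    """
    for override in sidecar.get("override", []):
        matches = all(
            context.get(key) == override[key]
            for key in MATCHER_KEYS
            if key in override
        )
        if matches and option in override:
            return override[option]
    return sidecar.get(option)


# Hypothetical parsed sidecar; "status", "other" and "e1" are illustrative.
sidecar = {
    "expected": {"status": "error"},
    "override": [
        {"endpoint": "e1", "other": 1},
        {"validator": "validator-1", "expected": {"status": "ok"}},
    ],
}

ctx = {"validator": "validator-1", "endpoint": "e1"}
print(lookup_option(sidecar, "expected", ctx))
print(lookup_option(sidecar, "expected", {"validator": "validator-2"}))
```

In the first call the endpoint override matches but has no "expected", so the search continues to the validator override; in the second call nothing matches and the global value is returned.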

I don't plan to implement this until we have a practical use case, though, in case it turns out not to be needed or there's another simplification that comes out of further thought.

ukf / ukf-testbed

Rework sidecar files to support testing rule profiles #18