mrmap-community / mrmap

Spatial Service Registry
https://mrmap.rtfd.io/en/master/
MIT License

Conformity checks #36

Closed jokiefer closed 3 years ago

jokiefer commented 3 years ago

Requirements - outdated!

  1. New app must be created: quality
  2. New models must be created
    1. Rule
      1. name - CharField
      2. field_name - CharField (with choices)
      3. property - CharField (with choices)
      4. operator - CharField (with choices)
      5. threshold - IntegerField
    2. RuleSet
      1. name - CharField
      2. rules - M2MField (on Rule)
    3. ConformityCheckConfiguration
      1. name - CharField
    4. ConformityCheckConfigurationExternal inherits from ConformityCheckConfiguration
      1. external_url - URLField
      2. parameter_map - TextField (contains json)
      3. response_map - TextField (contains json)
    5. ConformityCheckConfigurationInternal inherits from ConformityCheckConfiguration
      1. mandatoryrulesets - M2MField (on RuleSet)
      2. optionalrulesets - M2MField (on RuleSet)
    6. ConformityCheckRun
      1. metadata - ForeignKey (on Metadata)
      2. conformity_check_configuration - ForeignKey (on ConformityCheckConfiguration)
      3. time_start - DateTimeField
      4. time_stop - DateTimeField
      5. errors - TextField (contains json)
      6. additional_info - TextField (contains json)
  3. All models must be configurable in the django admin interface

Requirements reduced 2020-10-15

  1. New app must be created: quality

  2. New models must be created

    1. ConformityCheckConfiguration
      1. name - CharField
      2. metadata_types - TextField (contains json) - metadata.metadata_type, view format
    2. ConformityCheckConfigurationExternal inherits from ConformityCheckConfiguration
      1. external_url - URLField - API
      2. api_configuration - TextField (contains json) - defines test classes, result parsing, ...
    3. ConformityCheckRun - async - see pending tasks - type validate
      1. metadata - ForeignKey (on Metadata)
      2. conformity_check_configuration - ForeignKey (on ConformityCheckConfiguration)
      3. time_start - DateTimeField
      4. time_stop - DateTimeField
      5. errors - TextField (contains json)
      6. passed - Boolean
      7. result - TextField (contains json, xml, html)
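The reduced schema above can be sketched as plain dataclasses (framework-agnostic and purely illustrative; the actual implementation would use Django model fields, and `metadata_id` here stands in for the `metadata` ForeignKey):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ConformityCheckConfiguration:
    name: str
    metadata_types: str  # JSON-encoded list of metadata.metadata_type values


@dataclass
class ConformityCheckConfigurationExternal(ConformityCheckConfiguration):
    external_url: str = ""         # API endpoint
    api_configuration: str = "{}"  # JSON: test classes, result parsing, ...


@dataclass
class ConformityCheckRun:
    metadata_id: int  # stands in for the ForeignKey on Metadata
    conformity_check_configuration: ConformityCheckConfiguration
    time_start: Optional[datetime] = None
    time_stop: Optional[datetime] = None
    errors: str = "[]"             # JSON
    passed: Optional[bool] = None  # None while the async run is pending
    result: str = ""               # JSON, XML or HTML payload
```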

Conformity checks

We need to separate between internal and external conformity checks. Please refer to the attached diagram to get an idea of the structure.

ConformityCheckConfiguration

Base model for ConformityCheckConfigurationExternal and ...Internal.

name

The name of the configuration

ConformityCheckConfigurationInternal

Holds the configuration for an internal conformity check.

mandatory_rule_sets

A set of RuleSet records which must pass for the overall check to succeed.

optional_rule_sets

A set of RuleSet records which are desirable, but not required, to pass.

ConformityCheckConfigurationExternal

Holds the configuration for an external conformity check.

external_url

A link pointing to an external test API.

parameter_map

Holds json as text. Further details below.

response_map

Holds json as text. Further details below.

ConformityCheckRun

Holds the relation of a metadata record to the results of a check

metadata

A ForeignKey to the related metadata

conformity_check_configuration

A ForeignKey to a ConformityCheckConfiguration record.

time_start

DateTime when the test started

time_stop

DateTime when the test ended

errors

Holds json as text.

additional_info

Holds json as text.

ConformityCheckInternal

Internal checks are based on Rules and RuleSets.

RuleSet

Groups rules and holds the results of a rule check run. RuleSets iterate over their rules and perform the rule checks.

name

The name of a ruleSet.

rules

A set of rules.

Rule

name

The name of a rule.

field_name

field_name defines an attribute of the Metadata model, such as abstract or keywords. A list of valid choices must be provided, so the rules can be easily created.

Valid choices are

  1. title
  2. abstract
  3. access_constraints
  4. keywords
  5. formats
  6. reference_system

property

A property describes what can be measured on the field_name. E.g. abstract can have the property len, since it's a string. keywords is a queryset of records and can have the property count, which is semantically equal to len but reduces the query time.

For now there are only these two properties which can be checked. The list of choices can be extended in the future.

operator

Defines a mathematical operator, which is used to describe the expected field_name property.

Valid choices are:

  1. >
  2. >=
  3. <
  4. <=
  5. ==
  6. !=

How a rule could be used

A rule could be checked like this

def check_rule(metadata: Metadata, rule: Rule):
    prop = rule.property

    if prop == 'len':
        attribute = getattr(metadata, rule.field_name)
        real_value = len(attribute)
    elif prop == 'count':
        # count is used for M2M relations
        manager = getattr(metadata, rule.field_name)
        elements = manager.all()
        real_value = elements.count()
    else:
        raise Exception("No valid property")

    condition = str(real_value) + rule.operator + str(rule.threshold)
    return eval(condition, {'__builtins__': None})

rules = Rule.objects.all()
failed = []

for rule in rules:
    success = check_rule(metadata, rule)
    if not success:
        failed.append(rule)
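As an alternative to building a condition string and calling eval(), the operator strings could be mapped to the comparison functions from Python's operator module. This is a sketch, not part of the current design; the compare helper is a name I made up:

```python
import operator

# Map the rule's operator choices to safe comparison functions,
# avoiding string concatenation and eval() entirely.
OPERATORS = {
    '>': operator.gt,
    '>=': operator.ge,
    '<': operator.lt,
    '<=': operator.le,
    '==': operator.eq,
    '!=': operator.ne,
}


def compare(real_value: int, op: str, threshold: int) -> bool:
    """Evaluate `real_value <op> threshold` using a lookup table."""
    try:
        return OPERATORS[op](real_value, threshold)
    except KeyError:
        raise ValueError("Unsupported operator: " + op)
```

This sidesteps eval's injection surface and makes unsupported operators fail loudly instead of raising a SyntaxError at evaluation time.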

ConformityCheckExternal

Mandatory attributes for this class are:

external_url

Points to an online API

parameter_map

Maps the metadata records field names on the required parameters for the API.

Example: We have a metadata object with a field named get_capabilities_uri, which has to be used as the parameter inputURI of the API:

{
  "inputURI": "get_capabilities_uri"
}

Example implementation for usage

import json

import requests

json_obj = json.loads("...")  # the parameter_map from above
params = {}

for key, val in json_obj.items():
    params[key] = getattr(metadata, val)

requests.post(api_uri, params)  # perform test request

response_map

Maps the known response elements to a given structure. This way, we know how, e.g., the different error elements of the response are named, so we can parse the response in a generic way.

Example:

{
  "error_identifier": [],
  "additional_info_identifier": [],
  "success_identifier": "ResultOfTest"
}

error_identifier

Every API must have at least one error element, which contains the errors found during the test. There may not be just one error element, but rather multiple error elements for different types of errors (TypeError, OutOfRangeError, ...). All these error element identifiers have to be collected in a list, which we call error_identifier.

additional_info_identifier

Some APIs might provide warnings or hints as well, which are not really errors but some kind of additional information. We proceed here the same way as for error_identifier: put all these hint or warning element identifiers in a list called additional_info_identifier.

success_identifier

Every test API should have one element in response, which states whether the test run was successful or failed for the input. This element has to be named by using success_identifier

Example implementation for usage

json_obj = json.loads("...")  # the response_map from above
result = {
    "errors": [],
    "additional_information": [],
    "success": None,
}

error_identifiers = json_obj["error_identifier"]
additional_info_identifier = json_obj["additional_info_identifier"]
success_identifier = json_obj["success_identifier"]

for error_identifier in error_identifiers:
    result["errors"].append(response.get_val_for_identifier(error_identifier))

# analogous for additional_information and success

Please note: The function get_val_for_identifier(id: str) does not exist yet, and I didn't give any example of how to implement it. This function would be more complex, since it has to handle JSON as well as XML responses, in case we deal with a test API that uses XML.
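For the JSON case only, a minimal recursive lookup could look like the sketch below. This is purely illustrative (the function is explicitly left unspecified above); a real implementation would also need an XML branch:

```python
def get_val_for_identifier(response, identifier: str):
    """Recursively search a parsed JSON response for the first
    value stored under the given identifier (key name)."""
    if isinstance(response, dict):
        if identifier in response:
            return response[identifier]
        for value in response.values():
            found = get_val_for_identifier(value, identifier)
            if found is not None:
                return found
    elif isinstance(response, list):
        for item in response:
            found = get_val_for_identifier(item, identifier)
            if found is not None:
                return found
    # identifier not present in this subtree
    return None
```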

jokiefer commented 3 years ago

UI

We propose adding a dropdown menu/button to the places where a ConformityCheckRun should be triggered for a specific Metadata object. Such a dropdown lists all ConformityCheckConfigurations that are allowed for the given metadata_type.

Does that match your expectations?

response_map

As we store the complete response of a single run in the response field as JSON, I doubt there is a need for the errors and additional_information fields. We are able to generate an HTML representation of the response object that can look like this:

(screenshot of the HTML representation attached)

Is there any additional use for the fields error and additional_information, besides giving feedback to users?

If not, we propose the following response_map object:

{
  "status_identifier": "EtfItemCollection.testRuns.TestRun.status",
  "error_values": ["FAILED"],
  "success_values": ["PASSED", "WARNING"]
}

Here, status_identifier contains the path to the key that holds the information about the status of the CheckRun. The fields error_values and success_values hold the values of that status field that determine a failed or a passed CheckRun, respectively.

Example:

A given response

{
  "EtfItemCollection": [
    {
      "testRuns": [
        {
          "TestRun": {
            "status": "PASSED"
          }
        }
      ]
    }
  ]
}

would be evaluated as a successful test, whereas

{
  "EtfItemCollection": [
    {
      "testRuns": [
        {
          "TestRun": {
            "status": "FAILED"
          }
        }
      ]
    }
  ]
}

would be evaluated as a failed test. In this scenario, a result with "status": "WARNING" would also be evaluated as a successful test.
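Resolving the dotted status_identifier path against such a nested response could be sketched as follows. The helper names are assumptions, and the path resolver simply descends into the first element of any intermediate list (i.e. it assumes a single test run per response):

```python
def resolve_path(obj, dotted_path: str):
    """Walk a dotted key path (e.g. 'EtfItemCollection.testRuns.TestRun.status')
    through nested dicts, stepping into lists along the way."""
    current = obj
    for key in dotted_path.split('.'):
        while isinstance(current, list):
            current = current[0]  # assumes one test run per response
        current = current[key]
    return current


def evaluate_run(response: dict, response_map: dict) -> bool:
    """Classify a check run as passed/failed based on the response_map."""
    status = resolve_path(response, response_map["status_identifier"])
    if status in response_map["success_values"]:
        return True
    if status in response_map["error_values"]:
        return False
    raise ValueError("Unknown status value: " + str(status))
```

With the two example responses above, the first would yield True and the second False; a "WARNING" status would also yield True, matching the semantics described here.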

Please let us know, if the proposed changes match your expectations.

jokiefer commented 3 years ago

The proposed changes match our expectations ;-)

jokiefer commented 3 years ago

Do I assume correctly that we want to validate the metadata documents generated by MrMap (e.g. http://127.0.0.1:8000/resource/metadata/wms-nw-bodenbewegungsgebiete-kacheln-service), and not the original metadata documents (e.g. https://apps.geoportal.nrw.de/soapServices/CSWStartup?Service=CSW...)?

Are the values (e.g. keywords) in the Metadata model parsed from the GetCapabilities documents or from the metadata documents?

Do we want to validate the GetCapabilities documents of the original service? Does MrMap generate GetCapabilities documents on its own?

jokiefer commented 3 years ago

hi jan,

yes we want to validate the metadata that is generated by mrmap - this may be

dataset metadata: https://mrmap.geospatial-interoperability-solutions.eu/resource/metadata/dataset/adressen-der-gewerbeaufsicht-dataset

wms capabilities: https://mrmap.geospatial-interoperability-solutions.eu/resource/metadata/soziale-einrichtungen-msagd-service-1/operation?request=GetCapabilities

wms service metadata: https://mrmap.geospatial-interoperability-solutions.eu/resource/metadata/soziale-einrichtungen-msagd-service-1

wfs capabilities: https://mrmap.geospatial-interoperability-solutions.eu/resource/metadata/soziale-einrichtungen-msagd-service/operation?request=GetCapabilities

wfs service metadata: https://mrmap.geospatial-interoperability-solutions.eu/resource/metadata/soziale-einrichtungen-msagd-service

... work in progress atom feed service metadata, ogc api features service metadata, ...

the values in the metadata model are editable by the administrator, the original documents are stored too and can be invoked and compared if we need them ;-)

as you see above, mrmap generates its own capabilities - which use the internal (maybe edited) metadata. with this kind of workflow, we are able to proxy and qualify all distributed resources and to add some relevant information (licences, service metadata linkages, additional metadataurls, identifiers, ...)

;-)

jokiefer commented 3 years ago

Which of the default groups should have the permission to run the validation?

jokiefer commented 3 years ago

I think we should use "Resource Administrator" initially - maybe later we can also give the permission to authenticated users.

jokiefer commented 3 years ago

Just to clarify: With the current design, conformity checking will only work for active services. This is due to the fact that the ETF tests usually perform requests against the service endpoint. Also, downloading of the metadata records is currently only possible for active services.

jokiefer commented 3 years ago

@jansule this is merged in #42. I solved all merge conflicts and merged it, so I can move the repo. The views will be refactored by myself. Can we close this issue now?

jansule commented 3 years ago

@jokiefer thanks. This can be closed then.

I added an issue regarding the duplicates in requirements.txt so we do not forget to address it: https://github.com/mrmap-community/mrmap/issues/64.