The configuredExpressions variable has wrong values quantity after limitConfiguredExpressionsBySeverity is applied

See: org.sitenv.vocabularies.validation.services.VocabularyValidationService#validate

The configuredExpressions variable has wrong values quantity in case we want to validate same document more than two times with the different severity level. As a result we can get different validation results. Validation iterations: 1) Use INFO level - configuredExpressions list has 180 elements. 2) Use ERROR level - configuredExpressions list has 120 elements. 3) Use INFO level again - configuredExpressions list has 120 elements. Not 180 as it was before ERROR. In this case we can add a new list as a local variable and add suitable ConfiguredExpressions to it for each validation process.

Possible fix:

private void validate(Map<String, ArrayList<VocabularyValidationResult>> vocabularyValidationResultMap, String configuredXpathExpression, XPath xpath, Document doc, SeverityLevel severityLevel) throws XPathExpressionException {
        List<ConfiguredExpression> limitedConfiguredExpressions = new ArrayList<>();
        if (severityLevel != SeverityLevel.INFO) {
            limitConfiguredExpressionsBySeverity(severityLevel, limitedConfiguredExpressions);
        } else {
            limitedConfiguredExpressions = vocabularyValidationConfigurations;
        }

        globalCodeValidatorResults.setVocabularyValidationConfigurationsCount(limitedConfiguredExpressions.size());
        globalCodeValidatorResults.setVocabularyValidationConfigurationsErrorCount(determineConfigurationsErrorCount());

        for (ConfiguredExpression configuredExpression : limitedConfiguredExpressions) {

private void limitConfiguredExpressionsBySeverity(SeverityLevel severityLevelLimit, List<ConfiguredExpression> configuredExpressions) {
        // This improves performance since it is run before the expressions are processed
        logger.info("limiting configured expressions by severity level: " + severityLevelLimit.name());
        for (ConfiguredExpression configuredExpression : vocabularyValidationConfigurations) {
            // NodeCodeSystemMatchesConfiguredCodeSystemValidator defaults to ERROR severity dynamically
            configuredExpression.getConfiguredValidators().parallelStream()
                    .filter(configuredValidator -> !configuredValidator.getName().equalsIgnoreCase("NodeCodeSystemMatchesConfiguredCodeSystemValidator"))
                    .map(configuredValidator -> configuredValidator.getConfiguredValidationResultSeverityLevel().getSeverityLevelConversion())
                    .forEach(configuredSeverityLevelConversion -> {
                        if (severityLevelLimit == SeverityLevel.WARNING && configuredSeverityLevelConversion != SeverityLevel.INFO) {
                            // skip may/info configurations so we only process warnings and errors
                            configuredExpressions.add(configuredExpression);
                        }
                        if (severityLevelLimit == SeverityLevel.ERROR && configuredSeverityLevelConversion == SeverityLevel.ERROR) {
                            // skip may/info and should/warnings configurations so we only process errors
                            configuredExpressions.add(configuredExpression);
                        }
            });
        }
        configuredExpressions.removeIf(configuredExpression -> configuredExpression.getConfiguredValidators().isEmpty());
    }

Thanks, interesting. Hopefully, we can look into this in the next release cycle.

If I remember correctly, normally the configured expressions are only loaded once. But, there is a system in place to load them dynamically. Maybe, in order for this to work correctly (multiple sequential validations of the same file using different severity levels and tracking the proper list size), we should be kicking off the dynamic load for every validation regardless of if it has changed or not. Just a different idea off the top of my head without looking into this yet in any way (so may not apply at all).

In the meantime, did you want to maybe make a PR of your suggestion above, to the develop branch? Then we can work from that as a baseline (or potentially just merge it as is if all is agreed upon). If not, we can work from this, but that might be more clear.

I am going to check validation results without dynamic load and compare with site.healthit.gov.

As for the PR, yes I will create one or two with different approaches: 1) like in my first message; 2) create map with three lists at the application initialisation process, like we have now for the vocabularyValidationConfigurations bean.

If it changes to 120 after using error mode, and stays there from then on, do you think that it's only checking errors from then on, regardless of severity selection?

To answer my own question, I tested this. Even though vocabularyValidationConfigurationsCount is returned as 120, it still checks for all severity types, if info is selected.

I see you've added a PR for a new approach. Which is awesome, thanks! I will review and look into merging. But, what would you say the root of the bug was, in the most direct way? For example, if using the old approach, what change would fix it? Or, is the new approach fully necessary? Maybe we could chat about this sometime. Let me know when you are free. Thanks.

One day we found that two equal instances give us different results for equal requests after some time. This issue was investigated and I found that the app filters the configuration XML file each time you send a non-info severity level. But if we have 180 configs total we will get only 160 after we send "warning" and 120 after the "error" severity level and this number never will never be updated later. In fact, we have fewer configs in the vocabularyValidationConfigurations list for this file if we repeat the request or for the next files. So the fix I provided filters configs on three lists on the application startup. Also, we modified the validator for our internal needs and we added vocabularyValidationConfigurations number to the validator response so we can check how many configurations were involved in the validation process.

Tests: Full latest vocabularies are leaded.

Requests:

1. Request:

?validationObjective=170.315_b1_ToC_Amb&referenceFileName=170.315_b1_toc_amb_ccd_r21_sample2_v11.xml&severityLevel=info&curesUpdate=true&vocabularyConfig=ccdaReferenceValidatorConfig

Response:

....
"resultMetaData": [
            {
                "type": "C-CDA MDHT Conformance Error",
                "count": 0,
                "severity": "ERROR"
            },
            {
                "type": "C-CDA MDHT Conformance Warning",
                "count": 45,
                "severity": "WARNING"
            },
            {
                "type": "C-CDA MDHT Conformance Info",
                "count": 129,
                "severity": "INFO"
            },
            {
                "type": "ONC 2015 S&CC Vocabulary Validation Conformance Error",
                "count": 0,
                "severity": "ERROR"
            },
            {
                "type": "ONC 2015 S&CC Vocabulary Validation Conformance Warning",
                "count": 1,
                "severity": "WARNING"
            },
            {
                "type": "ONC 2015 S&CC Vocabulary Validation Conformance Info",
                "count": 9,
                "severity": "INFO"
            },
            {
                "type": "ONC 2015 S&CC Reference C-CDA Validation Error",
                "count": 0,
                "severity": "ERROR"
            },
            {
                "type": "ONC 2015 S&CC Reference C-CDA Validation Warning",
                "count": 0,
                "severity": "WARNING"
            },
            {
                "type": "ONC 2015 S&CC Reference C-CDA Validation Info",
                "count": 0,
                "severity": "INFO"
            }
        ],
        "vocabularyValidationConfigurationsCount": 180,
        "vocabularyValidationConfigurationsErrorCount": 121,
...

Request: equal to the first one. Response: equal to the first one.
Request: severity level is set to warning

Response:

        "resultMetaData": [
            {
                "type": "C-CDA MDHT Conformance Error",
                "count": 0,
                "severity": "ERROR"
            },
            {
                "type": "C-CDA MDHT Conformance Warning",
                "count": 45,
                "severity": "WARNING"
            },
            {
                "type": "C-CDA MDHT Conformance Info",
                "count": 0,
                "severity": "INFO"
            },
            {
                "type": "ONC 2015 S&CC Vocabulary Validation Conformance Error",
                "count": 0,
                "severity": "ERROR"
            },
            {
                "type": "ONC 2015 S&CC Vocabulary Validation Conformance Warning",
                "count": 1,
                "severity": "WARNING"
            },
            {
                "type": "ONC 2015 S&CC Vocabulary Validation Conformance Info",
                "count": 0,
                "severity": "INFO"
            },
            {
                "type": "ONC 2015 S&CC Reference C-CDA Validation Error",
                "count": 0,
                "severity": "ERROR"
            },
            {
                "type": "ONC 2015 S&CC Reference C-CDA Validation Warning",
                "count": 0,
                "severity": "WARNING"
            },
            {
                "type": "ONC 2015 S&CC Reference C-CDA Validation Info",
                "count": 0,
                "severity": "INFO"
            }
        ],
        "vocabularyValidationConfigurationsCount": 178,
        "vocabularyValidationConfigurationsErrorCount": 121,

Request: severity level is set to error

Response:

        "resultMetaData": [
            {
                "type": "C-CDA MDHT Conformance Error",
                "count": 0,
                "severity": "ERROR"
            },
            {
                "type": "C-CDA MDHT Conformance Warning",
                "count": 0,
                "severity": "WARNING"
            },
            {
                "type": "C-CDA MDHT Conformance Info",
                "count": 0,
                "severity": "INFO"
            },
            {
                "type": "ONC 2015 S&CC Vocabulary Validation Conformance Error",
                "count": 0,
                "severity": "ERROR"
            },
            {
                "type": "ONC 2015 S&CC Vocabulary Validation Conformance Warning",
                "count": 0,
                "severity": "WARNING"
            },
            {
                "type": "ONC 2015 S&CC Vocabulary Validation Conformance Info",
                "count": 0,
                "severity": "INFO"
            },
            {
                "type": "ONC 2015 S&CC Reference C-CDA Validation Error",
                "count": 0,
                "severity": "ERROR"
            },
            {
                "type": "ONC 2015 S&CC Reference C-CDA Validation Warning",
                "count": 0,
                "severity": "WARNING"
            },
            {
                "type": "ONC 2015 S&CC Reference C-CDA Validation Info",
                "count": 0,
                "severity": "INFO"
            }
        ],
        "vocabularyValidationConfigurationsCount": 120,
        "vocabularyValidationConfigurationsErrorCount": 121,

Request: like the very first. Sev level = info

    "resultMetaData": [
        {
            "type": "C-CDA MDHT Conformance Error",
            "count": 0,
            "severity": "ERROR"
        },
        {
            "type": "C-CDA MDHT Conformance Warning",
            "count": 45,
            "severity": "WARNING"
        },
        {
            "type": "C-CDA MDHT Conformance Info",
            "count": 129,
            "severity": "INFO"
        },
        {
            "type": "ONC 2015 S&CC Vocabulary Validation Conformance Error",
            "count": 0,
            "severity": "ERROR"
        },
        {
            "type": "ONC 2015 S&CC Vocabulary Validation Conformance Warning",
            "count": 0,
            "severity": "WARNING"
        },
        {
            "type": "ONC 2015 S&CC Vocabulary Validation Conformance Info",
            "count": 5,
            "severity": "INFO"
        },
        {
            "type": "ONC 2015 S&CC Reference C-CDA Validation Error",
            "count": 0,
            "severity": "ERROR"
        },
        {
            "type": "ONC 2015 S&CC Reference C-CDA Validation Warning",
            "count": 0,
            "severity": "WARNING"
        },
        {
            "type": "ONC 2015 S&CC Reference C-CDA Validation Info",
            "count": 0,
            "severity": "INFO"
        }
    ],
    "vocabularyValidationConfigurationsCount": 120,
    "vocabularyValidationConfigurationsErrorCount": 121,

As you can see:

The first request gave us 1 warning and 9 infos for Vocabulary Validation Conformance part.

{
                "type": "ONC 2015 S&CC Vocabulary Validation Conformance Warning",
                "count": 1,
                "severity": "WARNING"
            },
            {
                "type": "ONC 2015 S&CC Vocabulary Validation Conformance Info",
                "count": 9,
                "severity": "INFO"
            },

Equal request number 5 after severity level modifications gave us zero warnings and 5 infos.

 {
                "type": "ONC 2015 S&CC Vocabulary Validation Conformance Warning",
                "count": 0,
                "severity": "WARNING"
            },
            {
                "type": "ONC 2015 S&CC Vocabulary Validation Conformance Info",
                "count": 5,
                "severity": "INFO"
            },

Thanks again for your work on the PR.

I have updated where necessary to merge this code into the code validator and to make it work with the reference ccda validator, fixing minor conflicts along the way. I wonder, did you already do this? Or, do you not use the reference ccda validator? I did have to make some minor updates to the code validator as well to handle some issues with reference tests. Such as handling an NPE for configuredExpressionsBySeverityLevel. This was likely only isolated to the testing and why you may not have noticed.

There is one item which is not working as expected though (as it was before). And this is keeping me from merging your work into develop. Maybe you encountered this and have thoughts on how to resolve? Everything is on the branch fixSeverityLevel (branch on reference-ccda-validator and branch on code-validator-api). Of course, if you don't have time, that is of course fine, I will update as needed. I'm not sure if you're looking to use the new project as a baseline, or if you'd be interested in the issue I found.

Basically, if we load up RefCCDATest.java in the Ref Val project, and find the vocabAndMdhtSeverityLevelErrorMultipleValidatorsPerExpressionTest test. We will note that that test fails but used to pass. The reason appears to be related to how we are handling multiple validators with one expression. I created some new tests, which work (e.g. vocabAndMdhtSeverityLevelErrorSingleValidatorsPerExpressionTest), to compare, which use a file where the validators are separated, per expression to isolate the discrepancy. It gets confusing because there was a known bug prior with this type of arrangement (multiple validators per expression) but that only ever applied to warnings and info prior, never errors. And, since our config wasn't using this setup for anything other than SHALL, it never mattered. But after merging the new code from your PR, it will affect one of our actual arrangements in ccdaReferenceValidatorConfig. As the bug now exists for SHALL too, seemingly related to this mapped severity update, because the test didn't fail prior. You can see that arrangement, here:

    <!-- SOCIAL HISTORY OBSERVATION (V3) START -->
    <!--8. SHOULD contain zero or one [0..1] value (CONF:1198-8559). a. If Observation/value is a physical quantity (xsi:type="PQ"), the unit of measure SHALL be selected from ValueSet UnitsOfMeasureCaseSensitive (2.16.840.1.113883.1.11.12839) DYNAMIC (CONF:1198-8555).-->
    <expression xpathExpression="//v3:observation/v3:templateId[@root='2.16.840.1.113883.10.20.22.4.38']/ancestor::v3:observation[1]/v3:value[@unit and not(@nullFlavor) and ancestor::v3:section[not(@nullFlavor)]]">
        <validator>
            <name>UnitAllowsOneValidator</name>
            <validationResultSeverityLevels>
                <codeSeverityLevel>SHALL</codeSeverityLevel>
            </validationResultSeverityLevels>
            <allowedValuesetOids>2.16.840.1.113883.1.11.12839</allowedValuesetOids>
        </validator>
    </expression>
    <expression xpathExpression="//v3:observation/v3:templateId[@root='2.16.840.1.113883.10.20.22.4.38' and @extension='2015-08-01']/ancestor::v3:observation[1]/v3:value[@xsi:type='PQ' and not(@nullFlavor) and ancestor::v3:section[not(@nullFlavor)]]">
        <validator>
            <name>RequiredNodeValidator</name>
            <validationResultSeverityLevels>
                <codeSeverityLevel>SHALL</codeSeverityLevel>
            </validationResultSeverityLevels>
            <!-- Provide an @attributeName or elementName e.g. enter '@unit' for an attribute or 'v3:observation' for an element -->
            <requiredNodeName>@unit</requiredNodeName>
            <!-- Provide a unique validation message for the situation. A simple version is generated from the requiredNodeName and the two are merged -->
            <validationMessage>If Observation/value is a physical quantity (xsi:type="PQ"), the unit of measure SHALL be selected from ValueSet UnitsOfMeasureCaseSensitive (2.16.840.1.113883.1.11.12839) DYNAMIC (CONF:1198-8555).</validationMessage>
        </validator>
    </expression>
    <!-- SOCIAL HISTORY OBSERVATION (V3) END -->

This is what has changed since the old implementation as far as tests passing (after accounting for updating tests to accommodate the new map style):

*Fails with new impl, passes with old: vocabAndMdhtSeverityLevelInfoMultipleValidatorsPerExpressionTest vocabAndMdhtSeverityLevelWarningMultipleValidatorsPerExpressionTest

Passes still (as it did before): vocabAndMdhtSeverityLevelInfoMultipleValidatorsPerExpressionTest

Passes with old and new impl (these are new tests): vocabAndMdhtSeverityLevelErrorSingleValidatorsPerExpressionTest vocabAndMdhtSeverityLevelWarningSingleValidatorsPerExpressionTest vocabAndMdhtSeverityLevelInfoSingleValidatorsPerExpressionTest

Again, I want to stress, this is not your job, and I appreciate what you have provided. I am only giving an update. And if interested, check out the fixSeverityLimiting branch on code-validator-api and reference-ccda-validator (the content validator can be develop, master, or a pre-built jar).

On another, quite relevant note, interestingly enough, the bug you found where configs are missing over time, does not exist on our old impl on our dev server. This is because the dev server reloads the config every time because it is configured to use a dynamic config. Like so: ccda dev referenceccdaservice.xml

    <Parameter name="referenceccda.isDynamicVocab" value="true" override="true"/>

Where as if it were set to:

    <Parameter name="referenceccda.isDynamicVocab" value="false" override="true"/>

Then it encounters the bug, because it is static, and only loaded once. This is why the bug had escaped us until now, as, we don't test with isDynamicVocab set to false (we don't test with static configs).

So, one simple fix to the entire issue without any new code would be to just use the dynamic config (setting isDynamicVocab to true in the referenceccdaservice.xml). And, this fix has been applied to production. So, we no longer experience the bug there. However, we should support a static config as well, which is what your fix allows for. So I will continue to work on that branch in isolating the ref val test issue when time.

onc-healthit / code-validator-api

The configuredExpressions variable has wrong values quantity after limitConfiguredExpressionsBySeverity is applied #107