usnistgov / OSCAL

Open Security Controls Assessment Language (OSCAL)
https://pages.nist.gov/OSCAL/
Other
671 stars 182 forks source link

Modify validation for Phone Numbers #2041

Open Telos-sa opened 1 month ago

Telos-sa commented 1 month ago

User Story

Currently the listed type for metadata>parties>telephone-numbers>number is a string, which has the regular expression pattern "\S(.*\S)?":

Screenshot 2024-09-13 at 5 29 58 PM

When validating using the oscal-cli, this field instead uses the regex pattern "^[0-9]{3}[0-9]{1,12}$" for validation, and throws warnings for varying (fake) phone numbers:

Screenshot 2024-09-13 at 5 36 51 PM

Goals

Validate using the string regex pattern which is outlined in the schema Or Create a new datatype for telephone-numbers>number that lists the regex pattern that is actually being used for validation

Screenshot 2024-09-13 at 5 44 06 PM

Dependencies

No response

Acceptance Criteria

(For reviewers: The wiki has guidance on code review and overall issue review for completeness.)

Revisions

No response

iMichaela commented 1 month ago

@Telos-sa - can you please provide the version of the oscal-cli you are using. I am assuming it is a FedRAMP constraint. While a better implementation is desired, we would need to see if other international numbers are supported.

iMichaela commented 1 month ago

How will extensions be noted when the regex pattern "^[0-9]{3}[0-9]{1,12}$" is used? Also numbers with prefixes marked with () ?

Telos-sa commented 1 month ago

@iMichaela This is version 2.0.2 of the base OSCAL validator (no FedRAMP constraints were used). I downloaded the pre-built version from https://repo1.maven.org/maven2/dev/metaschema/oscal/oscal-cli-enhanced/2.0.2/ as outlined on https://github.com/metaschema-framework/oscal-cli. If this is an incorrect oscal-cli please let us know where we can go to install the correct.

iMichaela commented 1 month ago

NIST oscal-cli latest release is v1.0.3. The version you point out to was generated not by NIST so we have no control over the OSCAL metaschema definitions (models) used to generate that version. If this is the one provided by FedRAMP and generated under metaschema-framework repo, please raise those concerns with the developers. There is no NIST oscal-cli v2.0.2. This is the version FedRAMP demo-ed last week. Please raise those issues with FedRAMP because the OSCAL version (when constraints are not applied) should be the official OSCAL release v1.1.2 or earlier. Can you please provide the version of OSCAL displayed by the tool?

https://github.com/metaschema-framework/oscal-cli is not a NIST repo. Here is NIST repo: https://github.com/usnistgov/oscal-cli

Thank you.

Telos-sa commented 1 month ago

Apologies for the version confusion, I have installed v1.0.3 from https://github.com/usnistgov/oscal-cli:

Screenshot 2024-09-16 at 3 38 05 PM

The same warnings are still present in v1.0.3

Screenshot 2024-09-16 at 3 24 34 PM

It makes sense that telephone-numbers>number would have a different validation regex pattern than strings, but that is not outlined in the v1.1.2 OSCAL metaschema.

telephone-numbers section under parties:

"telephone-numbers":{
    "type":"array",
    "minItems":1,
    "items":{
        "$ref":"#field_oscal-metadata_telephone-number"
    }
}

Assembly for field_oscal-metadata_telephone-number:

"oscal-ssp-oscal-metadata:telephone-number":{
    "title":"Telephone Number",
    "description":"A telephone service number as defined by ITU-T E.164.",
    "$id":"#field_oscal-metadata_telephone-number",
    "type":"object",
    "properties":{
        "type":{
            "title":"type flag",
            "description":"Indicates the type of phone number.",
            "anyOf":[
                {
                    "$ref":"#/definitions/StringDatatype"
                },
                {
                    "enum":[
                        "home",
                        "office",
                        "mobile"
                    ]
                }
            ]
        },
        "number":{
            "$ref":"#/definitions/StringDatatype"
        }
    },
    "required":[
        "number"
    ],
    "additionalProperties":false
}

StringDataType definition:

"StringDatatype":{
    "description":"A non-empty string with leading and trailing whitespace disallowed. Whitespace is: U+9, U+10, U+32 or [ \n\t]+",
    "type":"string",
    "pattern":"^\\S(.*\\S)?$"
}

The regex pattern "^[0-9]{3}[0-9]{1,12}$" is used by the oscal-cli for validation, but is not outlined anywhere in the metaschema.

wendellpiez commented 1 month ago

Hi, - note, this is corrected from earlier. (I checked again!)

The constraint being discussed is presumably this "WARNING"-level constraint at https://github.com/usnistgov/OSCAL/blob/4f02dac6f698efda387cc5f55bc99581eaf494b6/src/metaschema/oscal_metadata_metaschema.xml#L1052

This is non-operative in the JSON schema (or XSD) but will throw an error in a processor implementing the constraint as expressed. Presumably it's a WARNING because it was understood to be too broad.

We could make the warning go away by removing or relaxing the constraint.

iMichaela commented 1 month ago

Since many numbers can be noted with extensions that need to be documented with the phone number, or other, a with a + sign in front for international numbers , or have area code in parentheses, the warning (to me) is meant to raise human's attention to ensure the inserted number is correct, when non-numeric values are used. Tere is also the SIP protocol (https://en.wikipedia.org/wiki/SIP_URI_scheme) with valid format like sip:1-999-123-4567@voip-provider.example.net Is the suggestion here to remove the regex="^[0-9]{3}[0-9]{1,12}$" constraint all together and not use it as a semi-validation mechanism for numeric-only phone numbers?

GaryGapinski commented 1 month ago

Ah, "telephone numbers". They can be and are anything you want them to be. The word numbers is inadequate. Perhaps even telephone.

TL;DR: The constraint mentioned — \S(.*\S)? — is sufficient to encompass almost all telephone numbers of a length greater than 0 and is admirably catholic. It cannot express an absent value (the initial \S demands an initial non-space character). However, this constraint does not appear in the most recent (v1.1.1) XML and JSON Schemas. The schema documentation mentions that telephone-number is "A telephone service number as defined by ITU-T E.164.".

Without locale-specific presentation/formatting, even numeric telephone "numbers" look bad if subjected to artifical and unwarranted constraints.

Some references:

Examples (mostly not E.123) (I frequently find demarcation of treasured "number" syllables via periods, hyphens, spaces, parentheses):

¹ Fenimore Cooper's Literary Offences #7

wendellpiez commented 1 month ago

Excellent point about validation against \S(.*\S)? - while noting that metaschema-xslt (recently updated) has gotten some attention to datatypes - so cross-checking against its schemas (distinct from oscal-cli schemas or behavior) is also always warranted.

To the end of stabilizing testing and making cross-checks easier, if anyone is interested in schema field testing (either in theory or in practice), let me know or take a look at https://github.com/usnistgov/oscal-xproc3/tree/main/schema-field-tests

Fantastic info @GaryGapinski - thanks for the links.

wendellpiez commented 1 month ago

Noting in passing that the current metaschema-xslt also has a prototype implementation of constraints checking under XSLT. It is 95% feature-complete and testable, regex-matching being one of the things it is supposed to be able to do. While it is untested outside the lab, trying OSCAL metaschema-based validation could be a solid next step.

iMichaela commented 1 month ago

Noting in passing that the current metaschema-xslt also has a prototype implementation of constraints checking under XSLT. It is 95% feature-complete and testable, regex-matching being one of the things it is supposed to be able to do. While it is untested outside the lab, trying OSCAL metaschema-based validation could be a solid next step.

@wendellpiez - can you please elaborate how are the warnings from oscal-cli handled in the metaschema-xslt ? In which branch do you have these latest updates?

wendellpiez commented 1 month ago

@iMichaela recently merged into the main branch of the in metaschema-xslt repository (called develop) is an experimental application to generate XSLTs that perform validation of metaschema rules including the constraints.

The "Inspector XSLT" generator works the same way as the schema generators do - you call it with your metaschema (from a command line or under CI/CD), and it generates an output, in this case not a schema but an XSLT 'schema emulator' that you can use to validate your OSCAL XML.

I'm happy to help anyone interested in testing. So far no one (tmk) has tried it except myself and one volunteer assistant, and on a testing metaschema, not OSCAL.

However, I would also caution that I built this originally as a proof of concept, and subsequently to work up my skills in unit testing - but it has no maintenance model and no 'team' to help (no users, only one dev). This makes me hesitant to offer it as a solution (even if it could be a 'next step' as I mentioned).

Indeed, another much simpler idea along the same lines is simply to cross-test the rules using Schematron - here is an example (testing port-range rules (an XSpec file for unit testing the Schematron implementation): https://github.com/wendellpiez/oscal-xproc3/blob/ssp-portrange-constraints-May2024/schema-field-tests/reference-sets/ssp-model/port-ranges-test.xspec - this is a piece of cake :cake: if you can work an XSpec.