pwall567 / json-kotlin-schema

Kotlin implementation of JSON Schema (Draft 07)
MIT License

Validation of email address with non ASCII character fails #14

Open Alina-Valea-Forter opened 11 months ago

Alina-Valea-Forter commented 11 months ago

Hi,

The JSON that I am trying to validate has a non-ASCII character in the email field, e.g. the Spanish letter "ñ", and consequently validation fails with the following error:

A subschema had errors - #/email
Value fails format check "email", was "mu\u00F1eca@test.com" - #/email

Here is the test code:

import net.pwall.json.schema.JSONSchema

fun test() {
    val schemaString = """
        {
          "${'$'}schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "properties": {
            "email": {
              "type": "string",
              "format": "email"
            }
          },
          "required": ["email"]
        }
    """.trimIndent()

    val jsonString = """
        {
            "email": "muñeca@test.com"
        }
    """.trimIndent()

    val schema = JSONSchema.parse(schemaString)
    val output = schema.validateBasic(jsonString)
    require(output.errors == null) {
        output.errors?.forEach {
            println("${it.error} - ${it.instanceLocation}")
        }
        "Json schema validation failed."
    }
}

Is there a way around this?

pwall567 commented 11 months ago

Hi, thanks for the message.

In implementing this library I have attempted to follow strictly the JSON Schema specification, which says (JSON Schema Validation, section 7.3.2):

email: As defined by the "Mailbox" ABNF rule in RFC 5321, section 4.1.2

And RFC 5321, section 4.1.2 contains the following ABNF rules:

Mailbox        = Local-part "@" ( Domain / address-literal )

Local-part     = Dot-string / Quoted-string

Dot-string     = Atom *("."  Atom)

Atom           = 1*atext

atext is defined in RFC 5322, section 3.2.3 as being the ASCII alphabetic and numeric characters, plus the following ASCII special characters:

! # $ % & ' * + - / = ? ^ _ ` { | } ~
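To make the rules above concrete, here is a minimal sketch of the Dot-string branch of Local-part, written only from the ABNF quoted here. It is not the library's own code; the names `ATEXT_SPECIALS`, `isAtext` and `isDotString` are made up for illustration, and the Quoted-string branch and the domain part are not covered.

```kotlin
// Illustrative only: checks the Dot-string form of Local-part, nothing else.
// The special characters permitted by atext (RFC 5322, section 3.2.3).
private const val ATEXT_SPECIALS = "!#\$%&'*+-/=?^_`{|}~"

// atext = ASCII letters, ASCII digits, or one of the specials above.
fun isAtext(c: Char): Boolean =
    c in 'A'..'Z' || c in 'a'..'z' || c in '0'..'9' || c in ATEXT_SPECIALS

// Dot-string = Atom *("." Atom), Atom = 1*atext
// i.e. one or more non-empty runs of atext separated by single dots.
fun isDotString(s: String): Boolean =
    s.split('.').all { atom -> atom.isNotEmpty() && atom.all(::isAtext) }

fun main() {
    println(isDotString("muneca"))      // true
    println(isDotString("mu\u00F1eca")) // false: U+00F1 is not atext
    println(isDotString("a..b"))        // false: empty Atom between the dots
}
```

The second call is the local part from the report above; it is rejected because "ñ" (U+00F1) falls outside every range that atext permits.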

The Quoted-string rule allows any combination of printable ASCII characters within double quotes (with the double quote and backslash characters themselves escaped by a backslash), but even that does not allow characters above hex 7E. In fact, the specification goes on to say:

Systems MUST NOT define mailboxes in such a way as to require the use in SMTP of non-ASCII characters (octets with the high order bit set to one) or ASCII "control characters" (decimal value 0-31 and 127).

I realise that in practice, many mail systems may ignore these rules and allow non-ASCII characters in mail addresses, but I feel that as an implementer of JSON Schema I have no option but to follow the specification as closely as possible.

All this explanation doesn't help in your case, but you might like to try a pattern validation – the emailregex web site contains a number of suggestions (the form of Regex used by the library is of course the Java form).
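As a rough sketch of the pattern approach: the expression below is a hypothetical, deliberately loose pattern (not taken from the emailregex site, and not the RFC 5321 Mailbox grammar) that accepts Unicode letters and digits in the local part while keeping the domain ASCII. It is exercised here with the Kotlin standard library only, on the assumption that the schema validator applies Java regex semantics as described above; `looseEmailPattern`, `emailProperty` and `matchesLooseEmail` are illustrative names.

```kotlin
// Hypothetical loose pattern: \p{L} and \p{N} admit non-ASCII letters and digits.
val looseEmailPattern = """^[\p{L}\p{N}._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"""

// The same pattern as it might appear in the schema, replacing "format": "email".
// Backslashes are doubled because the pattern sits inside a JSON string.
val emailProperty = """
    "email": {
      "type": "string",
      "pattern": "^[\\p{L}\\p{N}._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$"
    }
""".trimIndent()

// JSON Schema "pattern" is a search rather than a full match,
// so the ^ and $ anchors are what force the whole value to conform.
fun matchesLooseEmail(value: String): Boolean =
    Regex(looseEmailPattern).containsMatchIn(value)

fun main() {
    println(matchesLooseEmail("muñeca@test.com")) // true
    println(matchesLooseEmail("not-an-email"))    // false
    println(emailProperty)
}
```

A pattern like this trades strictness for reach: it will accept some strings that are not deliverable addresses, so it is worth tightening to whatever the receiving mail system actually allows.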

I may consider allowing pluggable implementations of the format validations in a later version of the library, but I can't give you a timeline for that.

Sorry I can't be more help,

-Peter Wall

Alina-Valea-Forter commented 11 months ago

Thanks, that was very informative. I will look into other options.