jsonrainbow / json-schema

PHP implementation of JSON schema. Fork of the http://jsonschemaphpv.sourceforge.net/ project
MIT License
3.53k stars 351 forks source link

Possibly enum check is O(n) maybe can be O(1) for more performance #697

Closed lucasres closed 1 year ago

lucasres commented 1 year ago

I went to check how the lib work with enum validation in this file src/JsonSchema/Constraints/EnumConstraint.php and i realized that make foreach in enum possibilities and check if as equals. In the wrost case this method is O(n).

class EnumConstraint extends Constraint
{
    /**
     * {@inheritdoc}
     */
    public function check(&$element, $schema = null, JsonPointer $path = null, $i = null)
    {
        // Only validate enum if the attribute exists
        if ($element instanceof UndefinedConstraint && (!isset($schema->required) || !$schema->required)) {
            return;
        }
        $type = gettype($element);

        foreach ($schema->enum as $enum) {
            $enumType = gettype($enum);
            if ($this->factory->getConfig(self::CHECK_MODE_TYPE_CAST) && $type == 'array' && $enumType == 'object') {
                if (Parity::isEqualTo((object) $element, $enum)) {
                    return;
                }
            }

            if ($type === gettype($enum)) {
                if (Parity::isEqualTo($element, $enum)) {
                    return;
                }
            }
        }

        $this->addError(ConstraintError::ENUM(), $path, array('enum' => $schema->enum));
    }
}

Maybe can alter the schema for a struct that each enum is a key of array, thus foreach is not necessary and this method can reduced only isset in array

current struct

$schema->enum = ['BAR', 'FOO', 'BAR-FOO'];

proposed struct (maybe array_flip can make this)

$schema->enum = [
  'BAR' => 0, 
  'FOO' => 1,
  'BAR-FOO'  => 2
];

If possible, I would love help with this implementation

erayd commented 1 year ago

This suggestion is unsuitable, as enum values may be of any type, including types which cannot be used as a PHP array key.

lucasres commented 1 year ago

But I think the library has a gain in validating if the enum is composed of a string array and transforming the check into O(1) and in the worst case leaving O(n).

Let's create a fictionary scenery that we have a payload composed with m items of array and each item we have that check one enum string(i think that this is comum).

[
  {
    ...,
    "enum": "FOO"
  },
  {
    ...,
    "enum": "BAR"
  },
]

In this scenery the wrost case is O(m.n) and if have a another array inside that check same enum then have O(m.n²).

[
  {
    ...,
    "enum": "FOO",
    "another": [
      {
        ...,
        "enum": "BAR"
      }
  },
  {
    ...,
    "enum": "BAR",
    "another": [
      {
        ...,
        "enum": "BAR"
      }
  }
]

We can test if schema has stringable implemented and can posible check in O(1). if can reopen issue i can make PR for code or i can create another issue and more specificed.

erayd commented 1 year ago

But I think the library has a gain in validating if the enum is composed of a string array and transforming the check into O(1)...

Detecting a string array is itself an O(n) operation. You can't get O(1) performance by simply ignoring the O(n) preparatory step needed to get there.

Let's create a fictional scenario...

The potential O(m*n) scenario isn't great, but given how fast string comparison loops generally are, it's unlikely to be a notable culprit for real-world performance issues in most cases. This is not a compiling validator (e.g. ajv), so the validation process is necessarily slow by comparison in many respects. If you are needing to optimise to that degree then a compiling validator would probably be a much better choice for your project than this library, which is known to be a bit of a snail.

What is the actual scenario you are trying to address here - is this causing you real-world problems?

lucasres commented 1 year ago

No no, i think that is possible of increment in performance