swaggest / php-json-schema

High definition PHP structures with JSON-schema based validation
MIT License

References Are Resolved Despite Setting Dereference to False #145

Open slucero opened 2 years ago

slucero commented 2 years ago

For my use case, I'm having trouble importing schemas that contain a lot of references, so I wanted to import the schema initially without resolving references and fill those in later. Unfortunately, I found that even if I set the dereference option to false in the context, the library still attempts to resolve references. Below is a test example I put together to demonstrate:

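<?php

use Prophecy\PhpUnit\ProphecyTrait;
use Swaggest\JsonSchema\Context;
use Swaggest\JsonSchema\RemoteRefProvider;
use Swaggest\JsonSchema\Schema;

// UnitTestCase is the project's own base test class and must be imported
// from wherever it is defined.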

/**
 * Integration test for expected behavior from schemas.
 *
 * @coversNothing
 */
class SchemaIntegrationTest extends UnitTestCase {

  use ProphecyTrait;

  protected string $schemaWithReferenceJson = <<<JSON
    {
      "\$schema": "http://json-schema.org/draft-04/schema#",
      "category": "test",
      "title": "Schema with reference",
      "type": "object",
      "format": "grid",
      "properties": {
        "reference_property": {
          "\$ref": "my/example/reference"
        }
      }
    }
    JSON;

  /**
   * Test schema dereferencing behavior.
   */
  public function testSchemaDereferencing() {
    /** @var \Swaggest\JsonSchema\RemoteRefProvider $refProvider */
    $refProvider = $this->prophesize(RemoteRefProvider::class);
    $refProvider->getSchemaData('my/example/reference')
      ->willReturn((object) [])
      ->shouldNotBeCalled();

    $context = new Context();
    $context->setRemoteRefProvider($refProvider->reveal());
    $context->dereference = FALSE;

    $schema_data = json_decode($this->schemaWithReferenceJson);

    $schema = Schema::import($schema_data, $context);

    $schema_output = json_encode($schema);
    $this->assertStringContainsString('my/example/reference', $schema_output);
  }

}

This test fails with the following output:

Failed asserting that '{"$schema":"http:\/\/json-schema.org\/draft-04\/schema#","title":"Schema with reference","properties":{"reference_property":{}},"type":"object","format":"grid","category":"test"}' contains "my/example/reference".

If I comment out the final assertion that causes the test to fail immediately, I get the following output from the Prophecy prediction checks confirming that the ref provider was called:

Some predictions failed:
Double\RemoteRefProvider\P1:
  No calls expected that match:
      Double\RemoteRefProvider\P1->getSchemaData(exact("my/example/reference"))
    but 1 was made:
      - getSchemaData("my/example/reference") @ vendor/swaggest/json-schema/src/RefResolver.php:196
 /var/www/html/vendor/phpspec/prophecy-phpunit/src/ProphecyTrait.php:61
 /var/www/html/vendor/phpunit/phpunit/src/Framework/TestResult.php:726
 /var/www/html/vendor/phpunit/phpunit/src/Framework/TestSuite.php:670
 /var/www/html/vendor/phpunit/phpunit/src/Framework/TestSuite.php:670
 /var/www/html/vendor/phpunit/phpunit/src/TextUI/TestRunner.php:673
 /var/www/html/vendor/phpunit/phpunit/src/TextUI/Command.php:143
 /var/www/html/vendor/phpunit/phpunit/src/TextUI/Command.php:96

slucero commented 2 years ago

From some debugging into the issue, I found that the dereference option is being overridden in Schema::processObject() here:

                $refProperty = null;
                $dereference = $options->dereference;

                if ($this->properties !== null && isset($array[self::PROP_REF])) {
                    $refPropName = self::PROP_REF;
                    if ($hasMapping) {
                        if (isset($this->properties->__dataToProperty[$options->mapping][self::PROP_REF])) {
                            $refPropName = $this->properties->__dataToProperty[$options->mapping][self::PROP_REF];
                        }
                    }

                    $refProperty = $this->properties[$refPropName];

                    if (isset($refProperty)) {
                        $dereference = $refProperty->format === Format::URI_REFERENCE;
                    }
                }

If you're using the Schema class directly without any overrides, this will always be set to true because of how the ref property is configured in JsonSchema::setUpProperties(), where it is always set up as a string with the uri-reference format:

        $properties->ref = JsonBasicSchema::string();
        $properties->ref->format = Format::URI_REFERENCE;
        $ownerSchema->addPropertyMapping('$ref', self::names()->ref);

Is there a different way the dereference flag should be used? Alternatively, would it be possible to default the Context->dereference property to NULL and only override it if it was not explicitly set?
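To make the proposal concrete, here is a minimal sketch of the tri-state decision, assuming a NULL default means "not set by the caller". The function effectiveDereference() and its parameters are hypothetical and not part of the library; the 'uri-reference' string stands in for Format::URI_REFERENCE.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of swaggest/php-json-schema.
 *
 * @param bool|null   $contextFlag NULL means "not set by the caller".
 * @param string|null $refFormat   Format declared on the ref property, if any.
 */
function effectiveDereference(?bool $contextFlag, ?string $refFormat): bool
{
    // An explicit TRUE or FALSE from the caller always wins.
    if ($contextFlag !== null) {
        return $contextFlag;
    }

    // Otherwise fall back to the format-based decision quoted above.
    return $refFormat === 'uri-reference';
}

var_dump(effectiveDereference(null, 'uri-reference'));  // bool(true)
var_dump(effectiveDereference(false, 'uri-reference')); // bool(false)
var_dump(effectiveDereference(true, null));             // bool(true)
var_dump(effectiveDereference(null, null));             // bool(false)
```

With this shape, existing callers that never touch the flag would keep today's behavior, while an explicit FALSE would finally be honored.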

vearutop commented 2 years ago

The purpose of the context's dereference option is to allow following references when importing values that are described by a JSON schema but are not JSON schemas themselves. 🙃

An example of such usage would be importing an OpenAPI 3 schema like here: https://github.com/swaggest/swac/blob/v0.1.30/src/OpenAPI3/Reader.php#L92. OpenAPI uses $ref the same way JSON Schema does, so following references makes sense for it.

In the general case, a $ref property of an arbitrary JSON object could contain a value that is not intended to be resolved.

So maybe a less misleading name could be forceDereference.

As for your original issue, I think unfortunately it won't work with this library (at least in the current implementation), because it needs all references to be resolved for a successful schema import. Maybe you can work around your issue by preloading faulty references with empty schemas.

Preloading example: https://github.com/swaggest/php-json-schema/blob/v0.12.40/tests/src/PHPUnit/Ref/RefTest.php#L134-L147

        $refProvider = new Preloaded();
        $refProvider->setSchemaData('#/definitions/foo', new \stdClass()); // Empty object means permissive schema {}.
        $refProvider->setSchemaData('http://somewhere/unresolvable/bar.json', new \stdClass());

        $options = new Context();
        $options->setRemoteRefProvider($refProvider);
        $schemaJson = <<<'JSON'
{"$ref": "http://somewhere/unresolvable/bar.json"}
JSON;
        $schema = Schema::import(json_decode($schemaJson), $options);

slucero commented 2 years ago

For now I was able to work around the issue with a custom Schema class implementation included below. During import, it checks whether the dereference option was set to false and temporarily changes the default format on the $ref property to prevent the existing logic noted in Schema::processObject() from overriding the setting for the $dereference variable.

In my use case, this allows for importing the schema and running data preprocessors on it for any necessary customization, after which it can be re-encoded to JSON with the references intact.

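use Swaggest\JsonSchema\Context;
use Swaggest\JsonSchema\Schema;
use Swaggest\JsonSchema\SchemaContract;

/**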
 * A schema implementation to process schemas without resolving reference links.
 *
 * For more detail on the need for this as a custom class, see the
 * ::disableAutomaticReferences method.
 *
 * Following import of a schema using this class, references may be resolved
 * into a completed schema using the ::resolve() method as follows:
 *
 * @code
 * $unresolvedSchema = UnresolvedSchema::import(json_decode($schemaJson));
 * $resolvedSchema = $unresolvedSchema->resolve();
 * @endcode
 *
 * @see https://github.com/swaggest/php-json-schema/issues/145
 *
 * @todo Remove this class once swaggest/php-json-schema#145 is resolved.
 */
class UnresolvedSchema extends Schema {

  /**
   * Import data into the schema.
   *
   * If $options->dereference is set to FALSE, per the default, references will
   * not be resolved.
   *
   * @param mixed $data
   *   Schema data to be imported.
   * @param \Swaggest\JsonSchema\Context $options
   *   Processing options for the import operation.
   *
   * @return static|mixed
   *   The imported JSON schema object.
   *
   * @throws \Swaggest\JsonSchema\Exception
   * @throws \Swaggest\JsonSchema\InvalidValue
   */
  public static function import($data, Context $options = NULL) {
    if ($options === NULL) {
      $options = new Context();
    }

    // Override class mapping to ensure result objects are an instance of this
    // class.
    $options->objectItemClassMapping[Schema::className()] = UnresolvedSchema::className();

    if (!$options->dereference) {
      $reset = static::disableAutomaticDereferencing();
    }

    // Restore the shared $ref format even if the import throws.
    try {
      return parent::import($data, $options);
    }
    finally {
      if (isset($reset)) {
        $reset();
      }
    }
  }

  /**
   * {@inheritdoc}
   */
  public static function setUpProperties($properties, Schema $ownerSchema) {
    parent::setUpProperties($properties, $ownerSchema);

    // Unset the format to disable automatic imports in single-level objects.
    $properties->ref->format = NULL;
  }

  /**
   * A utility function to prevent automatic resolution of schema references.
   *
   * Due to the issue documented in
   * @link https://github.com/swaggest/php-json-schema/issues/145 swaggest/php-json-schema#145 @endlink,
   * the dereference option is ignored and reference links are automatically
   * resolved. This function temporarily alters the schema definition driving
   * that decision, and returns a callback function to restore the change after
   * operations have completed.
   *
   * If the callback function is not executed, further schema import operations
   * within the same request, not just using this class, will not resolve
   * references.
   *
   * @return callable
   *   A reset function to restore default schema handling for reference links.
   *   No arguments are required for the callback function.
   *
   * @see https://github.com/swaggest/php-json-schema/issues/145
   */
  protected static function disableAutomaticDereferencing(): callable {
    $schema = Schema::schema();
    $refProperty = $schema->getProperties()->ref;
    $originalFormat = $refProperty->format;
    $refProperty->format = NULL;

    return function () use ($originalFormat) {
      $schema = Schema::schema();
      $refProperty = $schema->getProperties()->ref;
      $refProperty->format = $originalFormat;
    };
  }

  /**
   * Reprocess the imported schema to resolve schema references.
   *
   * @param \Swaggest\JsonSchema\Context|null $options
   *   Contextual options to influence schema processing.
   *
   * @return \Swaggest\JsonSchema\SchemaContract
   *   The resulting schema after import.
   *
   * @throws \Swaggest\JsonSchema\Exception
   * @throws \Swaggest\JsonSchema\InvalidValue
   */
  public function resolve(Context $options = NULL): SchemaContract {
    if ($options === NULL) {
      $options = new Context();
    }

    $options->dereference = TRUE;

    // Re-encode the JSON to ensure serialization is completed recursively.
    $schema_data = json_encode($this);

    return Schema::import(json_decode($schema_data), $options);
  }

  /**
   * Alter serialization to return references instead of resolving them.
   *
   * When a section of the schema is encountered that was resolved from a
   * reference, return the reference instead of continuing to traverse into it
   * and serialize the content of it in place. This allows for references to
   * leverage locally bundled contents and avoid succumbing to circular
   * references.
   *
   * @return object
   *   The object representation of the interpreted schema content.
   */
  public function jsonSerialize(): object {
    if (is_array($refs = $this->getFromRefs())) {
      return (object) [
        '$ref' => $refs[0],
      ];
    }
    else {
      return parent::jsonSerialize();
    }
  }

}
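
The doc comment on disableAutomaticDereferencing() warns that reference resolution stays disabled for the rest of the request if the reset callback never runs. Below is a minimal, library-free sketch of the same "alter shared state, return a restore callback" pattern, wrapped in try/finally so the restore also happens when the operation throws. SharedRefProperty, disableFormat() and withFormatDisabled() are hypothetical stand-ins for illustration, not library code.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the shared schema property whose format is changed.
final class SharedRefProperty
{
    public static ?string $format = 'uri-reference';
}

/** Clears the shared format and returns a callback that restores it. */
function disableFormat(): callable
{
    $original = SharedRefProperty::$format;
    SharedRefProperty::$format = null;

    return static function () use ($original): void {
        SharedRefProperty::$format = $original;
    };
}

/** Runs $operation with the format cleared, restoring it even on failure. */
function withFormatDisabled(callable $operation)
{
    $reset = disableFormat();
    try {
        return $operation();
    } finally {
        $reset();
    }
}

try {
    withFormatDisabled(function () {
        throw new RuntimeException('import failed');
    });
} catch (RuntimeException $e) {
    // The failure still propagates to the caller.
}

var_dump(SharedRefProperty::$format); // string(13) "uri-reference"
```

Because the restore sits in the finally block, a failed import cannot leave later imports in the same request with dereferencing silently switched off.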