Open hamnis opened 2 years ago
This is a very interesting idea! If I understand it right, this means we can have schemas with properties like this:
"properties: {
"xyz": { "$ref": "iglu:com.snowplowanalytics.iglu/anything-a/jsonschema/1-0-0" }
}
Until now, I have always discouraged people from using the $ref
keyword in iglu schemas. And it was for three reasons:
http://example.com/path/to/schema
) then it bypasses Iglu's strict versioning rules. It is a principle of Iglu that schemas are static once published, and that means we should not allow them to point to just any url.$ref
keyword. It means if a schema uses $ref
then for events in a Snowplow pipeline, the fields will not get loaded into the warehouse.$ref
, it blocks the jvm thread. In your patch, this happens on the line where you call dispatcher.unsafeRunSync(...)
. In Snowplow apps, which are required to be high throughput, we have to be very careful about which threads get blocked and when.Your patch addresses problem 1, and I think that is a huge improvement. It is a big step in the right direction.
But I think we need to properly address problems 2 and 3 before we can claim that Iglu fully supports this type of external lookup.
A couple of smaller implementation comments:
I don't think you need the ValidatorF
trait. I think your new validator implementation could simply implement the old Validator
trait.
If we add this feature, I would prefer to amend the old validators instead of adding a new validator. It would be much simpler if we can say that all iglu validators support external refs, instead of saying that some do and some don't depending on how you use it. This might need a breaking change to the code, so it would be a new major release of the library (version 3).
If iglu-client substitutes $ref
for its contents when json body is returned, it would solve the point 2.
@voropaevp that's a good idea. It would solve the point 3 too, because if we do it in the resolver then we have better control over how threads/fibers get used.
Implementation would need to be careful about resolution depth and possible loops.
A way to fix the blocking issue is to change Sync[F].delay
to Sync[F].blocking
or Sync[F].interruptable
if this is a cancelable operation.
Hi @hamnis, I like how this brings us closer to JSON Schema compatibility, although it would be interesting to understand the use cases behind this feature. That would help us think through if any changes are needed in our UI flows as well.
I assume you’d like to reuse the same “subschema” in a set of schemas? Would this be primarily for event schemas or for entity schemas? Could you give some examples? In particular, if the goal is to add the same “subschema” to multiple event schemas, it’s already possible just add that “subschema” as an entity/context to all such events. But I imagine there is something you prefer in the $ref
approach?
This would be reuse in other entity schemas.
A particular use case is that we have a concept called Plug
s, each plug displays a Content
item.
There may be multiple plugs visible plugs on a page, which also May represent a Content
item.
We would typically attach a set of plugs to an impression
event.
For now we have embedded the Content
definition within our Plug
definition.
I can see other usecases, but this is what we currently have.
Hey!
Any updates on this?
Our use cases:
The underlying library support uri resolving for custom schemes. Attached is a possible patch that can enable this feature
circevalidator.patch