lerenn / asyncapi-codegen

An AsyncAPI Golang code generator that generates all Go code from the broker to the application/user. Just plug your application into your favorite message broker!

JSON Schema question #64

Closed gedw99 closed 11 months ago

gedw99 commented 1 year ago

I was wondering if the AsyncAPI standard supports JSON Schema?

Because then we could use it not only to describe the types but also to do validation.

I am not that familiar with AsyncAPI, btw.

NATS uses JSON Schema internally to validate its own types, btw. It lives in the package called jsm, but it is not really designed for external access.

You know, there is nothing preventing the JSON Schema, or even the AsyncAPI document, from being deployed inside NATS KV. Then the NATS server has the schema inside itself to do the validations. It's an interesting way to have it all inside NATS, the way a database holds both its data and its schema.

lerenn commented 1 year ago

AsyncAPI uses JSON Schema in specifications for references and objects (More info).

It's what AsyncAPI specification validators use to validate AsyncAPI documents.

If we can add these to NATS, why not, it could be interesting!

gedw99 commented 1 year ago

I read half that doc, but found this one that talks about validation in more detail:

“The AsyncAPI document is important because payload schemas are taken from it, and messages are validated against it in your application”: https://www.asyncapi.com/docs/guides/message-validation

---

So here are my first thoughts on implementing a design from a NATS point of view, rather than Kafka.

For any actor to be able to validate, there needs to be a registry. The NATS KV Store or the NATS Object Store can do that well, rather than leaving the schema floating around on disk. The schema then lives on the NATS server, and any NATS client can subscribe to changes to any schema at runtime too. Easy registry system :)
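A minimal sketch of that registry with nats.go, assuming a hypothetical asyncapi-schemas KV bucket and made-up service/version key names:

// Hypothetical registry sketch: store AsyncAPI documents in a NATS KV
// bucket and watch for schema changes at runtime.
package main

import (
    "fmt"
    "log"
    "os"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatal(err)
    }

    // Create (or open) the registry bucket.
    kv, err := js.CreateKeyValue(&nats.KeyValueConfig{Bucket: "asyncapi-schemas"})
    if err != nil {
        log.Fatal(err)
    }

    // Register a schema under a versioned key.
    doc, err := os.ReadFile("asyncapi.yaml")
    if err != nil {
        log.Fatal(err)
    }
    if _, err := kv.Put("orders.1.0.0", doc); err != nil {
        log.Fatal(err)
    }

    // Any client can subscribe to schema changes at runtime.
    w, err := kv.Watch("orders.>")
    if err != nil {
        log.Fatal(err)
    }
    for entry := range w.Updates() {
        if entry == nil {
            continue // nil marks the end of the initial replay
        }
        fmt.Printf("schema %s updated (revision %d)\n", entry.Key(), entry.Revision())
    }
}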

As a dev building a service, I want myself and others to be able to validate my messages. So, at build time, you embed the schema in your Go binary.

You deploy your service wherever.

At runtime, your service starts up, the init event sends the schema to the server, and the server persists it into the registry so that all your services, and other services, can use it to do validation. It is of course designed to first check whether that schema version is already registered.
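A sketch of that startup registration, assuming the same hypothetical bucket and that the document is embedded with go:embed:

// Hypothetical startup registration: the AsyncAPI document is embedded at
// build time and pushed to the registry only if this version is unknown.
package main

import (
    _ "embed"
    "errors"
    "log"

    "github.com/nats-io/nats.go"
)

//go:embed asyncapi.yaml
var asyncAPIDoc []byte

const schemaKey = "orders.1.0.0" // made-up service/version key

func registerSchema(js nats.JetStreamContext) error {
    kv, err := js.KeyValue("asyncapi-schemas")
    if err != nil {
        return err
    }

    // First check whether this schema version is already registered.
    if _, err := kv.Get(schemaKey); err == nil {
        return nil // already registered, nothing to do
    } else if !errors.Is(err, nats.ErrKeyNotFound) {
        return err
    }

    // Create fails if the key appeared in the meantime, which is fine here.
    _, err = kv.Create(schemaKey, asyncAPIDoc)
    return err
}

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatal(err)
    }
    if err := registerSchema(js); err != nil {
        log.Fatal(err)
    }
}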

At runtime, any actor can pull out the schema and check a message.
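For the validation side, a sketch that assumes the registry stores the extracted JSON Schema of the payload (rather than the whole AsyncAPI document), using the santhosh-tekuri/jsonschema library; bucket, key, and subject names are made up:

// Hypothetical consumer-side validation: pull a payload schema from the
// registry and validate every incoming message against it.
package main

import (
    "encoding/json"
    "log"

    "github.com/nats-io/nats.go"
    "github.com/santhosh-tekuri/jsonschema/v5"
)

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatal(err)
    }
    kv, err := js.KeyValue("payload-schemas")
    if err != nil {
        log.Fatal(err)
    }

    entry, err := kv.Get("orders.created.1.0.0")
    if err != nil {
        log.Fatal(err)
    }
    schema, err := jsonschema.CompileString("orders.json", string(entry.Value()))
    if err != nil {
        log.Fatal(err)
    }

    // Validate every message on the subject before processing it.
    _, err = nc.Subscribe("orders.created", func(msg *nats.Msg) {
        var payload interface{}
        if err := json.Unmarshal(msg.Data, &payload); err != nil {
            log.Printf("invalid JSON: %v", err)
            return
        }
        if err := schema.Validate(payload); err != nil {
            log.Printf("message rejected by schema: %v", err)
            return
        }
        // ... process the validated message
    })
    if err != nil {
        log.Fatal(err)
    }
    select {} // block forever
}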

Schema evolution now has a basis, because you can have many AsyncAPI versions registered in the system.

lerenn commented 1 year ago

That looks like a great idea!!

When storing the schema in the KV Store or Object Store, do you mean the AsyncAPI document describing the schemas?

I would also think about embedding the AsyncAPI document version in each message header, so the app can upgrade the protocol when it changes.

It would also be great to have a way to generate code from multiple versions of the same AsyncAPI document, and have the code automatically process messages based on the AsyncAPI document version embedded in each message.

It would be something like:

  1. Server and clients are live with code generated from AsyncAPI document v1.0.0 and are processing version 1.0.0 messages on both sides
  2. Binaries with code generated from AsyncAPI documents v1.0.0 and v1.1.0 are deployed on client and server
  3. Right after deployment, both sides still process old version 1.0.0 messages
  4. Once no old messages are left on the queues, both sides process new version 1.1.0 messages
  5. Binaries with code generated from only AsyncAPI document v1.1.0 are deployed on client and server

That would make migration fairly easy. What do you think?
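A minimal sketch of that dispatch, assuming a hypothetical AsyncAPI-Version NATS header and made-up structs standing in for the code generated from two document versions:

// Hypothetical version dispatch: the publisher stamps each message with the
// AsyncAPI document version, and the consumer picks the matching structs.
package main

import (
    "encoding/json"
    "log"

    "github.com/nats-io/nats.go"
)

// Made-up stand-ins for code generated from the v1.0.0 and v1.1.0 documents.
type UserV1 struct {
    Name string `json:"name"`
}

type UserV2 struct {
    Name  string `json:"name"`
    Email string `json:"email"`
}

func publish(nc *nats.Conn, subject, version string, payload interface{}) error {
    data, err := json.Marshal(payload)
    if err != nil {
        return err
    }
    msg := nats.NewMsg(subject)
    msg.Header.Set("AsyncAPI-Version", version) // hypothetical header name
    msg.Data = data
    return nc.PublishMsg(msg)
}

func handle(msg *nats.Msg) {
    switch v := msg.Header.Get("AsyncAPI-Version"); v {
    case "1.0.0":
        var u UserV1
        if err := json.Unmarshal(msg.Data, &u); err == nil {
            log.Printf("v1.0.0 user: %+v", u)
        }
    case "1.1.0":
        var u UserV2
        if err := json.Unmarshal(msg.Data, &u); err == nil {
            log.Printf("v1.1.0 user: %+v", u)
        }
    default:
        log.Printf("unknown schema version %q", v)
    }
}

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    if _, err := nc.Subscribe("users", handle); err != nil {
        log.Fatal(err)
    }
    if err := publish(nc, "users", "1.1.0", UserV2{Name: "Ada", Email: "ada@example.com"}); err != nil {
        log.Fatal(err)
    }
    nc.Flush() // push the message out before Drain processes pending deliveries
}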

gedw99 commented 1 year ago

Yes, I mean the AsyncAPI document.

I agree with your plan.

There is a problem in that you code-gen against that schema, so I don't know how you will handle different versions of the schema changing at runtime.

Maybe that's just how it is, and it's a limitation of the system. It's a very common problem.

Maybe later it’s worth thinking about making your code generator into an interpreter.

I use a funny solution: I take the generated code, compile it to wasm, and then run the wasm with wazero.

There are a few other systems using NATS with wazero for similar reasons. It's Docker without Docker :)

So the code is now versioned at runtime. I store it in the NATS Object Store, and all my services always have the latest wasm version of the code thanks to the NATS Object Store watcher.

Then at runtime I check the schema version, load up the wasm that matches, and off we go.
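A hypothetical sketch of that trick: pull the handler wasm matching the schema version from a NATS Object Store and run it with wazero (store and object names are made up, and the wasm is assumed to be a WASI binary):

// Hypothetical host: fetch versioned wasm from NATS and run it sandboxed.
package main

import (
    "context"
    "log"

    "github.com/nats-io/nats.go"
    "github.com/tetratelabs/wazero"
    "github.com/tetratelabs/wazero/imports/wasi_snapshot_preview1"
)

func main() {
    ctx := context.Background()

    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatal(err)
    }
    obs, err := js.ObjectStore("handlers")
    if err != nil {
        log.Fatal(err)
    }

    // Fetch the wasm that matches the schema version in the message.
    wasm, err := obs.GetBytes("orders-handler-1.1.0.wasm")
    if err != nil {
        log.Fatal(err)
    }

    // The host decides what the guest can touch; here it gets nothing extra.
    r := wazero.NewRuntime(ctx)
    defer r.Close(ctx)
    wasi_snapshot_preview1.MustInstantiate(ctx, r)

    // Instantiating runs the module's start function (its main).
    if _, err := r.Instantiate(ctx, wasm); err != nil {
        log.Fatal(err)
    }
}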

food for thought :)

So you solve the problem of old versions of data sitting inside NATS streams by just letting them be.

To garbage collect old versions, you could record how often each version is used, and eventually retire the data or notify the developer that owns the schema that version C is not used by anyone and so can be retired. It's facing reality. You can store these metrics in you-know-where (when you have a hammer, everything looks like a nail :)). Basically, use a stream as a counter.
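A hypothetical sketch of that counter: consumers publish a tick to a per-version subject on a JetStream stream, and the stream's per-subject message counts tell you which schema versions are still in use (names are made up):

// Hypothetical usage counter backed by a JetStream stream.
package main

import (
    "log"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatal(err)
    }

    // One stream capturing usage ticks for every schema version.
    if _, err := js.AddStream(&nats.StreamConfig{
        Name:     "SCHEMA_USAGE",
        Subjects: []string{"schema.usage.>"},
    }); err != nil {
        log.Fatal(err)
    }

    // Record "I just validated a message against orders v1.0.0".
    // Inspecting the stream later shows the tick counts per version.
    if _, err := js.Publish("schema.usage.orders.1_0_0", nil); err != nil {
        log.Fatal(err)
    }
}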

Then there is the tricky area of what to do with that version of the data in the NATS streams. I guess you either upgrade it or delete it. I don't know yet what facilities AsyncAPI has for upgrading data. If we were using protobufs, maybe there would be a way. Not sure right now.

But I like my wasm trick and the recording of which versions are being used. It's nice to plan ahead and have visibility as a developer, and hence owner of a schema, because you can at least make informed decisions.

Here is the code 👍

https://github.com/stealthrocket/timecraft

gedw99 commented 1 year ago

I should add that the wasm idea is pretty bleeding edge.

nats.go does compile to wasm.

The host (the thing running the wasm) controls what network and file system access the guest wasm has. I am using the host and guest terms the way they're used for virtual machines, etc.

The other cool thing this gives you is a serverless-style architecture. You only need the host on any server or desktop, and you're able to pull the wasm from NATS at runtime. NATS is your Docker-ish registry and store.

TimeCraft is designed for advanced WASI and only runs on Linux due to the Unix sockets. But because we are using NATS, you can use Capsule instead, which is going to be hooked up to NATS anyway.

Capsule runs wasm (not WASI), which is all that's needed.

https://github.com/bots-garden/capsule

lerenn commented 1 year ago

You're right. For now, the objective is not to change the schema at runtime; however, it would be pretty cool!

The WASM idea is really interesting, and it would clearly help to have something truly dynamic that doesn't rely on code generation (keeping each schema only until there are no more clients using it).

I may concentrate on traditional code generation for the time being, but once that's done I'll take a look, experiment with WASM, and try to make this more of an interpreter for the broker side!

The only thing I wonder about is how you connect the WASM output to your existing system, but I think I'll have to take more time to wrap my mind around it.

Thanks for the food for thought! :)

gedw99 commented 11 months ago

Hey again,

I noticed that in the Kafka community, they tend to design for services to NOT be version-aware, and for the broker to provide protobufs or Avro so that any version of the service can tolerate different message versions.

The same goes for non-broker systems like gRPC. All clients can be at any version because the server always uses protobufs.

Your project wants to use JSON and JSON Schema.

A simple example is when a field is added to a message. With protobuf, an old client ignores that added field. Do you know if AsyncAPI can handle that?

At the moment, I use Iceberg for databases with DuckDB. It has similar issues, because you design around NOT doing data migration, and instead around the DB tolerating multiple versions of the data.

gedw99 commented 11 months ago

It's probably also best to run the examples with many clients at different versions, like a test harness, to see which assumptions hold water.

lerenn commented 11 months ago

Normally, when a field is added in a new version, it will just be ignored by the old version (as the old Go structure won't have a field to receive it).

But it should not cause any trouble:

If you have this structure in version 1:

type MyStructV1 struct {
    A string `json:"a"`
}

And this structure in version 2:

type MyStructV2 struct {
    A string `json:"a"`
    B string `json:"b"`
}

Then marshaling the V2 struct will give this JSON:

{"a":"...","b":"..."}

And the V1 struct will ignore field B and just use field A.
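A quick runnable check of that behavior:

package main

import (
    "encoding/json"
    "fmt"
)

type MyStructV1 struct {
    A string `json:"a"`
}

type MyStructV2 struct {
    A string `json:"a"`
    B string `json:"b"`
}

func main() {
    // Marshal a v2 message...
    data, _ := json.Marshal(MyStructV2{A: "hello", B: "world"})
    fmt.Println(string(data)) // {"a":"hello","b":"world"}

    // ...and unmarshal it with the v1 structure: B is silently dropped.
    var v1 MyStructV1
    _ = json.Unmarshal(data, &v1)
    fmt.Printf("%+v\n", v1) // {A:hello}
}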

Regarding the example, I could add one! There is actually a test for it right now, if you want to take a look.

gedw99 commented 11 months ago

Hey

Thanks for that.

I started to test it. It's here: https://github.com/gedw99/gio-htmx/tree/main/exp/async-api

I use a Procfile to run everything, rather than Docker. I tend to do it this way because a lot of the stuff I work on must run anywhere, without Docker.

Goreman runs the Procfile on Windows, Linux, or Darwin.
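For reference, a hypothetical Procfile for this kind of setup (process names and commands are made up):

nats: nats-server -js
service: go run ./cmd/service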

NOTE: I need to change the dependency installer so it only installs into the .bin folder inside the repo, so it will not pollute any other folders on the host.

gedw99 commented 11 months ago

Looks like we can close this. Reopen if you need to.