redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

schema-registry avro primitive different than confluent #4970

Open raphaelauv opened 2 years ago

raphaelauv commented 2 years ago

Version & Environment

docker.vectorized.io/vectorized/redpanda:v22.1.3

confluentinc/cp-schema-registry:7.1.0

What went wrong?

The Redpanda schema registry does not handle primitive Avro schemas the same way the Confluent schema registry does.

What should have happened instead?

localhost:9081 is the Redpanda schema registry

curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data '{"schema": "{\"type\": \"string\"}"}' \
http://localhost:9081/subjects/test-key/versions

returns {"id":1}

If I curl http://localhost:9081/schemas/ids/1, it returns: {"schema":"{\"type\":\"string\"}"}


localhost:8098 is the Confluent schema registry

curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data '{"schema": "{\"type\": \"string\"}"}' \
http://localhost:8098/subjects/test-key/versions

returns {"id":1}

If I curl http://localhost:8098/schemas/ids/1, it returns: {"schema":"\"string\""}

JIRA Link: CORE-931

raphaelauv commented 2 years ago

Same problem with Aiven Karapace -> https://github.com/aiven/karapace/issues/411

twmb commented 2 years ago

Those two schemas are equivalent; it looks like the Confluent schema registry normalized the schema. AFAICT, schema normalization is not the default and needs to be opted into with normalize.schemas=true in the schema serializer, or a normalize=true query parameter when POSTing:

https://docs.confluent.io/platform/current/schema-registry/develop/api.html#post--subjects-(string-%20subject)-versions
https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#schema-normalization

In Confluent's Schema Registry, normalization was temporarily made the default but was later reverted because some people want to retain doc fields, etc.:

https://github.com/confluentinc/schema-registry/pull/687 <- made default
https://github.com/confluentinc/schema-registry/pull/1035 <- reverted by 5.2.x

I'm not sure why you're seeing normalization by default with the registry, as everything indicates that is not the default.
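For anyone who wants to try the opt-in described above, here is a minimal sketch of building the registration request with the normalize=true query parameter. The helper name build_register_request and the base URL are illustrative assumptions, not part of either registry's client library, and whether a given registry honors the parameter is not something this snippet checks.

```python
import json
import urllib.parse
import urllib.request


def build_register_request(base_url, subject, schema, normalize=False):
    """Build (but do not send) a POST /subjects/<subject>/versions request.

    Hypothetical helper for illustration; `base_url` is an assumed local registry.
    """
    url = f"{base_url}/subjects/{urllib.parse.quote(subject, safe='')}/versions"
    if normalize:
        # Opt in to normalization via the query parameter mentioned above.
        url += "?normalize=true"
    # The API expects the Avro schema as a JSON-encoded string inside a JSON body.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


req = build_register_request(
    "http://localhost:8081", "test-key", {"type": "string"}, normalize=True
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs without a registry.
```

Fetching /schemas/ids/<id> afterwards shows which form the registry actually stored.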

vuldin commented 5 months ago

A community user also reported a similar issue today when trying to create a schema with the following command (both with and without the ?normalize=true param):

curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\":\"record\",\"name\":\"car\",\"fields\":[{\"name\":\"model\",\"type\":{\"type\":\"string\"}},{\"name\":\"make\",\"type\":\"string\"},{\"name\":\"year\",\"type\":\"float\"}]}"}' "http://localhost:8081/subjects/a1-value/versions"

In Confluent:

{
    "type": "record",
    "name": "car",
    "fields": [
        {
            "name": "model",
            "type": "string"
        },
        {
            "name": "make",
            "type": "string"
        },
        {
            "name": "year",
            "type": "float"
        }
    ]
}

But in Redpanda:

{
    "type": "record",
    "name": "car",
    "fields": [
        {
            "name": "model",
            "type": {
                "type": "string"
            }
        },
        {
            "name": "make",
            "type": "string"
        },
        {
            "name": "year",
            "type": "float"
        }
    ]
}
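
Until the registries agree, a client can compare the two outputs itself. The sketch below collapses {"type": "<primitive>"} objects into the bare type name, which is the primitive-simplification step of Avro's Parsing Canonical Form. The function name collapse_primitives is an assumption for illustration, and this is not a full canonical-form implementation: it does not strip doc fields, reorder keys, or resolve names, and it walks every value, including defaults.

```python
import json

# Avro primitive type names.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def collapse_primitives(node):
    """Recursively rewrite {"type": "<primitive>"} as "<primitive>"."""
    if isinstance(node, list):
        return [collapse_primitives(item) for item in node]
    if isinstance(node, dict):
        collapsed = {key: collapse_primitives(value) for key, value in node.items()}
        type_name = collapsed.get("type")
        # Only collapse objects that carry nothing besides a primitive type name.
        if set(collapsed) == {"type"} and isinstance(type_name, str) and type_name in PRIMITIVES:
            return type_name
        return collapsed
    return node


# The "model" field as returned by each registry in the report above.
redpanda_form = json.loads(
    '{"type":"record","name":"car","fields":[{"name":"model","type":{"type":"string"}}]}'
)
confluent_form = json.loads(
    '{"type":"record","name":"car","fields":[{"name":"model","type":"string"}]}'
)

same = collapse_primitives(redpanda_form) == collapse_primitives(confluent_form)
```

Objects with extra attributes, such as a logicalType, are left untouched, so they still compare as different from the bare primitive.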