linkedin / goavro

Apache License 2.0

cannot encode union (union): datum ought match schema: expected: array, null; received: []string #115

Closed johntdyer closed 6 years ago

johntdyer commented 6 years ago

Hello,

I am getting the following error when trying to encode an array. I am not sure why; I am hoping it's just me doing something stupid.

Error

2018/06/07 15:22:00 Panic recovery -> cannot encode record (LmaEventSchema): cannot encode union (union): datum ought match schema: expected: array, null; received: []string
/usr/local/Cellar/go/1.10.1/libexec/src/runtime/panic.go:502 (0x102ac28)
    gopanic: reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
/Users/johndye/Projects/gocode/src/github.com/johntdyer/schema-registry-test/simple-avro-kafka-golang/producer/temp/main.go:135 (0x137b0a3)
    message: panic(err)
/Users/johndye/Projects/gocode/src/github.com/johntdyer/schema-registry-test/simple-avro-kafka-golang/producer/vendor/github.com/gin-gonic/gin/context.go:95 (0x1357d22)
    (*Context).Next: c.handlers[c.index](c)
/Users/johndye/Projects/gocode/src/github.com/johntdyer/schema-registry-test/simple-avro-kafka-golang/producer/vendor/github.com/gin-gonic/gin/logger.go:56 (0x1364892)
    LoggerWithWriter.func1: c.Next()
/Users/johndye/Projects/gocode/src/github.com/johntdyer/schema-registry-test/simple-avro-kafka-golang/producer/vendor/github.com/gin-gonic/gin/context.go:95 (0x1357d22)
    (*Context).Next: c.handlers[c.index](c)
/Users/johndye/Projects/gocode/src/github.com/johntdyer/schema-registry-test/simple-avro-kafka-golang/producer/vendor/github.com/gin-gonic/gin/recovery.go:43 (0x13652d9)
    RecoveryWithWriter.func1: c.Next()
/Users/johndye/Projects/gocode/src/github.com/johntdyer/schema-registry-test/simple-avro-kafka-golang/producer/vendor/github.com/gin-gonic/gin/context.go:95 (0x1357d22)
    (*Context).Next: c.handlers[c.index](c)
/Users/johndye/Projects/gocode/src/github.com/johntdyer/schema-registry-test/simple-avro-kafka-golang/producer/vendor/github.com/gin-gonic/gin/gin.go:294 (0x135cf3c)
    (*Engine).handleHTTPRequest: context.Next()
/Users/johndye/Projects/gocode/src/github.com/johntdyer/schema-registry-test/simple-avro-kafka-golang/producer/vendor/github.com/gin-gonic/gin/gin.go:275 (0x135c952)
    (*Engine).ServeHTTP: engine.handleHTTPRequest(c)
/usr/local/Cellar/go/1.10.1/libexec/src/net/http/server.go:2694 (0x128183b)
    serverHandler.ServeHTTP: handler.ServeHTTP(rw, req)
/usr/local/Cellar/go/1.10.1/libexec/src/net/http/server.go:1830 (0x127da60)
    (*conn).serve: serverHandler{c.server}.ServeHTTP(w, w.req)
/usr/local/Cellar/go/1.10.1/libexec/src/runtime/asm_amd64.s:2361 (0x1056200)
    goexit: BYTE    $0x90   // NOP

Code

    someRecord, err := goavro.NewRecord(goavro.RecordSchema(lmaSchemaJSON))
    if err != nil {
        panic(err)
    }

    someRecord.Set("payload_type", "log")  //string(dataMarshal))
    someRecord.Set("hostname", "host1234") // int64(1082196484))
    someRecord.Set("payload", "{\"foo\" : \"bar\"}")
    someRecord.Set("loglevel", "debug")
    someRecord.Set("datacenter", "achm")
    someRecord.Set("servicelevel", "production")
    someRecord.Set("source", "mercury")
    someRecord.Set("tags", []string{"foo"})

    codec, err := goavro.NewCodec(lmaSchemaJSON)
    if err != nil {
        panic(err)
    }

    bb := new(bytes.Buffer)
    if err = codec.Encode(bb, someRecord); err != nil {
        panic(err)
    }

Schema

{
    "type": "record",
    "name": "LmaEventSchema",
    "fields": [
        {
            "name": "hostname",
            "type": [
                "null",
                "string"
            ],
            "doc": "Hostname of machine sending event",
            "default": null
        },
        {
            "name": "payload",
            "type": [
                "null",
                "string"
            ],
            "doc": "Contains the payload to transport",
            "default": null
        },
        {
            "name": "payload_type",
            "type": [
                "null",
                "string"
            ],
            "doc": "Identifies the content type of the payload. E.g.: log, metric",
            "default": null
        },
        {
            "name": "payload_format",
            "type": [
                "null",
                "string"
            ],
            "doc": "Identifies the format of the payload string. E.g.: json, none",
            "default": null
        },
        {
            "name": "tags",
            "type": [
                {
                    "type": "array",
                    "items": "string"
                },
                "null"
            ],
            "doc": "A list of values that otherwise describe this event - e.g. 'security', 'access'",
            "default": []
        },
        {
            "name": "source",
            "type": [
                "null",
                "string"
            ],
            "doc": "identifies the emitter e.g. 'mercury', 'locus'",
            "default": null
        },
        {
            "name": "servicelevel",
            "type": [
                "null",
                "string"
            ],
            "doc": "e.g. 'production', 'integration'",
            "default": null
        },
        {
            "name": "datacenter",
            "type": [
                "null",
                "string"
            ],
            "doc": "data center of log event e.g. 'achm', 'achm2'",
            "default": null
        },
        {
            "name": "loglevel",
            "type": [
                "null",
                "string"
            ],
            "doc": "log level e.g. 'DEBUG', 'INFO'",
            "default": null
        },
        {
            "name": "environment",
            "type": [
                "null",
                "string"
            ],
            "doc": "e.g. the application environment e.g. 'a6', 'a7'",
            "default": null
        },
        {
            "name": "type",
            "type": [
                "null",
                "string"
            ],
            "doc": "'app' (same for all logs emitted by logback)",
            "default": null
        }
    ]
}
johntdyer commented 6 years ago

Hello guys, I just wanted to follow up on this and see if anyone has any thoughts on what I am doing wrong here?

karrick commented 6 years ago

Hey, it's been a while since I've logged in to check up, and I just saw your question.

First off, I do recommend using goavro/v2. It's 3-4x faster on some really super conflated payloads here at my present employment, and it provides a much easier-to-use API, especially for records. In v2 you simply create and populate a Go map[string]interface{} data structure with your record's values and feed that to the encoder. When decoding a record, the decoder returns a Go map[string]interface{} back to you, with all of the keys and values put in their proper places. See the example file at https://github.com/linkedin/goavro/blob/master/examples/nested/main.go.
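To make the v2 record shape concrete, here is a minimal sketch of the native Go value for the LmaEventSchema record from the question. It assumes the v2 convention that a non-null union member is wrapped in a single-entry map keyed by the Avro type name, and that nil selects the "null" branch. The helpers `union` and `buildEvent` are hypothetical names introduced for this sketch; `union` is a local stand-in for the library's `goavro.Union`, so the example runs without any dependency and only prints the structure as JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// union wraps a value as a non-null union member: a single-entry map keyed
// by the Avro type name. Hypothetical local stand-in for goavro.Union.
func union(typeName string, value interface{}) map[string]interface{} {
	return map[string]interface{}{typeName: value}
}

// buildEvent returns the native Go form of one LmaEventSchema record,
// using the same values as the code in the question.
func buildEvent() map[string]interface{} {
	return map[string]interface{}{
		"hostname":     union("string", "host1234"),
		"payload":      union("string", `{"foo" : "bar"}`),
		"payload_type": union("string", "log"),
		"loglevel":     union("string", "debug"),
		"datacenter":   union("string", "achm"),
		"servicelevel": union("string", "production"),
		"source":       union("string", "mercury"),
		"tags":         union("array", []string{"foo"}),
		// Fields the question left unset: nil selects the "null" branch.
		"payload_format": nil,
		"environment":    nil,
		"type":           nil,
	}
}

func main() {
	out, err := json.MarshalIndent(buildEvent(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With the real library, the map returned by `buildEvent` would be handed to a codec created with `goavro.NewCodec(lmaSchemaJSON)`, for example via `codec.BinaryFromNative(nil, buildEvent())`; check the v2 documentation for the exact behavior in your version.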

Second, the newer versions of the encoder provide the syntactic sugar that you were surprised to find missing in your question above. I know v2 has this feature; I'm not sure whether the original API did. The encoder should accept an array of strings ([]string) rather than requiring an array of empty interfaces ([]interface{}) whose elements are each a string. The v2 encoder allows this. See the example using a slice of int values in the test case: https://github.com/linkedin/goavro/blob/master/array_test.go#L83
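As an illustration of what that syntactic sugar amounts to, the sketch below shows one way an encoder written against []interface{} can also accept typed slices such as []string or []int, by converting them with reflection. This is an assumption about the general technique, not a copy of goavro's implementation; `toInterfaceSlice` is a hypothetical name.

```go
package main

import (
	"fmt"
	"reflect"
)

// toInterfaceSlice converts any slice value (for example []string or []int)
// into []interface{}, so code that only understands []interface{} can accept
// typed slices as a convenience to the caller.
func toInterfaceSlice(datum interface{}) ([]interface{}, error) {
	// Fast path: the caller already supplied []interface{}.
	if items, ok := datum.([]interface{}); ok {
		return items, nil
	}
	v := reflect.ValueOf(datum)
	if v.Kind() != reflect.Slice {
		return nil, fmt.Errorf("expected a slice; received: %T", datum)
	}
	// Copy each element out as an empty interface.
	items := make([]interface{}, v.Len())
	for i := range items {
		items[i] = v.Index(i).Interface()
	}
	return items, nil
}

func main() {
	items, err := toInterfaceSlice([]string{"foo", "bar"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d items, first is %v\n", len(items), items[0])
}
```

Without a conversion like this, a type assertion to []interface{} fails for []string, which is consistent with the "received: []string" error reported above.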

I'm going to close this issue, and I regret the delay, but please re-open if you have any additional questions.