julianpeeters / sbt-avrohugger

sbt plugin for generating Scala sources for Apache Avro schemas and protocols.
Apache License 2.0
133 stars 50 forks source link

org.apache.avro.SchemaParseException: Illegal character in: return$ #86

Closed mcenkar closed 2 years ago

mcenkar commented 2 years ago

Hi,

Given:

@namespace( "foo.bar" )
protocol AProtocol {
  record A {
    union { null, int } id = null;
    union { null, string } return = null;
  }
}

It gets compiled to (correct and working):

case class A(id: Option[Int] = None, `return`: Option[String] = None)

It gets compiled to:

case class A(id: Option[Int] = None, return$: Option[String] = None)

In my case I eventually get following exception:

vulcan.AvroException$$anon$1: org.apache.avro.SchemaParseException: Illegal character in: return$
  at vulcan.AvroException$.apply(AvroException.scala:18)
  at vulcan.AvroError$$anon$3.throwable(AvroError.scala:282)
  at fs2.kafka.vulcan.AvroDeserializer$.$anonfun$using$extension$3(AvroDeserializer.scala:55)

Caused by: org.apache.avro.SchemaParseException: Illegal character in: return$
  at org.apache.avro.Schema.validateName(Schema.java:1566)
  at org.apache.avro.Schema.access$400(Schema.java:91)
  at org.apache.avro.Schema$Field.<init>(Schema.java:546)
  at org.apache.avro.Schema$Field.<init>(Schema.java:585)

Only workaround I've found is sticking to RC17 version.

julianpeeters commented 2 years ago

Thanks for the report @mcenkar. Unfortunately, my time-frame for a fix is very far out. PR's welcome of course, or else, if possible, you might try renaming the field to avoid Java keywords.

I can't recall why avrohugger adopted avro's field mangling strategy (it seems likely that only Specific format would need mangling, but I'm not certain): https://github.com/apache/avro/blob/9a378ffab2b49d95edcd2dba36691096301b64ee/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java#L97

mangling: https://github.com/julianpeeters/avrohugger/blob/197f8b51b7ab01595952483021199a52ad8b77fa/avrohugger-core/src/main/scala/format/FieldRenamer.scala#L5

related: https://github.com/julianpeeters/sbt-avrohugger/issues/85

mcenkar commented 2 years ago

Hi,

so had a quick look and this file:

@namespace( "foo.bar" )
protocol AProtocol {
  record A {
    union { null, string } return = null;
    union { null, int } abstract = null;
    union { null, int } assert = null;
    union { null, int } break = null;
    union { null, int } byte = null;
    union { null, int } case = null;
    union { null, int } catch = null;
    union { null, int } char = null;
    union { null, int } class = null;
    union { null, int } const = null;
    union { null, int } continue = null;
    union { null, int } default = null;
    union { null, int } do = null;
    union { null, int } else = null;
    union { null, int } extends = null;
    union { null, int } final = null;
    union { null, int } finally = null;
    union { null, int } for = null;
    union { null, int } goto = null;
    union { null, int } if = null;
    union { null, int } implements = null;
    union { null, int } instanceof = null;
    union { null, int } interface = null;
    union { null, int } native = null;
    union { null, int } new = null;
    union { null, int } package = null;
    union { null, int } private = null;
    union { null, int } protected = null;
    union { null, int } public = null;
    union { null, int } short = null;
    union { null, int } static = null;
    union { null, int } strictfp = null;
    union { null, int } super = null;
    union { null, int } switch = null;
    union { null, int } synchronized = null;
    union { null, int } this = null;
    union { null, int } throw = null;
    union { null, int } transient = null;
    union { null, int } try = null;
    union { null, int } volatile = null;
    union { null, int } while = null;
  }
}

Compiles just fine using java -jar avro-tools-1.11.0.jar idl src/main/avro/a.avdl ./lol.avpr to:

{
  "protocol" : "AProtocol",
  "namespace" : "foo.bar",
  "types" : [ {
    "type" : "record",
    "name" : "A",
    "fields" : [ {
      "name" : "return",
      "type" : [ "null", "string" ],
      "default" : null
    }, {
      "name" : "abstract",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "assert",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "break",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "byte",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "case",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "catch",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "char",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "class",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "const",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "continue",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "default",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "do",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "else",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "extends",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "final",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "finally",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "for",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "goto",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "if",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "implements",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "instanceof",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "interface",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "native",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "new",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "package",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "private",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "protected",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "public",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "short",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "static",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "strictfp",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "super",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "switch",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "synchronized",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "this",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "throw",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "transient",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "try",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "volatile",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "while",
      "type" : [ "null", "int" ],
      "default" : null
    } ]
  } ],
  "messages" : { }
}

Those properties are coming from your RESERVED_WORDS. Only some of them don't compile:

boolean
double
enum
false
float
import
int
long
null
throws
true
void

For example boolean:

Exception in thread "main" org.apache.avro.compiler.idl.ParseException: Encountered " "boolean" "boolean "" at line 45, column 25.
Was expecting one of:
    <IDENTIFIER> ...
    "@" ...
    "`" ...

        at org.apache.avro.compiler.idl.Idl.generateParseException(Idl.java:1729)

What's your take on:

mcenkar commented 2 years ago

Found place where 2nd set is blocked: https://github.com/apache/avro/blob/release-1.11.0/lang/java/compiler/src/main/javacc/org/apache/avro/compiler/idl/idl.jj#L231-L265

mcenkar commented 2 years ago

Summary:

public class A extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
  private java.lang.CharSequence return$;
  private java.lang.Integer abstract$;
  private java.lang.Integer assert$;
  private java.lang.Integer break$;

But in scala we can always do backticks. So probably best way is to do backticks around scala keywords (different set than avro), and just drop $ mangle?

julianpeeters commented 2 years ago

names like boolean, or throws you don't really have to handle, it's not valid avro anyway, return which I need works better with ticks, where $ breaks it, do you remember any case where $ is needed?

IIRC, the $ was needed for java avro interop. I wonder what happens at runtime? I suspect org.apache.avro.specific might require the $ at runtime? And it also looks like vulcan is using org.apache.avro.generic under the hood.

My other concern is breaking changes.

So I might end up favoring the addition of some configuration, maybe for Standard format at least.

mcenkar commented 2 years ago

Hi,

I had a quick look and I think we don't need to do any config for this. It would be breaking change, but it would break a bug that's there right now. I added PR here, right now it's failing.

The only places where I found we may need $ is scavro interop, but that's actually using different mangling strategy: list of keywords, and here calling avro-compiler.

So in this PR instead I switched to mangling by backticks, and doing that only on scala keywords, as we don't need to do that on Java keywords.

I didn't test thorougly scavro part but everything still looks OK, and it fixes my problem.

(edit: Also #85 would be fixed)

Happy to hear your feedback.

Cheers

julianpeeters commented 2 years ago

Awesome, thanks very much for the eyes and the fix. Released as avrohugger 1.0.0 and sbt-avrohugger 2.0.0