julianpeeters / avrohugger

Generate Scala case class definitions from Avro schemas
Apache License 2.0
201 stars 120 forks source link

Split up large schemas for AvroGenerateSpecific #168

Closed natefitzgerald closed 11 months ago

natefitzgerald commented 1 year ago

This fixes https://github.com/julianpeeters/avrohugger/issues/167, which is about the schema$ val that is added to the companion object of the generated case classes for Specific record types.

The bummer is that the testing paradigm in place here uses example files and in order to test this you obviously need a super long schema and thus a super long example file corresponding to it. I considered reworking some of these tests to generate the schema file and then check that the files compile but that seemed overkill. The downside is that I've now added 28k lines of testing for very niche functionality.

julianpeeters commented 1 year ago

The downside is that I've now added 28k lines...

lmao, good stuff.

julianpeeters commented 1 year ago

Does your PR fix the issue for you? If so, I'm happy to publish to cover your use case.

I slapped biguser.avsc into sbt-avrohugger's scripted tests (here) and ran sbt scripted avrohugger/SpecificSerializationTests, but compilation failed: while the field limit is just a gotcha I think, the others might be relevant, but my timeframe for digging deeper is a little ways off.

...
BigUser.scala:6:25: Platform restriction: a parameter list's length cannot exceed 254.
...
[info] [error] Error while emitting example/BigUser
[info] [error] UTF8 string too large
[info] [error] Error while emitting example/BigUser$
[info] [error] UTF8 string too large
...
natefitzgerald commented 1 year ago

That's my bad, I only checked that the companion object compiled fine. I should have remembered the parameter length limit, it's not actually a problem with our large schemas because they're nested. I can't actually bring one in here so I attempted to generate one. I'll figure it out.

julianpeeters commented 11 months ago

Good news, @LeonPoon brings fish: https://github.com/julianpeeters/avrohugger/pull/182

biguser.avsc is now passing in sbt-avrohugger v2.6.0 scripted tests (here), we can reopen this issue as needed.

Huge thanks for getting this rolling.

steve-e commented 9 months ago

Hi @julianpeeters My team has a client with 919 fields, ie more than the JVM limit for parameter lists of 254. I have put a sample avsc locally into avrohugger/SpecificSerializationTests into sbt-avrohugger and get this error

[info] [error] /private/var/folders/gc/xjm7lrlj5cn9kwbz7d0mwpbh0000gn/T/sbt_8a9778d4/target/scala-2.12/src_managed/main/compiled_avro/example/MoreThan254Fields.scala:6:35: Platform restriction: a parameter list's length cannot exceed 254.
[info] [error] final case class MoreThan254Fields(var field1: Int, var field2: Int, var field3: Int, var field4: Int, var fiel

(By the way, biguser.avsc has 251 fields, so that is within the JVM limit)

The way my colleague worked around this was to hand edit the generated file to move most of the fields from the parameter list to be fields declared in the class. This works for our use case.

I can try and create a PR to handle this issue, by generating a class with only a default no arg constructor. This could be the only behaviour for parameter lists longer than 254.

But perhaps you have other ideas on how to handle this?

julianpeeters commented 9 months ago

Glad you found a workaround, @steve-e, and thanks for sharing it.

However, I must decline your kind offer, due to maintenance reasons.

steve-e commented 9 months ago

Hi @julianpeeters Understood. When producing a fix for this issue I ran into another, and can foresee other problems too. We may fork this repo to handle our use case, as the oversized schema is regularly updated, so code needs to be regenerated on a regular basis.

You may be interested in the changes I have made so far, which I have put in a draft and closed PR to make it easier for you and others to view. This PR solves the issues we have.