Open drewrobb opened 7 years ago
@drewrobb, were you able to find a resolution for this? We are facing what looks like a similar issue with a highly nested schema
@dbkegley we have not found a resolution to this, nor do we even have a proposed way to fix it.
Schemas in Spark must be known ahead of time. A possible workaround would be to set a limit on the recursion depth when generating a schema. Would that be useful?
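A depth-limited schema derivation could look something like the following minimal sketch. This is NOT the sparksql-scalapb API; `Descriptor`, `FieldType`, and `schemaFields` are hypothetical names invented purely to illustrate how a `maxDepth` parameter would stop the recursion that otherwise never terminates:

```scala
// Hypothetical toy model of a protobuf descriptor (not ScalaPB's real classes).
sealed trait FieldType
case object PStringType extends FieldType
// The nested descriptor is passed lazily so a message can refer to itself.
final case class PMessageType(descriptor: () => Descriptor) extends FieldType

final case class Descriptor(name: String, fields: List[(String, FieldType)])

// Recurse into nested messages, but stop expanding once maxDepth is exhausted.
// A recursive message therefore yields a finite set of flattened field paths.
def schemaFields(d: Descriptor, maxDepth: Int): List[String] =
  if (maxDepth <= 0) Nil
  else
    d.fields.flatMap {
      case (fname, PStringType) => List(s"$fname: string")
      case (fname, PMessageType(ref)) =>
        schemaFields(ref(), maxDepth - 1).map(inner => s"$fname.$inner")
    }
```

With a self-referential `Person` descriptor, `schemaFields(person, 2)` expands the `friend` field exactly once instead of overflowing the stack.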
That sounds like it would fix my use case. We aren't storing arbitrarily deep trees or anything; mostly just single-level recursion, as in the example in this issue.
FWIW, for a single level, you could do something like this:
```proto
message Person { ... }

message PersonWithOtherPerson {
  optional Person main = 1;
  optional Person other_person = 2;
}
```

The downside is that this pushes the parent Person into a field, rather than keeping it at the top level. One way to get around this is to have an implicit conversion between PersonWithOtherPerson and Person.
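That implicit conversion might be sketched as follows. The case classes here are hypothetical stand-ins for what ScalaPB would generate from the proto above, not the actual generated code:

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins for the ScalaPB-generated case classes.
final case class Person(name: String)
final case class PersonWithOtherPerson(main: Option[Person], otherPerson: Option[Person])

// The implicit conversion mentioned above: code that expects a Person
// can accept the wrapper transparently.
implicit def unwrapPerson(p: PersonWithOtherPerson): Person =
  p.main.getOrElse(sys.error("main person is unset"))

def greet(p: Person): String = s"hello, ${p.name}"
```

With this in scope, `greet(PersonWithOtherPerson(Some(Person("Ada")), None))` compiles and behaves as if a plain `Person` had been passed.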
@thesamet I think this would work for us as well. Unfortunately, we only consume, so we don't have access to update the schema. We can advise against recursive fields, but there's no guarantee the producers will follow our recommendation.
@thesamet, I'm in the same boat where I, as a consumer, cannot control the source. It would be great if the ProtoSQL driver could take a recursion-depth limit as a parameter. As a workaround, I'm looking into a way to flatten this out before it hits Spark. The recursion is at most 10 levels deep, if that helps.
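Pre-flattening before handing data to Spark could be sketched like this, using the maximum depth of 10 mentioned above. `Node` and `FlatRow` are hypothetical stand-ins for the recursive message and the flat record you would actually convert to a DataFrame:

```scala
// Hypothetical recursive message and the flat row it is unrolled into.
final case class Node(id: String, child: Option[Node])
final case class FlatRow(id: String, depth: Int)

// Walk the recursive structure, emitting one flat row per level and
// cutting off anything deeper than maxDepth.
def flatten(n: Node, depth: Int = 0, maxDepth: Int = 10): List[FlatRow] =
  if (depth >= maxDepth) Nil
  else FlatRow(n.id, depth) :: n.child.toList.flatMap(flatten(_, depth + 1, maxDepth))
```

The resulting `List[FlatRow]` has a fixed, non-recursive schema, so Spark can derive a `StructType` for it without blowing the stack.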
I'm using Scala 2.11.12, Spark 2.4.4, sparksql-scalapb 0.9.2, sbt-protoc 0.99.28, scalapb compilerplugin 0.9.7.
Just wondering if this issue was addressed in a newer release of ScalaPB? We are facing a similar issue.
Hi @anjshrg, the issue is still not resolved. PRs are welcome!
I found the same issue using the Protobuf field type google.protobuf.Struct. This type contains nested Struct types, so I get a StackOverflowError. Any idea how to tackle this issue when we can't control the schema?
Adding a recursive field to a proto breaks things; see https://github.com/drewrobb/sparksql-scalapb-test/commit/4cfc436c5a3a9f75d4218a0695ff7e9c2b8300e3 for a reproduction. I'm happy to help address this if you have a recommended approach to solving it.