scalapb / sparksql-scalapb

SparkSQL utils for ScalaPB
Apache License 2.0

Protos with recursive fields fail with stack overflow #5

Open · drewrobb opened this issue 7 years ago

drewrobb commented 7 years ago

Adding a recursive field to a proto breaks things; see https://github.com/drewrobb/sparksql-scalapb-test/commit/4cfc436c5a3a9f75d4218a0695ff7e9c2b8300e3 for a reproduction. I'm happy to help address this; do you have a recommended approach to solving it?

Exception in thread "main" java.lang.StackOverflowError
    at shadeproto.Descriptors$FieldDescriptor.getName(Descriptors.java:881)
    at com.trueaccord.scalapb.spark.ProtoSQL$.com$trueaccord$scalapb$spark$ProtoSQL$$structFieldFor(ProtoSQL.scala:65)
    at com.trueaccord.scalapb.spark.ProtoSQL$$anonfun$1.apply(ProtoSQL.scala:62)
    at com.trueaccord.scalapb.spark.ProtoSQL$$anonfun$1.apply(ProtoSQL.scala:62)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)

      ......

    at com.trueaccord.scalapb.spark.ProtoSQL$.com$trueaccord$scalapb$spark$ProtoSQL$$structFieldFor(ProtoSQL.scala:62)
    at com.trueaccord.scalapb.spark.ProtoSQL$$anonfun$1.apply(ProtoSQL.scala:62)
    at com.trueaccord.scalapb.spark.ProtoSQL$$anonfun$1.apply(ProtoSQL.scala:62)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
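
For context on the trace: below is a minimal, self-contained Scala sketch of the failure mode (a toy model, not the library's actual code). Schema derivation recurses into every message-typed field, and a self-referencing message gives that recursion no base case.

object RecursionModel {
  // Toy stand-ins for protobuf descriptors; all names here are illustrative.
  sealed trait PType
  case object PString extends PType
  final case class PMessage(name: String, fields: () => List[(String, PType)]) extends PType

  // Mirrors the shape of structFieldFor in the trace above: message-typed
  // fields re-enter the derivation, so a recursive message never terminates.
  def schemaOf(t: PType): String = t match {
    case PString => "string"
    case PMessage(name, fields) =>
      fields()
        .map { case (n, ft) => s"$n: ${schemaOf(ft)}" }
        .mkString(s"$name { ", ", ", " }")
  }

  // A self-referencing message, e.g. `message Person { Person friend = 1; }`.
  lazy val person: PMessage =
    PMessage("Person", () => List("name" -> PString, "friend" -> person))

  def main(args: Array[String]): Unit =
    println(schemaOf(person)) // throws StackOverflowError, as reported above
}
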
dbkegley commented 7 years ago

@drewrobb, were you able to find a resolution for this? We are facing what looks like a similar issue with a highly nested schema.

drewrobb commented 7 years ago

@dbkegley we have not found a resolution to this, nor do we even have a proposed way to fix it.

thesamet commented 7 years ago

Schemas in Spark must be known ahead of time. A possible workaround would be to set a limit on the recursion depth when generating a schema. Would that be useful?
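
For concreteness, here is a hedged sketch of that workaround, written against the plain protobuf-java descriptor API rather than this library's internals; the object name and the maxDepth parameter are made up for illustration. Message subtrees below the cap are simply dropped from the schema.

import scala.collection.JavaConverters._
import com.google.protobuf.Descriptors.{Descriptor, FieldDescriptor}
import org.apache.spark.sql.types._

object DepthLimitedSchema {
  // Derive a StructType from a message descriptor, cutting message-typed
  // fields off once maxDepth nested levels have been consumed.
  def structFor(md: Descriptor, maxDepth: Int): StructType =
    StructType(md.getFields.asScala.toSeq.flatMap(fieldFor(_, maxDepth)))

  private def fieldFor(fd: FieldDescriptor, depth: Int): Option[StructField] = {
    import FieldDescriptor.JavaType._
    val dataType: Option[DataType] = fd.getJavaType match {
      case MESSAGE if depth <= 0 => None // recursion cap: drop the subtree
      case MESSAGE => Some(structFor(fd.getMessageType, depth - 1))
      case INT     => Some(IntegerType)
      case LONG    => Some(LongType)
      case FLOAT   => Some(FloatType)
      case DOUBLE  => Some(DoubleType)
      case BOOLEAN => Some(BooleanType)
      case STRING  => Some(StringType)
      case _       => Some(BinaryType) // bytes/enums simplified for this sketch
    }
    dataType.map { dt =>
      val t = if (fd.isRepeated) ArrayType(dt) else dt
      StructField(fd.getName, t, nullable = true)
    }
  }
}

A caller would pick the cap to match their data, e.g. structFor(Person.javaDescriptor, maxDepth = 2), assuming a ScalaPB companion object that exposes javaDescriptor.
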

drewrobb commented 7 years ago

That sounds like it would fix my use case. We aren't storing arbitrarily deep trees or anything, mostly just single-level recursion like the example in this issue.

thesamet commented 7 years ago

FWIW, for a single level, you could do something like this:

message Person { ... }

message PersonWithOtherPerson {
  optional Person main = 1;
  optional Person other_person = 2;
}

The downside is that this pushes the parent Person into a field rather than keeping it at the top level. One way to get around this is an implicit conversion between PersonWithOtherPerson and Person, as sketched below.
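
A minimal sketch of that conversion, assuming the ScalaPB-generated case classes for the two messages above (field names taken from the proto; Person.defaultInstance is the generated empty message):

import scala.language.implicitConversions

object PersonConversions {
  // Wrap a Person so the recursive part lives one level down, matching the
  // PersonWithOtherPerson message above.
  implicit def toWrapper(p: Person): PersonWithOtherPerson =
    PersonWithOtherPerson(main = Some(p))

  // Unwrap, falling back to the generated default when `main` is unset.
  implicit def fromWrapper(w: PersonWithOtherPerson): Person =
    w.main.getOrElse(Person.defaultInstance)
}
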

dbkegley commented 7 years ago

@thesamet I think this would work for us as well. Unfortunately we are only consumers, so we don't have access to update the schema. We can advise against recursive fields, but there's no guarantee the producers will follow our recommendation.

colinlouie commented 4 years ago

@thesamet, I'm in the same boat: as a consumer, I cannot control the source. It would be great if ProtoSQL could take a recursion-depth limit as a parameter. As a workaround, I'm looking into a way to flatten this out before it hits Spark; the recursion is at most 10 levels deep, if that helps.
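
One way to do that flattening, sketched with hypothetical names (MyTree stands in for the generated recursive message, FlatNode is the flat row type): emit one row per node with a depth and parent pointer, so the schema Spark derives contains no recursion.

// Hypothetical stand-in for a ScalaPB-generated recursive message.
final case class MyTree(id: String, payload: String, children: Seq[MyTree] = Nil)

// Flat, non-recursive row type that Spark can derive a schema for.
final case class FlatNode(id: String, parentId: Option[String], depth: Int, payload: String)

object Flatten {
  // Pre-order walk; fine for trees of bounded depth (at most 10 here).
  def apply(node: MyTree, parent: Option[String] = None, depth: Int = 0): Seq[FlatNode] =
    FlatNode(node.id, parent, depth, node.payload) +:
      node.children.flatMap(apply(_, Some(node.id), depth + 1))
}

// Usage sketch: spark.createDataset(trees.flatMap(t => Flatten(t))) gives a flat schema.
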

I'm using Scala 2.11.12, Spark 2.4.4, sparksql-scalapb 0.9.2, sbt-protoc 0.99.28, scalapb compilerplugin 0.9.7.

anjshrg commented 3 years ago

Just wondering if this issue has been addressed in a newer release of ScalaPB? We are facing a similar issue.

thesamet commented 3 years ago

Hi @anjshrg, the issue is still not resolved. PRs are welcome!

MCardus commented 2 weeks ago

I found the same issue using the Protobuf field type google.protobuf.Struct. Struct is recursive by definition (its values can themselves contain Structs), so I get a StackOverflowError. Any ideas on how to tackle this when we can't control the schema?
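
One possible workaround (a sketch, not a confirmed fix): render the Struct-typed field to a JSON string before the data reaches Spark, so the column becomes a plain string. This assumes scalapb-json4s on the classpath and a hypothetical generated message Event with an optional metadata: Struct field.

import scalapb.json4s.JsonFormat
import com.google.protobuf.struct.Struct

object StructWorkaround {
  // Render the recursive Struct as JSON text; Spark then sees a StringType
  // column instead of an infinitely nested struct.
  def structToJson(s: Struct): String = JsonFormat.toJsonString(s)
}

// Usage sketch against the hypothetical Event message:
// events.map(e => (e.id, e.metadata.map(StructWorkaround.structToJson).getOrElse("{}")))
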