scalapb / ScalaPB

Protocol buffer compiler for Scala.
https://scalapb.github.io/
Apache License 2.0

issue with using bcl classes from protobuf.net #726

Closed PKHeb closed 4 years ago

PKHeb commented 4 years ago

I imported a bcl class from protobuf.net to support timestamp fields, for compatibility with Java and C# time fields. The generated proto looks similar to the following:

```protobuf
import "bcl.proto";

package A;

message X {
  // ... other fields elided ...
  bcl.DateTime EditorialStatusModifiedDTim = 6;
  bcl.DateTime ModifiedDTim = 7;
}
```

While deserializing at run time, we get the error below. Note that this worked without using the bcl class, but we then had issues with the timestamp classes.

```
scala.ScalaReflectionException: <none> is not a term
  at scala.reflect.api.Symbols$SymbolApi$class.asTerm(Symbols.scala:199)
  at scala.reflect.internal.Symbols$SymbolContextApiImpl.asTerm(Symbols.scala:84)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.constructParams(ScalaReflection.scala:895)
  at org.apache.spark.sql.catalyst.ScalaReflection$.constructParams(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.getConstructorParameters(ScalaReflection.scala:875)
  at org.apache.spark.sql.catalyst.ScalaReflection$.getConstructorParameters(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:773)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:715)
  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:714)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$8.apply(ScalaReflection.scala:776)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$8.apply(ScalaReflection.scala:775)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:775)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:715)
  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:714)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:728)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:715)
  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:714)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$8.apply(ScalaReflection.scala:776)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$8.apply(ScalaReflection.scala:775)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:775)
  at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:715)
  at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
  at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
  at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:714)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:711)
  at org.apache.spark.sql.functions$.udf(functions.scala:3398)
  at Utils.UDF$.toProtoObject(UDF.scala:26)
```

PKHeb commented 4 years ago

Below is the bcl proto for reference:

```protobuf
// The types in here indicate how protobuf-net represents certain types when using protobuf-net specific
// library features. Note that it is not required to use any of these types, and cross-platform code
// should usually avoid them completely (ideally starting from a .proto schema)

// Some of these are ugly, sorry. The TimeSpan / DateTime dates here pre-date the introduction of Timestamp
// and Duration, and the "well known" types should be preferred when possible. Guids are particularly
// awkward - it turns out that there are multiple guid representations, and I accidentally used one that
// I can only call... "crazy-endian". Just make sure you check the order!

// It should not be necessary to use bcl.proto from code that uses protobuf-net

syntax = "proto3";

//option csharp_namespace = "ProtoBuf.Bcl";

package bcl;

message TimeSpan {
  sint64 value = 1;        // the size of the timespan (in units of the selected scale)
  TimeSpanScale scale = 2; // the scale of the timespan [default = DAYS]
  enum TimeSpanScale {
    DAYS = 0;
    HOURS = 1;
    MINUTES = 2;
    SECONDS = 3;
    MILLISECONDS = 4;
    TICKS = 5;

    MINMAX = 15; // dubious
  }
}

message DateTime {
  sint64 value = 1;        // the offset (in units of the selected scale) from 1970/01/01
  TimeSpanScale scale = 2; // the scale of the timespan [default = DAYS]
  DateTimeKind kind = 3;   // the kind of date/time being represented [default = UNSPECIFIED]
  enum TimeSpanScale {
    DAYS = 0;
    HOURS = 1;
    MINUTES = 2;
    SECONDS = 3;
    MILLISECONDS = 4;
    TICKS = 5;

    MINMAX = 15; // dubious
  }
  enum DateTimeKind {
    // The time represented is not specified as either local time or Coordinated Universal Time (UTC).
    UNSPECIFIED = 0;
    // The time represented is UTC.
    UTC = 1;
    // The time represented is local time.
    LOCAL = 2;
  }
}

message NetObjectProxy {
  int32 existingObjectKey = 1; // for a tracked object, the key of the first time this object was seen
  int32 newObjectKey = 2;      // for a tracked object, a new key, the first time this object is seen
  int32 existingTypeKey = 3;   // for dynamic typing, the key of the first time this type was seen
  int32 newTypeKey = 4;        // for dynamic typing, a new key, the first time this type is seen
  string typeName = 8;         // for dynamic typing, the name of the type (only present along with newTypeKey)
  bytes payload = 10;          // the new string/value (only present along with newObjectKey)
}

message Guid {
  fixed64 lo = 1; // the first 8 bytes of the guid (note: crazy-endian)
  fixed64 hi = 2; // the second 8 bytes of the guid (note: crazy-endian)
}

message Decimal {
  uint64 lo = 1;        // the first 64 bits of the underlying value
  uint32 hi = 2;        // the last 32 bits of the underlying value
  uint32 signScale = 3; // the number of decimal digits (bits 1-16), and the sign (bit 0)
}
```
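The field comments above pin down the `DateTime` encoding: `value` is an offset from 1970-01-01 expressed in units of the `scale` enum. A minimal Scala sketch of that interpretation (not part of the original bcl.proto; treating `TICKS` as 100-nanosecond units follows the .NET convention and is an assumption here, and `MINMAX` is left unhandled):

```scala
import java.time.{Duration, Instant}

// Interpret a bcl.DateTime payload as a java.time.Instant.
// `scale` is the raw enum number from bcl.proto (DAYS = 0 ... TICKS = 5).
def bclDateTimeToInstant(value: Long, scale: Int): Instant = {
  val sinceEpoch = scale match {
    case 0 => Duration.ofDays(value)         // DAYS
    case 1 => Duration.ofHours(value)        // HOURS
    case 2 => Duration.ofMinutes(value)      // MINUTES
    case 3 => Duration.ofSeconds(value)      // SECONDS
    case 4 => Duration.ofMillis(value)       // MILLISECONDS
    case 5 => Duration.ofNanos(value * 100L) // TICKS, assumed to be 100 ns each
    case other => throw new IllegalArgumentException(s"Unsupported scale: $other")
  }
  Instant.EPOCH.plus(sinceEpoch)
}
```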

thesamet commented 4 years ago

Spark doesn't know how to deal with ScalaPB's enums. To get around it, you need to use sparksql-scalapb as described here: https://scalapb.github.io/sparksql.html. The key is to import scalapb.spark.implicits._ instead of spark.implicits._.
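A rough sketch of that setup, following the sparksql-scalapb page linked above (the `X` type and its import are placeholders for whatever case class ScalaPB generates from the proto in this issue; adjust to your generated package):

```scala
import org.apache.spark.sql.SparkSession
import scalapb.spark.Implicits._ // instead of spark.implicits._
// import a.x.X // hypothetical: the ScalaPB-generated case class for message X

object ProtoDatasetExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("scalapb-spark-example").getOrCreate()

    // Parse the protobuf payloads into ScalaPB messages, then build a Dataset
    // using the ScalaPB-aware encoders brought in by scalapb.spark.Implicits.
    val rawBytes: Seq[Array[Byte]] = Seq.empty // e.g. message values read from Kafka
    val records: Seq[X] = rawBytes.map(X.parseFrom)
    val ds = spark.createDataset(records)
    ds.printSchema()

    spark.stop()
  }
}
```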

PKHeb commented 4 years ago

Thanks. When I followed the steps, the runtime throws the error below. Does that mean the shading has not happened correctly?

```
User class threw exception: java.lang.NoClassDefFoundError: shadeproto/ByteString
  at scalapb.spark.ByteStringTypedEncoder.<init>(TypedEncoders.scala:93)
  at scalapb.spark.Implicits$class.$init$(TypedEncoders.scala:120)
  at scalapb.spark.Implicits$.<init>(TypedEncoders.scala:128)
  at scalapb.spark.Implicits$.<clinit>(TypedEncoders.scala)
  at Scripts.SampleScripts.KafkaToConsole$.main(KafkaToConsole.scala:39)
  at Scripts.SampleScripts.KafkaToConsole.main(KafkaToConsole.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:739)
Caused by: java.lang.ClassNotFoundException: shadeproto.ByteString
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 11 more
```

thesamet commented 4 years ago

Yes, it looks like an issue related to shading. Maybe you can share a minimal example, including how you build it, and how you deploy it.
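For context, the shading setup that the sparksql-scalapb documentation suggested looked roughly like the sketch below; `shadeproto` is the rename prefix, so a `NoClassDefFoundError` for `shadeproto/ByteString` usually means the job is running against a jar that was built without the renamed protobuf classes (exact sbt-assembly syntax may differ by plugin version):

```scala
// build.sbt (sketch): rename com.google.protobuf classes to the "shadeproto"
// prefix inside the assembled fat jar, so they do not clash with the older
// protobuf-java that Spark ships with.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.protobuf.**" -> "shadeproto.@1").inAll
)
```

If the cluster's Spark already provides a compatible protobuf runtime, dropping the rule entirely, as happens later in this thread, is also an option.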

PKHeb commented 4 years ago

Apologies for the delay. This worked for me after disabling the shading rule in build.sbt. It seems the protobuf runtime shipped with the latest Spark is compatible with ScalaPB, at least for my cases.

thesamet commented 4 years ago

Great! Closing this issue, then. Feel free to reach out if anything else comes up.