Closed thomaschow closed 5 years ago
Thanks for the PR!
There is a test that fails that looks related to this: https://travis-ci.org/twitter/scalding/jobs/428467866#L9274
It may be that the test needs to change, but can you take a look?
sure, @johnynek do you have any thoughts on whether this is the correct approach? Why have we ported over the ScroogeStructConverter at all?
This seems fine. I think the original PR updates the test to add the UTF annotation.
I think we should do that here and we should be okay.
@johnynek I'm getting some incompatibilities when generating thrift, something like:
[error] /Users/tchow/personal/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_scala/test/TestUnion.scala:226: overriding method containedValue in trait ThriftUnion of type ()SecondMap.this.ContainedType; [error] method containedValue has incompatible type [error] def containedValue(): SecondMapAlias = second_map
This is unrelated to my changes, looks like just need to figure out the correct library versions. Can you advise on how to resolve this?
Can you find the minimal version of scrooge that can pass the tests and work here?
also @ttim can you give any comments on this bug and if twitter is still using this exact code (or do you have a copy of it internally)?
@thomaschow fix for scalding-parquet
module, will dig more tomorrow.
diff --git a/build.sbt b/build.sbt
index cd5d8cb..34881af 100644
--- a/build.sbt
+++ b/build.sbt
@@ -407,7 +407,7 @@ lazy val scaldingParquet = module("parquet").settings(
exclude("com.twitter.elephantbird", "elephant-bird-pig")
exclude("com.twitter.elephantbird", "elephant-bird-core"),
"org.scala-lang" % "scala-compiler" % scalaVersion.value,
- "org.apache.thrift" % "libthrift" % "0.7.0",
+ "org.apache.thrift" % "libthrift" % thriftVersion,
"org.slf4j" % "slf4j-api" % slf4jVersion,
"org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided",
"org.scala-lang" % "scala-reflect" % scalaVersion.value,
diff --git a/scalding-parquet/src/test/scala/com/twitter/scalding/parquet/ParquetSourcesTests.scala b/scalding-parquet/src/test/scala/com/twitter/scalding/parquet/ParquetSourcesTests.scala
index 82689a3..db74f2d 100644
--- a/scalding-parquet/src/test/scala/com/twitter/scalding/parquet/ParquetSourcesTests.scala
+++ b/scalding-parquet/src/test/scala/com/twitter/scalding/parquet/ParquetSourcesTests.scala
@@ -169,7 +169,7 @@ class MockTBase extends TBase[MockTBase, TFieldIdEnum] {
override def isSet(p1: TFieldIdEnum): Boolean = false
override def getFieldValue(p1: TFieldIdEnum): AnyRef = null
override def setFieldValue(p1: TFieldIdEnum, p2: scala.Any): Unit = ()
- override def deepCopy(): TBase[MockTBase, TFieldIdEnum] = null
+ override def deepCopy(): MockTBase = null
override def clear(): Unit = ()
override def compareTo(o: MockTBase): Int = 0
}
diff --git a/project/plugins.sbt b/project/plugins.sbt
index 5131951..4c535e9 100644
--- a/project/plugins.sbt
+++ b/project/plugins.sbt
@@ -11,7 +11,7 @@ addSbtPlugin("com.eed3si9n" % "sbt-unidoc" % "0.3.3")
addSbtPlugin("com.fortysevendeg" % "sbt-microsites" % "0.3.3")
addSbtPlugin("com.github.gseitz" % "sbt-release" % "1.0.0")
addSbtPlugin("com.jsuereth" % "sbt-pgp" % "1.0.0")
-addSbtPlugin("com.twitter" %% "scrooge-sbt-plugin" % "4.14.0")
+addSbtPlugin("com.twitter" %% "scrooge-sbt-plugin" % "18.9.0")
addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "0.1.14")
addSbtPlugin("com.typesafe.sbt" % "sbt-ghpages" % "0.5.4")
addSbtPlugin("com.typesafe.sbt" % "sbt-git" % "0.6.2")
so, also @thomaschow you need upgrade scrooge pluing.
looks like it doesn't compile, does this compile locally for you @tchow-stripe ?
[0m[[33mwarn[0m] [0m1248 warnings found[0m
[0m[[33mwarn[0m] [0mbootstrap class path not set in conjunction with -source 1.6[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/UnionV2.java:29: com.twitter.scalding.parquet.scrooge.thrift_java.test.UnionV2 is not abstract and does not override abstract method writeValue(org.apache.thrift.protocol.TProtocol) in org.apache.thrift.TUnion[0m
[0m[[31merror[0m] [0mpublic class UnionV2 extends TUnion<UnionV2, UnionV2._Fields> {[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/UnionV2.java:227: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/UnionV2.java:274: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/UnionV2.java:297: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/UnionV2.java:302: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/TestUnion.java:29: com.twitter.scalding.parquet.scrooge.thrift_java.test.TestUnion is not abstract and does not override abstract method writeValue(org.apache.thrift.protocol.TProtocol) in org.apache.thrift.TUnion[0m
[0m[[31merror[0m] [0mpublic class TestUnion extends TUnion<TestUnion, TestUnion._Fields> {[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/TestUnion.java:207: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/TestUnion.java:243: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/TestUnion.java:261: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/TestUnion.java:266: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionOfStructs.java:29: com.twitter.scalding.parquet.scrooge.thrift_java.test.compat.UnionOfStructs is not abstract and does not override abstract method writeValue(org.apache.thrift.protocol.TProtocol) in org.apache.thrift.TUnion[0m
[0m[[31merror[0m] [0mpublic class UnionOfStructs extends TUnion<UnionOfStructs, UnionOfStructs._Fields> {[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionOfStructs.java:227: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionOfStructs.java:274: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionOfStructs.java:297: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionOfStructs.java:302: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/NestedNestedUnion.java:29: com.twitter.scalding.parquet.scrooge.thrift_java.test.compat.NestedNestedUnion is not abstract and does not override abstract method writeValue(org.apache.thrift.protocol.TProtocol) in org.apache.thrift.TUnion[0m
[0m[[31merror[0m] [0mpublic class NestedNestedUnion extends TUnion<NestedNestedUnion, NestedNestedUnion._Fields> {[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/NestedNestedUnion.java:207: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/NestedNestedUnion.java:243: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/NestedNestedUnion.java:261: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/NestedNestedUnion.java:266: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/NestedUnion.java:29: com.twitter.scalding.parquet.scrooge.thrift_java.test.compat.NestedUnion is not abstract and does not override abstract method writeValue(org.apache.thrift.protocol.TProtocol) in org.apache.thrift.TUnion[0m
[0m[[31merror[0m] [0mpublic class NestedUnion extends TUnion<NestedUnion, NestedUnion._Fields> {[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/NestedUnion.java:227: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/NestedUnion.java:274: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/NestedUnion.java:297: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/NestedUnion.java:302: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionV2.java:29: com.twitter.scalding.parquet.scrooge.thrift_java.test.compat.UnionV2 is not abstract and does not override abstract method writeValue(org.apache.thrift.protocol.TProtocol) in org.apache.thrift.TUnion[0m
[0m[[31merror[0m] [0mpublic class UnionV2 extends TUnion<UnionV2, UnionV2._Fields> {[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionV2.java:227: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionV2.java:274: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionV2.java:297: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionV2.java:302: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionStructUnion.java:29: com.twitter.scalding.parquet.scrooge.thrift_java.test.compat.UnionStructUnion is not abstract and does not override abstract method writeValue(org.apache.thrift.protocol.TProtocol) in org.apache.thrift.TUnion[0m
[0m[[31merror[0m] [0mpublic class UnionStructUnion extends TUnion<UnionStructUnion, UnionStructUnion._Fields> {[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionStructUnion.java:227: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionStructUnion.java:274: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionStructUnion.java:297: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionStructUnion.java:302: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionV1.java:29: com.twitter.scalding.parquet.scrooge.thrift_java.test.compat.UnionV1 is not abstract and does not override abstract method writeValue(org.apache.thrift.protocol.TProtocol) in org.apache.thrift.TUnion[0m
[0m[[31merror[0m] [0mpublic class UnionV1 extends TUnion<UnionV1, UnionV1._Fields> {[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionV1.java:207: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionV1.java:243: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionV1.java:261: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m/home/travis/build/twitter/scalding/scalding-parquet-scrooge-fixtures/target/scala-2.11/src_managed/test/thrift/com/twitter/scalding/parquet/scrooge/thrift_java/test/compat/UnionV1.java:266: method does not override or implement a method from a supertype[0m
[0m[[31merror[0m] [0m @java.lang.Override[0m
[0m[[31merror[0m] [0m(scalding-parquet-scrooge-fixtures/test:[31mcompileIncremental[0m) javac returned nonzero exit code[0m
[0m[[31merror[0m] [0mTotal time: 95 s, completed Nov 5, 2018 6:58:01 PM[0m``
cc @thomaschow
@thomaschow do you want to work on this before we publish a new version which is about to happen?
sure, let me try to make some progress on it today and I"ll have an update by tomorrow.
Thoams On Tue, Dec 4, 2018 at 12:34 PM P. Oscar Boykin notifications@github.com wrote:
@thomaschow https://github.com/thomaschow do you want to work on this before we publish a new version which is about to happen?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/twitter/scalding/pull/1872#issuecomment-444247954, or mute the thread https://github.com/notifications/unsubscribe-auth/AYI1j43sUnsngOF7tUZ7jiSih1b8ry89ks5u1txWgaJpZM4WonZL .
On Tue, Dec 4, 2018 at 12:34 PM P. Oscar Boykin notifications@github.com wrote:
@thomaschow https://github.com/thomaschow do you want to work on this before we publish a new version which is about to happen?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/twitter/scalding/pull/1872#issuecomment-444247954, or mute the thread https://github.com/notifications/unsubscribe-auth/AYI1j43sUnsngOF7tUZ7jiSih1b8ry89ks5u1txWgaJpZM4WonZL .
@johnynek is there a way to re-run a specific subuild of the ci?
I'd like some help with understanding the code gen. The problem here is that whatever's doing the code gen (I believe scrooge) doesn't support binary types. If someone could advise on how to debug I can adjust the unit tests appropriately.
will time out by eod cc @dieu @ttim
@thomaschow scrooge support binary, I'm pretty sure. ByteBuffer is the type it generates, I think.
In any case, what specifically is the question? The CI is green.
the StringAndBinary
that gets generated doesn't have the FieldMetadata set correctly (ie binary is set to false). I've bypassed scrooge and instead used the inline ScroogeStructConverter.
Do you have something that fails or shows the issue? its possible the field metadata is broken, afaik the biggest user of it was parquet interactions... at least 3-4 years ago i believe. But at the same time I'd like to see something failing if we could? The scrooge struct converter i'm sure uses the same scrooge fields no?
yeah so the unit test change I made is a hack -- it's effectively a noop comparision. To fix it properly, the StringAndBinary class needs to be autogen'ed properly with the binary boolean set. If I revert the unit test this will break for that reason, and it doesn't seem addressable in this repo
which is broken? I don't think we can commit it with the test actively broken in this way -- commenting them out would be better. This makes them sort of look like real tests but they do nothing.
if its just one type is broken can we just disable that test?
if its just one type is broken can we just disable that test?
I'm happy to do this and leave a comment.
I'm sorry I missed this.
If we can't reproduce this in a test, why do we think we fixed the bug? It seems credible, but if we don't have a test that fails without the patch, I'm skeptical we have made any improvements.
but if we don't have a test that fails without the patch, I'm skeptical we have made any improvements.
From my understanding of the code, the correct fix and testing is elsewhere (possibly in scrooge).
So i took a little look at this -- with these in future, always good to paste the actual error into the PR for folks, the error is:
[error] Test com.twitter.scalding.parquet.scrooge.ScroogeStructConverterTest.testBinary failed: expected:<message ParquetSchema {
[error] required binary s (UTF8) = 1;
[error] required binary b (UTF8) = 2;
[error] }
[error] > but was:<message ParquetSchema {
[error] required binary s (UTF8) = 1;
[error] required binary b = 2;
[error] }
Concretely:
System.out.println("structFromThriftSchemaConverter");
System.out.println(toParquetSchema(structFromThriftSchemaConverter));
System.out.println("structFromScroogeSchemaConverter");
System.out.println(toParquetSchema(structFromScroogeSchemaConverter));
Yields:
structFromThriftSchemaConverter
message ParquetSchema {
required binary s (UTF8) = 1;
required binary b (UTF8) = 2;
}
structFromScroogeSchemaConverter
message ParquetSchema {
required binary s (UTF8) = 1;
required binary b = 2;
}
So from my reading of that, there is nothing wrong with the scrooge code in play here. It looks like the code added is having the expected effect (though we need some tests here to cover the change). The change your migrating looks like it should be in the version of parquet we are using, but it appears at least to be broken/doesn't work.
@ianoc sry thanks for the tips on etiquette here, and appreciate the digging. What would you recommend here? I've bumped parquet to the latest published version, and matched the libthrift version.
but the bug here is because we just inline the ScroogeStructConverter (which would otherwise be coming from parquet) and that has diverged.
i'm not really following that -- based on the above printouts it looks like the TBase/JavaThrift converter paths is producing the wrong result based on what your aiming to get out of it?
As a first step i'd put this test on hold and write some direct tests for the behavior you expect to change here, since right now it looks like the TBase/ThriftStruct implementations don't agree, but some unit tests for each should nail that down
The current unit test imo already reveals the issue, hence why it's failing. The issue is in the difference between the StringAndBinary java generated classes of scalding and the main parquet project. Here's what's different (and why it's failing in scalding):
parquet-mr:
tmpMap.put(_Fields.B, new org.apache.thrift.meta_data.FieldMetaData("b", org.apache.thrift.TFieldRequirementType.REQUIRED,
new org.apache.thrift.meta_data.FieldValueMetaData(org.apache.thrift.protocol.TType.STRING , true)));
scalding:
tmpMap.put(_Fields.B, new FieldMetaData("b", TFieldRequirementType.REQUIRED,
registerBinaryFieldValueMetaData(new FieldValueMetaData(TType.STRING), tmpSet)));
What's generated in scalding doesn't set the FieldValueMetaData correctly even though I've matched the libthrift versions to 0.9.3.
FYI this same unit test is in parquet-mr: https://github.com/apache/parquet-mr/blob/master/parquet-scrooge/src/test/java/org/apache/parquet/scrooge/ScroogeStructConverterTest.java#L81
I ran the following commands to generate these classes:
parquet-mr:
wget -nv http://archive.apache.org/dist/thrift/0.9.3/thrift-0.9.3.tar.gz
tar xzf thrift-0.9.3.tar.gz
cd thrift-0.9.3
chmod +x ./configure
./configure --disable-gen-erl --disable-gen-hs --without-ruby --without-haskell --without-erlang --without-php --without-nodejs
LC_ALL=C mvn -pl parquet-thrift -am clean compile
scalding:
./sbt -java-home /usr
scalding-parquet-scrooge/test
@thomaschow a direct tests here would be that we seem to be mentioning java code gen vs scala code gen. Which feels to imply our testing isn't localized that we are testing our change directly. We are testing comparsions between java and scala. Which confuses the issue. Without your change do all the tests pass clean with the upgrade libraries? If they do then we should add a test using the scala/scrooge libraries + parquet (no java thrift) only and make a test that fails with the StringAndBinary data. Once that test fails we can then make it pass. looking to see if there is a bug in our code for the java handling/generation in scrooge we can then think about separately -- ultimately we might be ok to ignore that if we can test the changes we need here directly and show its a code gen issue for java. But if we disable this test we just aren't testing your change at all
@ianoc thanks for the advice -- so the tests pass with just the library version bumps and without my change, so I've added a test that bypasses java vs scala and just compares at the parquet level. Let me know if this is sufficient.
@ianoc any feedback on the current test?
without my change, it fails with:
[error] Test com.twitter.scalding.parquet.scrooge.ScroogeStructConverterTest.testScroogeBinary failed: expected:<message ParquetSchema {
[error] required binary s (UTF8) = 1;
[error] required binary b = 2;
[error] }
[error] > but was:<message ParquetSchema {
[error] required binary s (UTF8) = 1;
[error] required binary b (UTF8) = 2;
[error] }
[error] >, took 0.002 sec
[info] ScalaTest
[info] Run completed in 35 seconds, 390 milliseconds.
[info] Total number of tests run: 39
[info] Suites: completed 4, aborted 0
[info] Tests: succeeded 39, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[error] Failed: Total 70, Failed 1, Errors 0, Passed 69
[error] Failed tests:
[error] com.twitter.scalding.parquet.scrooge.ScroogeStructConverterTest
[error] (scalding-parquet-scrooge/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 45 s, completed Dec 11, 2018 10:51:33 AM
cc @johnynek I'd like to merge this in so we can pick it up with the next scalding version -- does this test look sufficient to you?
Thanks for showing the test that fails without the changes!
This ports over part of https://github.com/apache/parquet-mr/compare/master...nandorKollar:PARQUET-1191 where a check is added to ScroogeStructConverter when mapping to Thrift String types whether the field is a string or binary (byte array). This allows the parquet-thrift ThriftSchemaConverter to generate the correct parquet schema (distinguishing between Strings, represented as UTF-8 annotated BINARY https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#string, or just plain BINARY) https://github.com/apache/parquet-mr/blob/master/parquet-thrift/src/main/java/org/apache/parquet/thrift/ThriftSchemaConvertVisitor.java#L332 NOTE: this requires a version bump of parquet
r? @ianoc @johnynek