twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0
2.14k stars 267 forks source link

Support Scala 2.12 #724

Closed ttim closed 7 years ago

ttim commented 7 years ago

This PR bumps dependencies versions to most recent ones with Scala 2.12 support. Algebird fixes based on #715.

This PR also removes support for Scala 2.10, because scalding doesn't support it.

ttim commented 7 years ago

This PR will be mergeable as soon as we release scalding 0.17.1 with fix for KryoHadoop https://github.com/twitter/scalding/pull/1685.

ttim commented 7 years ago

Now, when scalding release is out we can merge this 2.12 upgrade for SB. What do you think @pankajroark @piyushnarang @johnynek ?

ttim commented 7 years ago

Update: false alarm, tests are failing, digging into this now.

johnynek commented 7 years ago

looks like a legit failure:

https://travis-ci.org/twitter/summingbird/jobs/252265982#L3657

Caused by: com.esotericsoftware.kryo.KryoException: java.lang.RuntimeException: Could not serialize lambda

Serialization trace:

fn (com.twitter.scalding.typed.FlatMappedFn)

    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)

    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)

    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)

    at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:25)

    at com.twitter.chill.SomeSerializer.read(SomeSerializer.scala:19)

    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)

    at com.twitter.chill.SerDeState.readClassAndObject(SerDeState.java:61)

    at com.twitter.chill.KryoPool.fromBytes(KryoPool.java:94)

    at com.twitter.chill.Externalizer.fromBytes(Externalizer.scala:145)

    at com.twitter.chill.Externalizer.maybeReadJavaKryo(Externalizer.scala:158)

    at com.twitter.chill.Externalizer.readExternal(Externalizer.scala:148)

    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1840)

This could be that we don't have the right chill version? It could be this exposes a bug in chill.

ttim commented 7 years ago

@johnynek yes, it is legit =( Unfortunately this time it comes from scalacheck and it's same issue as we saw in chill, yes - too many lambdas in one class. I tried to workaround this by creating this lambdas in summingbird code but there are couple of things which makes everything more complicated - they declare everything as private[scalacheck] (thanks not in general) and most of the classes are sealed. At some point I decided to stop and now I'm going to make a PR to scalacheck instead. If they do release quickly we can use new version otherwise I will continue to work on workaround.

non commented 7 years ago

@ttim I'm happy to help get this fixed, either in this repo or ScalaCheck? What were you planning to do to work around this? Restructure the ScalaCheck code a bit? Create your own custom Gen instances?

ttim commented 7 years ago

@non that's awesome! If you can also help with releasing new version it would be twice awesome =)

In short term for SB I thought just to add code like this:

object ArbitraryWorkaround {
  implicit val f1: Arbitrary[Int => Int] = Arbitrary(Gen.const(x => x * 2))
  implicit val f2: Arbitrary[Int => List[(Int, Int)]] = Arbitrary(Gen.const(x => List((x, x * 3))))
  implicit val f3: Arbitrary[Int => Option[Int]] = Arbitrary(Gen.const(x => {
    if (x % 2 == 0) None else Some(x * 4)
  }))
  implicit val f4: Arbitrary[Int => List[Int]] = Arbitrary(Gen.const(x => List(x * 5)))
  implicit val f5: Arbitrary[((Int, (Int, Option[Int]))) => List[(Int, Int)]] = Arbitrary(Gen.const {
    case (x, (y, optZ)) => List((x, y), (x, optZ.getOrElse(42)))
  })
  implicit val f6: Arbitrary[((Int, Int)) => List[(Int, Int)]] = Arbitrary(Gen.const(x => List(x, x)))
}

But if we can do release reasonably soon I will do PR to scalacheck as soon as possible (today/tomorrow). Plan is to add a test on that + divide ArbitraryArities and GenArities into two files. Wdyt?

non commented 7 years ago

@ttim My suggestion is that if the above work-around solves the issue for Summingbird we start there.

I'm happy to try to expedite the suggested change to ScalaCheck, but unfortunately the arities files are traits right now, so restructuring them will break binary compatibility. This doesn't mean we can't do it, but might make a quick release more difficult.

non commented 7 years ago

(Also, if you'd prefer help I can make a branch off of this one and try the approach out myself later tonight.)

ttim commented 7 years ago

@non let's do this then. I'll merge my workaround. Meanwhile: this is a scala issue - https://issues.scala-lang.org/browse/SI-10232. Here is repro I used:

import org.scalacheck.Arbitrary._
val f = arbitrary[Int => Int]
val outputStream = new java.io.ByteArrayOutputStream()
val objectOutputStream = new java.io.ObjectOutputStream(outputStream)
objectOutputStream.writeObject(f)
objectOutputStream.close
val inputStream = new java.io.ByteArrayInputStream(outputStream.toByteArray)
val objectInputStream = new java.io.ObjectInputStream(inputStream)
objectInputStream.readObject()

Causes:

Caused by: java.lang.BootstrapMethodError: too many bootstrap method arguments
  at java.lang.invoke.CallSite.makeSite(CallSite.java:320)
  at java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(MethodHandleNatives.java:307)
  at java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:297)
  at org.scalacheck.GenArities.$deserializeLambda$(GenArities.scala)
ttim commented 7 years ago

@non created a bug in scalacheck repo https://github.com/rickynils/scalacheck/issues/342, thank you very much for looking on this!

non commented 7 years ago

One other thing to note: it seems like the bug suggests that using -Ydelamdafy:inline is a work-around, so it's possible we could include that in the scalac arguments used for testing to avoid the Scala bug.

(Thanks for giving me a repro -- I'll start seeing what I have to do to fix this issue in ScalaCheck.)

ttim commented 7 years ago

@non I'm afraid -Ydelambdafy:inline should be used at the scalacheck compile time, so it's not possible to use as a workaround =(

travisbrown-stripe commented 7 years ago

I've got a fix on the ScalaCheck side that should let us publish an 0.13.6 release there that will resolve the issue without the workaround here.

ttim commented 7 years ago

Scalacheck release isn't out therefore I think to merge this PR now, and I created issue to remove it later as soon as scalacheck will be released - https://github.com/twitter/summingbird/issues/741. Wdyt @johnynek @pankajroark @piyushnarang ?

travisbrown-stripe commented 7 years ago

I think merging this now and removing the workaround later sounds reasonable, but I think @non is planning to publish the ScalaCheck release either this weekend or next week.

codecov-io commented 7 years ago

Codecov Report

Merging #724 into develop will increase coverage by 0.04%. The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #724      +/-   ##
===========================================
+ Coverage    72.21%   72.26%   +0.04%     
===========================================
  Files          153      154       +1     
  Lines         3736     3742       +6     
  Branches       204      209       +5     
===========================================
+ Hits          2698     2704       +6     
  Misses        1038     1038
Impacted Files Coverage Δ
.../twitter/summingbird/scalding/VersionedState.scala 0% <ø> (ø) :arrow_up:
...cala/com/twitter/summingbird/batch/Timestamp.scala 65.71% <100%> (ø)
...cala/com/twitter/summingbird/example/Storage.scala 77.77% <100%> (+1.3%) :arrow_up:
.../com/twitter/summingbird/ArbitraryWorkaround.scala 100% <100%> (ø)
...ain/scala/com/twitter/summingbird/TestGraphs.scala 86.62% <0%> (-1.92%) :arrow_down:
...ala/com/twitter/summingbird/scalding/Service.scala 76.92% <0%> (-1.54%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update fecdfa3...b860fb9. Read the comment docs.

ttim commented 7 years ago

@travisbrown-stripe I prefer to keep it merged and do not put any pressure on scalacheck =) I'll create removal PR immediately after merge thou.