salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License

Convert lambda functions into concrete classes to allow compatibility with Scala 2.11/2.12 #357

Closed tovbinm closed 5 years ago

tovbinm commented 5 years ago

Related issues: Follows up on PR #274 and lays the foundation for Scala 2.12 support (#332).

Scala 2.12 generates different class names for lambdas than Scala 2.11. Example:

object A {
  class Foo extends Function1[Int, Int] { def apply(v: Int): Int = v + 1 }
  def foo: Int => Int = x => x + 1
}
println((new A.Foo).getClass.getName)
println(A.foo.getClass.getName)

Scala 2.11:
    A$Foo
    A$$anonfun$foo$1

Scala 2.12:
    A$Foo
    A$$$Lambda$13458/243505737

To maintain forward compatibility of models (so one won't need to retrain models and will be able to load them in both Scala 2.11 and 2.12), we would like to convert all the lambdas into concrete classes.

Describe the proposed solution: Replace all the lambdas with concrete classes.

Describe alternatives you've considered: N/A
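
As a rough illustration of why this works (a minimal, hypothetical sketch, not TransmogrifAI's actual serialization code; PlusOne and RoundTrip are made-up names): a concrete class has a stable binary name that can be persisted with the model and re-instantiated by name at load time, regardless of which Scala version compiled it.

class PlusOne extends Function1[Int, Int] with Serializable {
  def apply(v: Int): Int = v + 1
}

object RoundTrip {
  def main(args: Array[String]): Unit = {
    val fn: Int => Int = new PlusOne
    // The class name is "PlusOne" under both Scala 2.11 and 2.12, so it is safe to persist.
    val savedName = fn.getClass.getName
    // At load time the function can be rebuilt from the persisted name,
    // no matter which Scala version saved it.
    val restored = Class.forName(savedName)
      .getDeclaredConstructor().newInstance().asInstanceOf[Int => Int]
    println(restored(41)) // prints 42
  }
}

A lambda offers no such guarantee, because its compiler-generated name (e.g. A$$$Lambda$13458/243505737 above) is not stable across compilers or even across runs.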

codecov[bot] commented 5 years ago

Codecov Report

Merging #357 into master will decrease coverage by 3.39%. The diff coverage is 87.5%.


@@            Coverage Diff            @@
##           master     #357     +/-   ##
=========================================
- Coverage   86.82%   83.42%   -3.4%     
=========================================
  Files         336      336             
  Lines       10865    10865             
  Branches      568      587     +19     
=========================================
- Hits         9433     9064    -369     
- Misses       1432     1801    +369
Impacted Files Coverage Δ
...m/salesforce/op/aggregators/ExtendedMultiset.scala 75% <0%> (ø) :arrow_up:
...op/evaluators/OpMultiClassificationEvaluator.scala 94.73% <100%> (ø) :arrow_up:
...a/com/salesforce/op/filters/PreparedFeatures.scala 80.48% <100%> (ø) :arrow_up:
...stages/impl/feature/TimePeriodMapTransformer.scala 100% <100%> (ø) :arrow_up:
...rce/op/stages/impl/feature/PhoneNumberParser.scala 100% <100%> (ø) :arrow_up:
...alesforce/op/cli/gen/templates/SimpleProject.scala 0% <0%> (-100%) :arrow_down:
.../scala/com/salesforce/op/cli/gen/ProblemKind.scala 0% <0%> (-100%) :arrow_down:
...cala/com/salesforce/op/cli/gen/FileInProject.scala 0% <0%> (-100%) :arrow_down:
...in/scala/com/salesforce/op/cli/CommandParser.scala 0% <0%> (-98.12%) :arrow_down:
...cala/com/salesforce/op/cli/gen/ProblemSchema.scala 0% <0%> (-96.56%) :arrow_down:
... and 8 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 6707f2b...0cf823a.

wsuchy commented 5 years ago

Imagine we move all lambdas to a companion object like this:

class RichMapFeature {
....
(f.map[RealNN](RichMapFeatureLambdas.predictionToReal),
....
}

object RichMapFeatureLambdas {
  val predictionToReal: Prediction => RealNN = p => p.prediction.toRealNN
}

Still, whenever we try to inspect RichMapFeatureLambdas.predictionToReal we get that random name under Scala 2.12 and a more deterministic value under Scala 2.11, which is still not good.

But, given that we know a priori where all lambdas for that class are stored, we could easily inspect our RichMapFeatureLambdas object and create a reverse lookup mapping all generated names to the property holding the actual function, e.g. Lambda1$$$Lambda$4624/831965909@6259b7e5 -> RichMapFeatureLambdas.predictionToReal

Then during serialization we would simply use our map to replace compiler-generated names with class.property references. Because it would use run-time reflection, the mapping would always be correct.
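
For what it's worth, a rough sketch of such a reverse lookup (hypothetical names like LambdaRegistry and ExampleLambdas, not actual TransmogrifAI code), using plain Java reflection over a known holder object:

import java.lang.reflect.Modifier

object LambdaRegistry {
  // Maps every parameterless Function1-returning member of `holder` to a stable
  // "ObjectName.propertyName" reference, independent of the compiler-generated class name.
  def reverseLookup(holder: AnyRef): Map[AnyRef, String] = {
    val ownerName = holder.getClass.getName.stripSuffix("$")
    holder.getClass.getMethods
      .filter(m => m.getParameterTypes.isEmpty && !Modifier.isStatic(m.getModifiers))
      .filter(m => classOf[Function1[_, _]].isAssignableFrom(m.getReturnType))
      .map(m => m.invoke(holder) -> s"$ownerName.${m.getName}")
      .toMap
  }
}

object ExampleLambdas {
  val plusOne: Int => Int = _ + 1
  val timesTwo: Int => Int = _ * 2
}

// LambdaRegistry.reverseLookup(ExampleLambdas) maps each function instance to
// "ExampleLambdas.plusOne" / "ExampleLambdas.timesTwo", regardless of its runtime class name.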

Finally, the solution would work for both Scala 2.11 and 2.12 and would be future-proof with our macro. @tovbinm

tovbinm commented 5 years ago

@wsuchy Sorry, I don't understand what exactly you are suggesting. What's wrong with the solution I propose in this PR?

wsuchy commented 5 years ago

@tovbinm Eventually we don't want our users to have to manually write their lambda functions as concrete classes, and we know there is a solution (like the one I described). Although your changes apply only to the internal code, we will end up having two ways of dealing with lambdas. Also, modifying the code so that you have to write functions manually isn't the greatest experience. This is why I proposed using a macro to generate these instead.

tovbinm commented 5 years ago

I think our macros would offer exactly the generation of concrete classes as proposed in this PR (since concrete classes are the most straightforward way to handle lambda functions in all possible scenarios, unless Scala decides to change its FunctionX interface of course, which is unlikely).

wsuchy commented 5 years ago

Unfortunately you don't have the types present at macro evaluation time, so you can't populate the type params here: Function1[???, ???]. Unless we want to write a compiler plugin, of course.

tovbinm commented 5 years ago

Anonymous lambdas are much more difficult to debug and comprehend than simple classes, therefore I tend to assume that if simple macros don't work out, we will have to write a compiler plugin, if we decide to go that route.

tovbinm commented 5 years ago
  1. I followed all the changes that were done in #274 and changed those to concrete classes.
  2. Yes, all lambdas and extract functions have to be declared as concrete classes. These classes do not have to be declared inside objects though; they can be standalone classes that live outside objects (I just did it this way for convenience). See the sketch below.
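
A tiny sketch of the two equivalent options (hypothetical names, not actual TransmogrifAI code): the concrete class can be nested inside an object or stand alone; either way its binary name is deterministic across Scala 2.11/2.12, unlike a compiled lambda.

object Lambdas {
  // Nested: binary name is "Lambdas$ToUpper" on both Scala 2.11 and 2.12.
  class ToUpper extends Function1[String, String] with Serializable {
    def apply(s: String): String = s.toUpperCase
  }
}

// Standalone: binary name is simply "ToLower", equally stable across Scala versions.
class ToLower extends Function1[String, String] with Serializable {
  def apply(s: String): String = s.toLowerCase
}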