twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

scalding macro annotation #1658

Open johnynek opened 7 years ago

johnynek commented 7 years ago

This is a vague idea, but we could make a scalding macro annotation, such as:

@scalding
class MyJob(args: Args) extends Job(args) {

}

Which could do some standard transformation of the code so that a lot of common gotchas are fixed. For instance, we could move any case classes defined inside out of the Job. We could make all val into lazy val which will almost certainly not hurt perf, but may improve serializability. We could move all the constructor code into a method:

class Foo(args: Args) extends Job(args) {
  baz
}
// becomes
class Foo(args: Args) extends Job(args) {
  private[this] def init() = {
    baz
  }
  init()
}

so that we avoid making member fields with any local vals (which might improve serialization). (We would probably have to handle override separately, but that seems tractable since people almost never override, and probably never should override, anything but config and next in a Job.)
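To make the member-field point concrete, here is a minimal sketch that does not use scalding at all. It assumes plain Java serialization, and `Helper`, `BodyVal`, `InitMethod` and `SerializationSketch` are hypothetical names invented for the illustration. It only shows why a val in a class body is captured on serialization while a val local to a method is not; it is not what the macro would generate.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for some helper object that is not Serializable.
class Helper { def value: Int = 42 }

// A val in the class body becomes a member field, so it is written out
// together with the instance when the instance is serialized.
class BodyVal extends Serializable {
  val helper = new Helper
  val result: Int = helper.value
}

// The same code moved into an init method: helper is now a local val,
// and only result remains as a field.
class InitMethod extends Serializable {
  private[this] def init(): Int = {
    val helper = new Helper
    helper.value
  }
  val result: Int = init()
}

object SerializationSketch {
  // Returns true if Java serialization can write the object.
  def canSerialize(a: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(a)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

Under these assumptions, `SerializationSketch.canSerialize(new BodyVal)` should fail because of the `helper` field, while `new InitMethod` should serialize, since `helper` never becomes a field.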

There are probably other ideas.

thoughts?

cc @sritchie @ianoc @piyushnarang @isnotinvain

johnynek commented 7 years ago

We could also look through all the types at compile time and build a list of tokens statically.

We could also have different arguments to the macro that might turn on compile time requirements for OrderedSerialization or similar.

ianoc commented 7 years ago

It seems like an interesting project, probably a decent bit of work, but a good experiment. Quantifying the win from the perf aspect seems tough enough too.

(personally since I use execution ~exclusively, it doesn't really affect me -- could be a decent win for twitter though if it could mostly drop in to apply across jobs)

piyushnarang commented 7 years ago

This sounds pretty cool. Some of the transformations you're proposing are useful and we have occasionally run into them. Our usage of Job vs Execution is split (not sure off the top of my head what the breakdown is) but we do have a fair number of users on Execution now.