Caching for typed implicit defs

fommil commented 10 years ago

Implicit defs are currently recomputed at the call site, which can result in a lot of needless object creation and reducing the performance of users.

It would be absolutely fantastic if implicit defs could be cached.

This impacts me the most when using implicit-heavy serialisation libraries. A partial workaround is to create a named val for each type that I am interested in, but it is not possible to use this (horrible boilerplate-ridden hack) when there are cyclic dependencies.

propensive commented 10 years ago

If they're defined as defs, should it be considered correct not to re-evaluate them every time? I think it would be useful, but I wouldn't rule out the possibility of wanting side-effects from implicit instantiation at each callsite.

See also #84. I often use defs just to avoid this issue -- despite the additional object creation -- so it would be nice not to have that overhead.

stanch commented 10 years ago

More generally, it would be nice to have the ability to cache any def provided it only takes implicit arguments — with the assumption that the respective implicits are always the same. For example:

cached def downloadStuff(implicit ec: ExecutionContext): Future[String] = ???

Once initialized, it should return the same future. Or:

implicit cached def bookReads(implicit val reads: Reads[Author]): Reads[Book] = ???

This can be done with a macro annotation, but — as with lazy — a compiler-internal implementation may be more memory-efficient. Or more controversial :)

stanch commented 10 years ago

The cached def above could also be lazy val. Anyway, the biggest problem with this idea is indeterminism: multiple call sites could specify different implicits, with only one of them “winning”.

paulp commented 10 years ago

There's a lot to be said for exposing a call site distinguisher at the language level, though I don't say it would be easy. Imagine something (conceptually) like a ThreadLocal - a CallSiteLocal.

Blaisorblade commented 10 years ago

If they're defined as defs, should it be considered correct not to re-evaluate them every time? I think it would be useful, but I wouldn't rule out the possibility of wanting side-effects from implicit instantiation at each callsite.

You could cache them if they were pure. In fact, doing optimizations based on purity analysis would make lots of idioms much more efficient. While Scalac has no automatic purity analyser, Lukas Rytz implemented an effect system for Scala, which allows annotating and checking purity (if all your libraries are annotated).

There's a lot to be said for exposing a call site distinguisher at the language level, though I don't say it would be easy. Imagine something (conceptually) like a ThreadLocal - a CallSiteLocal.

Ah, so you could cache the per-call-site results of implicit defs, even though the results can be different at different call sites. Is that how you'd use it? I think you can implement that using AspectJ (at least if you ignore performance and use a HashMap for doing the lookup — not sure how to avoid that overhead), but let's say that AspectJ (and AOP) has not proven to be worth its weight.

stanch commented 10 years ago

Would be interesting to think how this interacts with nested calls, where implicits are passed through:

def f(implicit x: Z) = g
cached def g(implicit x: Z) = ???

Does f become cached? Or is there a warning that it should be made cached explicitly?

stanch commented 10 years ago

I think you can implement that using AspectJ (at least if you ignore performance and use a HashMap for doing the lookup — not sure how to avoid that overhead), but let's say that AspectJ (and AOP) has not proven to be worth its weight.

What about macro-annotations? An annotation could probably decorate a cached def with a macro that tracks call-sites.

paulp commented 10 years ago

Is that how you'd use it?

I was also thinking there would be opportunities to exploit invokedynamic and/or to specialize hot paths. And maybe to do structural types right - cache the actual interface for each call site and invoke that directly.

Blaisorblade commented 10 years ago

I was also thinking there would be opportunities to exploit invokedynamic and/or to specialize hot paths. And maybe to do structural types right - cache the actual interface for each call site and invoke that directly.

Ah, right! Inline caching & friends do require per-call-site storage. For this sort of instrumentation, AOP turns out to be a more reasonable idea.

Would be interesting to think how this interacts with nested calls, where implicits are passed through:

@stanch: if you specify what's pure, instead of what's cached, the question becomes easier. You simply ask when caching is guaranteed to not change the result of the program, and the answer is simply "when all the implicits you call are pure". Two questions remain, but at least they don't affect "idealized" correctness:

when do you actually want to cache implicits — there's a tradeoff between time and space. This is analogous to the one with specialization — caching at megamorphic call sites is bad. In fact, since computers have limited resources (unlike e.g. Turing machines), this affects correctness when it brings to memory exhaustion.
when do you want to warn that you can't cache implicits. To minimize useless warnings, you need to guess (or investigate) how they're used in practice. Off the top of my head, an implicit which has a visible side effect sounds like looking for trouble, because you're asking the compiler to add side effects on its own — in which case I'd just warn provide an optional warning for any impure implicits. I think this would also satisfy @propensive's perspective, since the warning is optional. Am I missing something?

Hidden (or benign) side effects are a different matter, of course. An implicit conversion to, say, interned strings (or in general, hash-consed values) should be safe, correctness-wise, even though it mutates the interning cache. I remember that the implicit conversions between String and Name in Scalac were a matter of debate, but IIRC that's not because of the side effects, but because of their performance impact (or at least, I hope so).

I wrote before:

Lukas Rytz implemented an effect system for Scala, which allows annotating and checking purity (if all your libraries are annotated).

In fact, if your implicit just creates a typeclass instance, I don't think you need annotations on libraries to show that the implicit is pure — new is pure, full stop. @lrytz, am I wrong? If I'm right, this might suddenly be more practical (not necessarily easy) — if somebody can finish porting the effect checker https://github.com/lrytz/efftp to 2.11.

paulp commented 10 years ago

I remember that the implicit conversions between String and Name in Scalac were a matter of debate, but IIRC that's not because of the side effects, but because of their performance impact (or at least, I hope so).

It's neither side effects nor performance which are primarily at issue (performance is a minor one.) The problem is that scala is not actually a typesafe language in its standard presentation, so you can "add" Strings and Names at will and they will accommodate the mood of the moment. One minute "foo" == "foo" because they're both strings, the next minute they don't because one is a name, the next minute they're both names but one is a TermName and one is a TypeName. I can't count how many bugs have led back to this total breakdown of abstraction integrity. Plenty still to be found.

paulp commented 10 years ago

Here's a typical scenario. This requires the collaboration of multiple misfeatures.

scala> List(TermName("Foo")) indexOf "Foo"
res0: Int = -1

Blaisorblade commented 10 years ago

Aaah! I see, thanks. Luckily, this is not about side effects in implicits.

It's about Scala being "morally" incomplete (in the mathematical sense, which is related but different from the usual one) — "morally", "Foo" is (arguably) an element of List(TermName("Foo")). Eugenia Cheng claims that category theory (CT) is morally complete — that in CT “everything that ought to be true is true” (because you require enough diagrams to commute). And indeed, Reynolds' paper on implicit conversions applies category theory to the topic to restore morality.

paulp commented 10 years ago

I haven't looked into what you mean by "morally" yet, but if "Foo" is an element of that list then the prevailing morality excludes transitivity of equality. I am a big fan of the Reynolds paper.

Blaisorblade commented 10 years ago

I haven't looked into what you mean by "morally" yet, but if "Foo" is an element of that list then the prevailing morality excludes transitivity of equality.

Let me retract a bit: I don't get morality well enough to use it properly, though "everything that ought to be true is true" is part of it (that paper doesn't fully define mathematical morality to somebody who hasn't learned it already, though I appreciate the concept). Intransitive equality seems certainly immoral; that code compiling by inserting the "right" implicit conversion might be moral (if robust enough), and rejecting that code might also be moral. Multiple implicit conversions to the same type returning unequal results (above, subtyping from String to Any, vs String => TermName and then subtyping) is immoral. (I'm thinking of the scenario where TermName would be inserted by an implicit conversion because of type signatures in intermediate methods, so the issue is about writing "Foo" twice and getting different things, as if equality weren't reflexive).

(Apologies that we're OT in the thread).

stanch commented 10 years ago

If you specify what's pure, instead of what's cached, the question becomes easier.

@Blaisorblade Yes, sure. I was just continuing my thought along the lines of non-implicit cached defs, where useful impure examples exist:

case class Picture(url: URL) {
  def fetch(implicit ec: ExecutionContext): Future[File] = ???
}

Now, fetch could have been a lazy val, because we don’t want to run it twice, and the different execution contexts don’t matter (they do, but we assume that the same reasonable one is passed). Actually now that I think of it, that property does not necessarily correlate with the argument(s) being implicit. Perhaps it’s best to write an annotation library that would support @cache on defs, which have all arguments annotated with @doNotCare.

For the record, I solved a similar problem by moving Future[File] to the class arguments and writing a library to orchestrate generating data with all futures already in place.

paulp commented 10 years ago

that property does not necessarily correlate with the argument(s) being implicit.

You don't even want to imagine how much time is spent creating and passing around identity functions.

stanch commented 10 years ago

@paulp Could you elaborate a bit?

paulp commented 10 years ago

"identity functions should be macros"

stanch commented 10 years ago

@paulp That’s more or less what I thought, but you were right — I didn’t want to imagine this...

Blaisorblade commented 10 years ago

@stanch Sorry, I had missed your non-implicit scenario. But if possible, I'd say that belongs to a macro because you need quite some flexibility on the implementation.

I've thought a bit on how to make that

What about macro-annotations? An annotation could probably decorate a cached def with a macro that tracks call-sites.

I'm no expert on annotations and their limitations; but even without them, I imagine something like

def realDownloadStuff(implicit ec: ExecutionContext) = ???
def downloadStuff(implicit ec: ExecutionContext): Future[String] = macro cachedDownloadStuff
def cachedDownloadStuff(...) = cached('realDownloadStuff ...) /* I'm not happy with using a Symbol here. But would q"realDownloadStuff" work? */

I imagine cached could create the needed code at the call-site — e.g. a cache lookup with a fresh ID generated at macro-expansion time, and thus call-site specific. (I keep imagining a version of C's function-local static, but I can't convince myself it can be made into a good idea).

Admittedly, that's still a bit heavyweight. It sounds like macro annotations might generate the above, but they do have quite a few limitations.

stanch commented 10 years ago

@Blaisorblade By the way, there’s also https://github.com/kciesielski/macmemo, but I’m not sure how lightweight it really is, since it’s using Guava.

Blaisorblade commented 10 years ago

@stanch Ah cool! You can use the library without using Guava (with custom memo cache builders). But they seem to be using Guava's CacheBuilder, which is A Good Thing (TM) to delete old memoized entries (see StackOverflow discussion and the thread). It sounds like you wouldn't want that in the compiler — certainly I don't want that, because different choices are appropriate for different applications. But without Guava, it's hard to discard old entries robustly. And I'm afraid of having in the compiler a "simple" caching strategy that doesn't discard old entries.

It might be possible to integrate in the standard library an approach like that if the configuration API is general enough that you don't need different ones. But strategically, I think an organization like the one of the Haskell Platform (with a distribution of multiple libraries of different maintainers which have minimal coordination) might be better.

Is this library appropriate for the use cases in this issue? If so, I propose to close the issue. @fommil, what do you think?

fommil commented 10 years ago

my use case is specifically for implicit defs that take type parameters and lots of implicits, it doesn't appear to support that (I tried it and the method kept being called).

Blaisorblade commented 10 years ago

@fommil wrote:

I'd like to move to a position where all implicit defs can be memoized, without an annotation (an external whitelist/blacklist at the most, to allow excluding bad responses with mutable state). That's where I really want to go with this: I think it's a fundamentally limiting cost of the scala runtime.

I agree with the sentiment, but I am not convinced that Scala with such a black/whitelist is a good idea overall in the long term. If adding an optional effect system, or automatic purity analysis, is possible, I think that'd be a much better idea. Allowing for more effects than in Haskell is convenient, but it does make this and many other optimization much harder. You wouldn't even need to turn identity functions into macros!

fommil commented 10 years ago

btw, I am only talking about parameter-less implicit defs that take implicit parameters. I don't care about implicit conversions (implicit methods that take a single parameter): in fact, caching those would be pretty insane!

lihaoyi commented 9 years ago

I've brought this up before

https://groups.google.com/forum/#!topic/scala-language/-RBNJpM5GIE

Apart from linking to previous discussion, this brings to the table benchmarks! I'm looking at a 5-10% perf hit on Scala-JVM and 25-30% perf hit on Scala.js from re-computed implicits. The data structures being created are all completely stateless, and are just stubs really to stick logic in their methods, so this repeated re-initialization is all a complete waste. Some of the implicits are macro-materializers, but in the end the distinction is pretty immaterial, since the non-macro List[T]/Map[K,V]/etc. implicits pay the same cost.

Blaisorblade commented 9 years ago

I'm a bit in a hurry, but I've run into what seems an excellent design for specifying what should happen — in particular, the answer to this question:

If they're defined as defs, should it be considered correct not to re-evaluate them every time? I think it would be useful, but I wouldn't rule out the possibility of wanting side-effects from implicit instantiation at each callsite.

I talked about pure vs impure implicits. Prof. Derek Dreyer highlighted a very elegant design (an extension of Modular Type Classes), also based on caring about this distinction — in fact, he came clearly first, and his design is more worked out. Unfortunately, his presentation assumes familiarity with both ML modules and Modular Type Classes; however, if you're familiar with the cake pattern (and hierarchical cakes), you're familiar with similar concepts under different names. Moreover, Scala is more inspired by ML than by Haskell, so studying ML modules explains Scala better than studying type classes, and a lot of what those slides propose can be translated to (an extension of) Scala somewhat easily. I'm sorry I didn't yet find the time to write properly on this connection in general.

Blaisorblade commented 9 years ago

A more concrete question: how often do programs rebuild the same implicit in the same context? Does anybody have an educated guess? Such rebuildings could be avoided without the cost of building a cache!

Scenario: If I have two calls taking the same implicit, right now Scala rebuilds the implicit each time, even if those implicits are built in the same context.

With purity analysis or annotations, such duplicated calls could be removed statically without creating any cache; since such an optimization would be essentially unconditionally helpful (it's just common-subexpression elimination), it could/should be implemented in the compiler (in fact, it's a very useful general optimization). This might be annoyingly hard to implement, but it's been requested multiple times, so maybe there's already progress?

The only question is: how often does this scenario come up? @lihaoyi, do you have an idea at least for your experience?

lihaoyi commented 9 years ago

I don't really know, but it doesn't really matter does it? Whether you're materializing the thing twice in one function call, or you're materializing it once-per-call and calling it twice, it's still a dead waste if the thing you're materializing has no state and really should just be a static method somewhere. I'd say a more general "instantiate once, ever" thing would be far better than trying to be clever with local scopes.

Also, there's something to be said for rebuilding the same implicits in different contexts. e.g. if I use uPickle and read/write the same case class in a bunch of places, that results in a lot of duplicate code that really should live in one place. That doesn't affect performance though and is a different concern from inline caching/interning of each callsite.

Blaisorblade commented 9 years ago

I don't really know, but it doesn't really matter does it? [...]

if the thing you're materializing has no state and really should just be a static method somewhere

Thanks for your quick answer. I understand your frustration, but what you're proposing seems much harder than you make it seem, and than anything that has been proposed.

Indeed, creating static methods (instead of storing the values in a table) hasn't been proposed yet in this discussion (AFAICS). It's indeed interesting, but this is far from trivial.

It's a whole-program transformation!

Maybe I'm missing something (and would like to be enlightened), but the more I look at it, the more complications show up. To be sure, there's one well-known way to achieve what you want, but you don't want to use it always — it's called monomorphization and is related to Scala specialization, but creates tons of code and unlike Scala specialization it must be done for the whole program, which is a non-starter. In principle, you might achieve the whole-program processing through load-time specialization.

I've just found a reference from the excellent Oleg Kiselyov on implementation techniques for typeclasses, discussing how to implement correctly the basic techniques (Scala's dictionary passing, monomorphization and intensional type analysis).

A non-global version

More plausibly, you can do common-subexpression elimination across a compilation unit, moving pure implicits without dependencies on the local context (that is, constructed by closed terms) to static lazy fields. This avoids the harder problems:

dealing with type variables by specialization (Scala specialization is different enough that it doesn't help; it only tries to specialize data structures to primitives, so it generates a single variant for all reference types).
unifying those static fields across the program statically (though you can still do it dynamically with some cache).

Memory consumption

However, even if you manage, you have the issue that either all the implicit fields, or all the generated code, takes up lots of space, so much that loading these fields might take even more than recomputing. C++ templates cause code bloat exactly because they're always compiled by monomorphization. Successful specializing compilers use finely-tuned heuristics to figure out when to specialize and when not, or leave the choice to the programmer.

In fact, that's even a problem sometimes with common-subexpression elimination, only less so.

Sharing duplicated code

Also, there's something to be said for rebuilding the same implicits in different contexts. e.g. if I use uPickle and read/write the same case class in a bunch of places, that results in a lot of duplicate code that really should live in one place.

I've also been annoyed by that, but merging that code across the program is a whole-program operation like above. Again, you can merge it per compilation unit. In C++, duplicated definitions generated by templates are at the top-level and are indeed removed by the linker — but we don't have a linker available before the program is run, so it's hard to avoid the code duplication in the class files :-(.

lihaoyi commented 9 years ago

@Blaisorblade I'm not proposing it as a solution, I'm just saying how things should be =P If we agree that this sort of end-goal is desirable, then getting there is a separate concern.

In practice, a lot of individual things would make things better and incrementally move things towards the ideal:

Caching/interning per-call-site: whether as an implicit-thing or macro-thing (doesn't matter) it would remove the unnecessary allocation/initialization overhead per call site and give me my 5%/10%/30% performance back in uPickle =P
Same-scope resolved-implicit-sharing: to answer the original question, this doesn't appear at all in code I write, but maybe the shapeless people have different usage patterns with all their implicit magic. It wouldn't affect me (or I imagine most people) at all.
Turning these things static & de-duping code: this is as mentioned a whole-program-optimization. That doesn't mean it's impossible; both Proguard and Scala.js are very successful at doing whole-program operations, if you're willing to give up separate compilation.

The first I think could be actually pretty trivial; just stick it in a global Any lazy-val somewhere and cast it as necessary, letting it be opt-in (via annotation?) and "buyer beware" so it'll simply do-the-wrong-thing/blow-up if the thing being annotated isn't properly pure/constant.

Blaisorblade commented 9 years ago

@Blaisorblade I'm not proposing it as a solution, I'm just saying how things should be =P If we agree that this sort of end-goal is desirable, then getting there is a separate concern.

Makes sense. I think we all agree on that end-goal, the question is just about how to approach it. I think one of the most approachable solution is macmemo (after fixing kciesielski/macmemo#8 — but that's easiest than anything else here) — even if you want to do something else, this could be a prototype.

Same-scope resolved-implicit-sharing: to answer the original question, this doesn't appear at all in code I write, but maybe the shapeless people have different usage patterns with all their implicit magic. It wouldn't affect me (or I imagine most people) at all.

That seems strange: The benchmark your email mentions (a loop using uPickle) would be affected, and I think shapeless's Lazy would help there (though it affects the library implementation significantly). Maybe that's just a synthetic microbenchmark.

I can't really find much documentation, but this test seems to be significant (though, of all those tests in that file, it seems exactly 0 show that Lazy does what we'd want it to do): https://github.com/milessabin/shapeless/blob/master/core/src/test/scala/shapeless/lazy.scala#L127-L137

(What I'm missing is a test declaring an implicit def effectfulInt: Int = ...computation..., using an implicit Lazy[Int] and showing the effects of the computation only happen once. @milessabin, is my expectation correct? Did I miss the test for it? testEffectOrder is not it, since the implicit is a Lazy[Int]. Ah! That test is not there, but implicit lookup in testRecursive would diverge if Lazy didn't work!).

The first I think could be actually pretty trivial; just stick it in a global Any lazy-val somewhere and cast it as necessary, letting it be opt-in (via annotation?) and "buyer beware" so it'll simply do-the-wrong-thing/blow-up if the thing being annotated isn't properly pure/constant.

Sorry for being dense, what do you store there? Since you use Any, I assume you store different implicits, but when can you ever reuse the value there? Maybe you mean a lazy-val per call-site, but the same call-site might still need different implicits, so the same question arises.

Both Proguard and Scala.js are very successful at doing whole-program operations, if you're willing to give up separate compilation.

I assumed the whole-program compilation times would make this impractical, but I stand corrected.

Blaisorblade commented 9 years ago

I think shapeless's Lazy would help there

Whoops, I just read you're using something like that already!

lihaoyi commented 9 years ago

That seems strange: The benchmark your email mentions (a loop using uPickle) would be affected, and I think shapeless's Lazy would help there (though it affects the library implementation significantly). Maybe that's just a synthetic microbenchmark.

Yeah it's synthetic. In reality, nobody's going to do this in such a tight loop. The tight loops that do exist (e.g. deserializing a large List[T]) generally exist in a place where the Reader[T] is already only instantiated once outside (implicitly) and passed in, so even though in practice I'd have lots of tight serialization/deserialization loops, none of them would instantiate as much repeated implicit cruft in the same scope.

Maybe you mean a lazy-val per call-site, but the same call-site might still need different implicits, so the same question arises.

That's what i meant. Yes, it's not fully thought through. I'm still thinking of this as something a macro would handle, and the macro-author would have the authority to decide whether or not something needs to be stored somewhere or not.

I assumed the whole-program compilation times would make this impractical, but I stand corrected.

Yeah, it isn't obvious that it was =P both proguard and google closure are pretty slow doing their WPO, and Scala.js's fast incremental optimizations are very surprising. It turns out that by not repeatedly throwing away and re-infering type information, things are a lot easier...

paulp commented 9 years ago

It turns out that by not repeatedly throwing away and re-infering type information, things are a lot easier...

Although this is largely dictated by the hopelessly mutable design of the compiler, even without that issue it would be doing far too much work. It should infer principal typings, not types.

lihaoyi commented 9 years ago

Whats a Principle typing? On Dec 26, 2014 4:13 AM, "Paul Phillips" notifications@github.com wrote:

It turns out that by not repeatedly throwing away and re-infering type information, things are a lot easier...

Although this is largely dictated by the hopelessly mutable design of the compiler, even without that issue it would be doing far too much work. It should infer principal typings, not types.

— Reply to this email directly or view it on GitHub https://github.com/typelevel/scala/issues/83#issuecomment-68120374.

Blaisorblade commented 9 years ago

We're even more OT now, but to answer:

It should infer principal typings, not types.

What's a principal typing?

Computing a principal type takes an expression e and the environment G (that is, types of all other definitions) as input and produces the most general type T for e in G. "Most general type" means that every other type is a "specialization" of T.

Instead, computing a principal typing produces both T and G from e. That is, it computes the types it requires from other classes, so you can typecheck each class separately. Principal typings don't exist for all languages.

But given the features of the Scala typesystem, can you have principal typing with a "useful" type system? There's work on extending Hindley-Milner to Java, but I hear the outputs are too complicated to be useful (because they have to list too many valid alternatives).

milessabin commented 8 years ago

To resurrect this issue, please rework it as an issue/PR against Lightbend Scala (ie. scala/scala).

typelevel / scala