taoensso / nippy

The fastest serialization library for Clojure
https://www.taoensso.com/nippy
Eclipse Public License 1.0
1.04k stars 60 forks source link

Nippy has long require times when a project has clojure.core.async included #133

Closed cnuernber closed 3 years ago

cnuernber commented 3 years ago

If a project has clojure.core.async transitively included but is not using it require times for nippy go up considerably, potentially by as much as 3 times.

On my laptop, with no core.async in a project, nippy takes about 3 seconds to require. With core.async transitively included via other libraries and not used in any way, nippy takes 9 seconds to require.

As a proof of concept, I took the liberty of hacking together a version of nippy that does not have this trait.

ptaoussanis commented 3 years ago

@cnuernber Hi Chris, thanks for the issue!

Just to confirm that I understand- what we're talking about here is a startup slowdown only in the event that:

Is that right?

If so - do we have any idea how common this is, and/or what the motivation might be? Especially if folks are in an environment where they care about startup time - perhaps they could be advised to not include dependencies they're not using?

On board if we can find a small change in encore to help with this, but forking encore as in #133 would be a major and ongoing cost to pay.

ptaoussanis commented 3 years ago

@cnuernber Okay, I've made an experimental change in encore to potentially help with this.

On Clojars at [com.taoensso.debug/encore "3.4.0-SNAPSHOT"].

I'm a little hesitant given the complexity of the change- would still like to hear your thoughts re: how common this particular edge case might be.

cnuernber commented 3 years ago

The proposed encore fix worked for me.

For us it was quite common. Similar to you, we have a few libraries at more or less root level in our hierarchy one of which is tech.parallel which includes a dependency to clojure.core.async but does not load it unless you require a particular namespace.

Then comes the numeric system where I used perhaps too much compile time engineening, then the dataset library which now has 20+ second require times even to do basic data exploration (load a csv, play around, save result).

I agree that sometimes it is possible to avoid the core.async dependency but often it is quite tough as it gets pulled in via many transitive dependencies (like Neanderthal, tech.parallel, etc). I really think with java8 and blocking queues and such core.async isn't really necessary to do the most of concurrent engineering that people do but that ship has sailed.

For your average Clojure person I think they eat the 20 second startup time and don't worry about it. One other thing I have found so far is that while aot reduces startup times they tend to be related to your initial startup time without AOT so in environments where startup time does matter that is a thing to keep in mind. Avoiding requiring systems you aren't using seems prudent.

In any case, I am actually going both directions. Removing core.async from systems carefully and completely where it isn't crucial and finding areas where having it is causing more issues than necessary.

One thought we had was why not split encore up into several sections? Then require what you are using; it seems like it naturally splits up into many fairly unrelated files and clearly nippy for instance only uses a small portion of the overall encore entourage.

So lots of things to think about here :-). Thanks for your time, I agree that it isn't everyday that people care about this stuff and it is getting fairly niche of niche.

ptaoussanis commented 3 years ago

Thanks Chris-

which includes a dependency to clojure.core.async but does not load it unless you require a particular namespace.

That sounds reasonable to me.

sometimes it is possible to avoid the core.async dependency but often it is quite tough as it gets pulled in via many transitive dependencies (like Neanderthal, tech.parallel, etc)

I would note that it's maybe a library issue if it's including a core.async dependency but doesn't actually require it?

One thought we had was why not split encore up into several sections?

It definitely could be broken up, but the trade-off there would be making it less convenient to use - which is a key objective. I have tons of code out there that regularly uses a wide variety of things in encore, and it's a major convenience not needing to import and manage multiple namespaces.

The amount of extra time involved in the compilation is not generally significant IME unless you're doing some sort of command-line loading.

Cheers! :-)

cnuernber commented 3 years ago

which includes a dependency to clojure.core.async but does not load it unless you require a particular namespace.

That sounds reasonable to me.

sometimes it is possible to avoid the core.async dependency but often it is quite tough as it gets pulled in via many transitive dependencies (like Neanderthal, tech.parallel, etc)

I would note that it's maybe a library issue if it's including a core.async dependency but doesn't actually require it?

You seem to be arguing both sides of the issue here. First, it is reasonable to have a dependency you don't always use unless you require the namespace that does actually use said dependency

Second, it is a library issue to have a dependency on something that you aren't using in every use case of the library...

The first seems reasonable to me, the second much less so.

Again, thanks for the quick responses.

ptaoussanis commented 3 years ago

You seem to be arguing both sides of the issue here. First, it is reasonable to have a dependency you don't always use unless you require the namespace that does actually use said dependency

Ah, thank you! I'd actually misread your first statement:

Similar to you, we have a few libraries at more or less root level in our hierarchy one of which is tech.parallel which includes a dependency to clojure.core.async but does not load it unless you require a particular namespace.

I'd somehow missed that you're talking about libraries here, thought you were talking about application code.

In this case, I don't think a library should be including a dependency it doesn't need.

If a library L has features that can use dependency D, but its core functionality doesn't otherwise require that dependency - then I'd advocate for sticking the extra functionality in a dedicated namespace, and asking the downstream library consumer to add the dependency if they want it.

cnuernber commented 3 years ago

That is probably the best option overall as it avoids some version conflicts and I have gone to that design for the next version of the numeric system. Core.async, unfortunately, is one of those that everyone includes all over the place whether it is necessary or otherwise.

Then you get these issues instead ;-).