It's from a while ago so I can't remember all of the details, but this fipp PR and the comments therein are related. I found that unchunking that exact same sequence prevented heap exhaustion on JDK8+.
Minimal reproduction:

```clojure
(require 'puget.printer)

(puget.printer/pprint (repeatedly 500000 #(rand-int 1000000000)))
```
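A quick REPL check of which sequences involved here are chunked, sketched with only `clojure.core` (`chunked-seq?`, `chunk-first`). The lazy seq built by `repeatedly` in the reproduction is not chunked. A seq over a small vector is chunked, and all four elements sit in a single chunk. The keywords stand in for a vector's elements and are purely illustrative:

```clojure
;; The reproduction's input is a plain lazy seq built from cons cells, not chunked.
(chunked-seq? (seq (repeatedly 5 #(rand-int 1000000000))))  ;; => false

;; A seq over a small vector is chunked.
(chunked-seq? (seq [:a :b :big :d]))  ;; => true

;; A 4-element vector yields one chunk holding all 4 elements.
(count (chunk-first (seq [:a :b :big :d])))  ;; => 4
```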
When I run this with `-Xmx200m` and watch the GC logs, I can see memory usage gradually growing, and the program slows down as it approaches the maximum heap size; with `-Xmx500m` it has enough headroom to finish in a more reasonable amount of time.

I'm fairly confident this is due to how this line interacts with the four-element vector produced here. In particular, because the vector is chunked, the `map` inside the `mapcat` will retain a reference to the third element (the large collection).

As evidence for this explanation, this change to fipp (unchunking the sequence) seems to fix it:
I do not know what a clean fix for this would be. I'm not sure we can make the above change to fipp without potentially sacrificing performance in the farther-down-the-stack case where it's processing an actual large sequence, rather than a vector containing a large sequence as an element. And I can't think of anything that puget could do to cause fipp to behave differently.
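To make the trade-off concrete, here is a hedged sketch using only `clojure.core`. `unchunk` and `calls-for-first` are hypothetical helpers for illustration, not functions from fipp or puget. `unchunk` is the common "seq1" idiom, not necessarily the exact fipp change referenced above. When `map` runs over a chunked seq, it evaluates the mapping function for every element of the chunk up front and stores all the results in one chunk. For a four-element vector, that chunk therefore holds the third element's result for as long as the chunk itself is reachable. An unchunked seq is realized one element at a time. The cost is one `lazy-seq`/`cons` step per element instead of one per chunk of up to 32, which is the performance concern for genuinely large sequences.

```clojure
(defn unchunk
  "Hypothetical helper: rebuilds s as a non-chunked lazy seq,
  so map/filter/etc. realize it one element at a time."
  [s]
  (lazy-seq
    (when-let [[x] (seq s)]
      (cons x (unchunk (rest s))))))

(defn calls-for-first
  "Hypothetical helper: counts how many times the mapping fn runs
  to realize just the first element of (map f coll)."
  [coll]
  (let [calls (atom 0)]
    (first (map (fn [x] (swap! calls inc) x) coll))
    @calls))

;; Chunked: mapping runs for all 4 elements at once.
(calls-for-first [:a :b :big :d])            ;; => 4

;; Unchunked: mapping runs only for the element actually needed.
(calls-for-first (unchunk [:a :b :big :d]))  ;; => 1
```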