trinodb / tpch

Port of TPC-H dbgen to Java
44 stars 45 forks source link

Retain big TextPool memory softly #30

Closed findepi closed 1 year ago

findepi commented 1 year ago

Before the change, the TextPool.DEFAULT_TEXT_POOL would be a strongly referenced, 300 MB memory block. After the change, it's loaded when used, and shared opportunistically, by keeping it on a soft reference. Additionally, a strong reference is kept for 10 seconds to avoid trashing. On author's laptop, new TextPool gets created in approximately 0.85 s.

This is expected to reduce memory usage for applications that generate TPC-H data from time to time.

findepi commented 1 year ago

I am not 100% sure yet this is a good change. Reduced memory use is good. On the negative side, we may start spending lot time in re-initing the TextPool, eg for Trino TestLocalQueries, which read from TPC-H tables thousands of times.

Maybe we should also have a timed strong reference, i.e. keep TextPool strongly for at least eg 5 seconds.

cc @losipiuk

electrum commented 1 year ago

Seems like a perfect use case for a soft reference.

findepi commented 1 year ago

Seems like a perfect use case for a soft reference.

Correct, it should be soft, not weak. My bad.

My concerns still remain though, at least for me.

findepi commented 1 year ago

Switched to soft reference & update PR title and description.

losipiuk commented 1 year ago

LGTM - but it requires testing how does it behave in practice. How often are we reloading.

findepi commented 1 year ago

LGTM - but it requires testing how does it behave in practice. How often are we reloading.

i am thinking that even if we test current impact of the change on Trino, things may change. For example, if we start using a little bit more memory, the frequency of TextPool loads may (silently) increase.

instead of testing the change I will

@losipiuk wdyt?

losipiuk commented 1 year ago

LGTM - but it requires testing how does it behave in practice. How often are we reloading.

i am thinking that even if we test current impact of the change on Trino, things may change. For example, if we start using a little bit more memory, the frequency of TextPool loads may (silently) increase.

instead of testing the change I will

  • add logging
  • add timed strong reference, to prevent re-loading this "every moment"

@losipiuk wdyt?

Sure - fine by me. If it was not an external library I would suggest to fail test in Trino if we see reloads to often, so we notice that we should change the thresholds. Not really feasible if this is exernal lib.

How do you envision strong reference to work?

findepi commented 1 year ago
  • disallow reload more often than every X seconds (keep strong reference for X seconds after reload)

Just did that, with X = 10.

please review

findepi commented 1 year ago

(just rebased)