I tried with https://github.com/replikativ/datahike-postgres and saw the same increase in memory usage on transaction.
I used this command to start postgres with docker:
docker run -p 5432:5432 --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres
And this datahike config:
{:backend :pg
:host "localhost"
:port 5432
:username "postgres"
:password "mysecretpassword"
:path "/postgres"}
This is not the default config in datahike-postgres' readme because of https://github.com/replikativ/datahike-postgres/issues/11, so I used the workaround in https://github.com/replikativ/datahike-postgres/pull/12.
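For reference, a minimal sketch of how that config might be used with the Datahike API, assuming the flat-config style and the datahike.api functions of that era (exact config keys and signatures may differ between versions):

```clojure
;; Hedged sketch: using the Postgres config above with datahike.api.
;; Config keys and API details are assumptions for this Datahike version.
(require '[datahike.api :as d])

(def cfg {:backend  :pg
          :host     "localhost"
          :port     5432
          :username "postgres"
          :password "mysecretpassword"
          :path     "/postgres"})

(d/create-database cfg)                 ; one-time setup
(def conn (d/connect cfg))
(d/transact conn [{:db/ident       :test/string
                   :db/valueType   :db.type/string
                   :db/cardinality :db.cardinality/one}])
```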
Left the repro running for 5000 iterations using datahike-postgres and saw transact time ebb up and down between 18 and 85 ms, which points towards it not increasing indefinitely.
Memory, on the other hand, mostly climbed from the initial 413MB starting point, sometimes dropping 20MB or so, but after the 5000 iterations it was at 615MB.
Docker lists the postgres container as using the following resources at the end of the 5000 iterations:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
523ab6d2e865 some-postgres 0.00% 38.8MiB / 1.942GiB 1.95% 616MB / 1.2GB 0B / 316MB 7
Left the same process running against postgres again, and after 6000 txs it climbed from 613MB to 635MB. So if it is still climbing past that, it is doing so more slowly than the climb up to 600MB.
Tried the local file system backend again on the same process where I had been trying the others.
After 1500 txs it went from 634MB to 570MB. Transactions started at 44ms and went as high as 164ms, but then ebbed up and down around those values.
So it looks like there's no memory leak with either the file or postgres backends, but that memory usage stabilizes around 600MB.
@filipesilva Thanks for taking the time to report! This is probably due to our cache of the actual tree fragments. At the moment we do not cap the size of Datoms (e.g. the string length of values), so there is no easy way to bound the memory in megabytes, but one can definitely decrease the cache size of konserve. The current value is fairly aggressive, at a thousand tree fragments, each containing approximately 300 Datoms: https://github.com/replikativ/datahike/blob/development/src/datahike/connector.cljc#L129. konserve is also being significantly upgraded at the moment, including somewhat more intelligent caching, but the total size is still configured from the outside. What expectations do you have, and how can we get closer to what you need?
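For context, the cache being described is essentially an LRU cache wrapped around the konserve store. A hedged sketch of that pattern, assuming konserve.cache/ensure-cache and clojure.core.cache (the actual code behind the link above may differ):

```clojure
;; Hedged sketch of the LRU caching pattern around a konserve store,
;; assuming konserve.cache/ensure-cache and clojure.core.cache are the
;; mechanisms used; the exact code in connector.cljc may differ.
(require '[konserve.cache :as kc]
         '[clojure.core.cache :as cache])

(defn wrap-with-cache
  "Wraps a konserve store with an LRU cache holding at most `n` tree
   fragments; lowering `n` trades read performance for a smaller heap."
  [store n]
  (kc/ensure-cache store (atom (cache/lru-cache-factory {} :threshold n))))

;; e.g. a smaller cache of 100 fragments instead of the default ~1000:
;; (def cached-store (wrap-with-cache raw-store 100))
```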
@whilo sorry for the delay in answering! Thank you for the context. We guessed it might be the cache because, when profiling, we saw a lot of memory being held by some file-related things in superv.async.
I think our expectation would mostly be a way to distinguish between a real memory leak and reasonable baseline memory requirements. If we had known about the cache size in advance, and maybe had a way to adjust it, it would have been very obvious that there was no leak. Documenting some reasonable minimum system requirements would also be nice.
Thanks, we will expose the setting and be more explicit about the system requirements. To provide absolute upper bounds on the requirements we would need to bound Datom size. What would you expect as maximum string size? (we can make this configurable as well)
We thought about string size as well, and checked whether we were saving something that kept growing bigger. I wouldn't be surprised if we needed an ever larger amount of memory if we were saving bigger and bigger strings. So we'd be happy enough just limiting string size in our app.
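A hypothetical sketch of that kind of app-side limit, capping string values before they reach d/transact (the attribute name and the 10,000-character limit are made up for illustration):

```clojure
;; Hypothetical app-side guard: cap string values before transacting.
;; The attribute name and 10,000-character limit are illustrative only.
(def max-string-length 10000)

(defn limit-strings
  "Truncates any string values in a tx map to `max-string-length`."
  [tx-map]
  (into {}
        (map (fn [[k v]]
               [k (if (and (string? v) (> (count v) max-string-length))
                    (subs v 0 max-string-length)
                    v)]))
        tx-map))

;; (d/transact conn [(limit-strings {:block/string huge-string})])
```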
At https://github.com/athensresearch/athens we started noticing a memory leak when using Datahike 0.3.7-SNAPSHOT and 0.3.2. Below is a reproduction that only uses a single unindexed schema attribute:
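What follows is a hypothetical sketch of such a reproduction, pieced together from the description in this issue (a single unindexed string attribute, a GC hint every iteration, a 5s sleep every 50 iterations, and releasing the conn and deleting the database at the end); the config, attribute name, and iteration count are assumptions rather than the original code:

```clojure
;; Hypothetical sketch of the repro described in this issue; the config,
;; attribute name, and iteration count are assumptions, not the original code.
(require '[datahike.api :as d])

(def cfg {:backend :file :path "/tmp/datahike-repro"})

(defn used-mb []
  (let [rt (Runtime/getRuntime)]
    (quot (- (.totalMemory rt) (.freeMemory rt)) (* 1024 1024))))

(defn repro []
  (d/create-database cfg)
  (let [conn (d/connect cfg)]
    ;; a single unindexed string attribute
    (d/transact conn [{:db/ident       :test/string
                       :db/valueType   :db.type/string
                       :db/cardinality :db.cardinality/one}])
    (dotimes [i 1000]
      (d/transact conn [{:test/string (str "value " i)}])
      (System/gc)                               ; GC hint every iteration
      (println "iteration" i "used memory (MB):" (used-mb))
      (when (zero? (mod (inc i) 50))
        (Thread/sleep 5000)))                   ; pause every 50 iterations
    (d/release conn)
    (d/delete-database cfg)))
```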
Eval'ing (repro) inside a REPL will, on a first run, yield output showing the used memory (in MB) increasing over time. A GC hint is provided every iteration, and every 50 iterations there's a 5s sleep call.
The repro function will release the conn and delete the database. If you eval the function multiple times you can also see that the used memory is not GC'd between evals. Also worth mentioning that the transaction time increases within the same eval, but resets between evals.