mbdavid / LiteDB

LiteDB - A .NET NoSQL Document Store in a single data file
http://www.litedb.org
MIT License

Database queries pull entire DB size into RAM and leave it there after done #2278

Closed mathis1337 closed 1 year ago

mathis1337 commented 1 year ago

Version 5.0.15

Describe the bug No matter what type of query I run against a database, it seems to load the entire DB into RAM, and when the query is done the RAM usage does not go back down, even if I force a call to GC.Collect().

So, two issues:

  1. Queries seem to pull the entire DB into RAM.
  2. When the queries are over and the method has finished, RAM usage stays where it is for the life of the running program.

Code to Reproduce Here are some example queries run against a DB that is 1 GB in size. They differ in speed and in how memory grows: some load the DB into memory immediately, while the last one adds to it gradually, one query at a time, because of the Limit.

var blocks = BlockchainData.GetBlocks().Query().Where(Query.Between("Height", currenRunHeight == 0 ? 0 : currenRunHeight+1, heightSpan)).ToList();
var blocks = BlockchainData.GetBlocks().Query().Skip((int)currenRunHeight).Limit((int)heightSpan - (int)currenRunHeight).ToList();
var blocks = BlockchainData.GetBlocks().Query().Where(x => x.Height >= currenRunHeight && x.Height < heightSpan).ToList();
var blocks = BlockchainData.GetBlocks().Query().Where(x => x.Height >= currenRunHeight && x.Height < heightSpan).Limit((int)heightSpan - (int)currenRunHeight).ToList();

Expected behavior I expect that, with a limit, only X records are loaded at a time. I am fetching roughly 1.6% of the records from this DB at a time and not storing them in any static list. I would expect RAM usage to stay low, but it gets up to 1.5 GB with the above queries. If I try a Find or anything else it can grow to 6 GB of RAM usage, so something seems to be off.


Additional context I am not sure whether a cache is being kept in RAM. If so, it would be nice to be able to clear it; better still would be for queries not to pull the full size of the DB into memory.

Thank you.

mathis1337 commented 1 year ago

An update: after switching to .ToEnumerable() the memory leak went away. I am not sure why .ToList() and .ToArray() treat the non-static variable as if it were static. Any clarification would be appreciated if this is by design. If it is, then for anyone else out there: avoid using them for a long-running query.

This is the query that worked for me.

var blocks = blockChain.Query()
    .Where(x => x.Height >= currenRunHeight && x.Height < heightSpan)
    .Limit((int)heightSpan - (int)currenRunHeight)
    .ToEnumerable();

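The difference between streaming a result and materializing it can be sketched in plain C#, without LiteDB. In this minimal sketch, `ReadHeights` is a hypothetical stand-in for a lazily evaluated query result (it is not a LiteDB API), and `Produced` counts how many items the source has handed out so far. It only illustrates deferred enumeration in general; it makes no claim about what LiteDB does internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // Number of items the source has produced so far.
    public static int Produced;

    // Hypothetical stand-in for a lazily evaluated query result.
    public static IEnumerable<int> ReadHeights(int from, int to)
    {
        for (int h = from; h < to; h++)
        {
            Produced++;
            yield return h;
        }
    }

    // Returns how many items had been produced when the first one was consumed.
    public static int ProducedAtFirstItem(IEnumerable<int> source)
    {
        foreach (var item in source)
        {
            return Produced;
        }
        return Produced; // empty source
    }

    public static void Main()
    {
        // Streaming: the consumer sees item 1 before item 2 is produced.
        Produced = 0;
        int streamed = ProducedAtFirstItem(ReadHeights(0, 10000));

        // Materializing: ToList() drains the whole source before returning.
        Produced = 0;
        int materialized = ProducedAtFirstItem(ReadHeights(0, 10000).ToList());

        Console.WriteLine($"streamed: {streamed}, materialized: {materialized}");
        // prints "streamed: 1, materialized: 10000"
    }
}
```

With the streamed form only the current item needs to be alive at any moment, whereas the list keeps every item reachable for as long as the list itself is referenced.
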
mathis1337 commented 1 year ago

I have another thing to add. If I use this query:

var blocks = BlockchainData.GetBlocks().Query().Where(Query.Between("Height", currenRunHeight == 0 ? 0 : currenRunHeight+1, heightSpan)).ToList();

With a normal foreach, memory usage grows each time I run the query and assign the result to the blocks variable. However, if I do this:

blocks.Clear();
blocks = new List<Block>();

at the end of each loop cycle, then memory does not grow. I am querying over millions of records, but only processing around 10k at a time.

At this point I am wondering: does LiteDB do something with .ToList() that normal C# does not? Or, the bigger question, is this a C# issue?

Goblinth commented 1 year ago

This is just how C# works. ToList() creates a new list and assigns it to the 'blocks' variable. The previous list still exists in memory until it is eventually garbage-collected. The code you use manually removes every entry from the old list and then replaces it, so that the old list is no longer referenced. It is possible, though, that LiteDB holds onto references too long somewhere internally, preventing them from being freed (even with GC.Collect()).
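One way to keep each batch short-lived is to declare the list inside the loop, so that no reference to a finished batch outlives its iteration. The following is a minimal sketch under that assumption: `LoadBatch` is a hypothetical stand-in for one range query returning a materialized list, not a LiteDB call, and the sketch does not cover any references LiteDB itself might keep.

```csharp
using System;
using System.Collections.Generic;

public static class BatchDemo
{
    // Hypothetical stand-in for one range query; returns heights in [from, to).
    public static List<long> LoadBatch(long from, long to)
    {
        var batch = new List<long>();
        for (long h = from; h < to; h++)
        {
            batch.Add(h);
        }
        return batch;
    }

    // Processes [0, total) in batches and returns how many items were handled.
    public static long ProcessAll(long total, long batchSize)
    {
        long processed = 0;
        for (long start = 0; start < total; start += batchSize)
        {
            long end = Math.Min(start + batchSize, total);

            // Declared inside the loop: the variable is scoped to this
            // iteration, so the code keeps no reference to a finished batch
            // and the garbage collector may reclaim it.
            List<long> batch = LoadBatch(start, end);
            processed += batch.Count;
        }
        return processed;
    }

    public static void Main()
    {
        Console.WriteLine(ProcessAll(25000, 10000)); // prints "25000"
    }
}
```

When the memory is actually reclaimed is still up to the garbage collector; the scoping only ensures the code does not keep old batches reachable.
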