ostafen / clover

A lightweight document-oriented NoSQL database written in pure Golang.
MIT License
680 stars 55 forks source link

Empty database uses more than 2 GB of disk space #35

Closed thiko closed 2 years ago

thiko commented 2 years ago

Hi all

First of all: thanks for your work - it's tackling my need and works pretty well :) One quick question: Is it intended for a nearly empty database to use > 2 GB disk space?

Example (see 000006.vlog):

▶ ls -alth test-clover-db        
total 52K
drwxrwxr-x 2 thiko thiko 4,0K Apr 24 07:41 .
-rw------- 1 thiko thiko   58 Apr 24 07:41 MANIFEST
-rw-rw-r-- 1 thiko thiko  399 Apr 24 07:41 000003.sst
-rw-rw-r-- 1 thiko thiko   20 Apr 24 07:41 000005.vlog
-rw-rw-r-- 1 thiko thiko 2,0G Apr 24 07:41 000006.vlog
-rw-rw-r-- 1 thiko thiko 128M Apr 24 07:41 00006.mem
-rw-rw-r-- 1 thiko thiko    5 Apr 24 07:41 LOCK
-rw------- 1 thiko thiko   60 Apr 23 20:54 .directory
-rw-rw-r-- 1 thiko thiko  399 Apr 23 20:53 000002.sst
-rw-rw-r-- 1 thiko thiko  349 Apr 23 20:50 000001.sst
-rw-rw-r-- 1 thiko thiko 1,0M Apr 23 20:03 DISCARD
-rw------- 1 thiko thiko   28 Apr 23 20:03 KEYREGISTRY

Thanks in advance!

ostafen commented 2 years ago

Hi, @thiko. If you ended with an empty database after inserting several items, and then you deleted all of them, yes it can be normal. Internally clover uses badger, which is a log structured key value store. This means that when you delete an object, a "delete" record is appended to the value log. Logs should be compacted in the background, once the database reaches a suitable size.

EDIT: It seems that inside badger, space must be reclaimed by explicitly calling the RunValueLogGC function. Currently this function is never called. I will write a fix for this. Thank you for reporting this issue

thiko commented 2 years ago

Hi @ostafen! Thanks for your response!

On my local setup, badger seems to reserve 2 GB on disk space immediately before even apply any operation to it (beside open the connection).

Anyway - I'm looking forward for your fix using the GC call :+1:

ostafen commented 2 years ago

On my local setup, badger seems to reserve 2 GB on disk space immediately before even apply any operation to it (beside open the connection).

So you get a 2GB file even just after calling Open() on an empty directory? Can you post a snipped of code showing your usage? It will help me to have the whole picture :=)

thiko commented 2 years ago

Sure - it's a small playground project so far. Please find the related code parts below:

// package and other imports
import cloverdb "github.com/ostafen/clover"

type clover struct {
    db *cloverdb.DB
}

func OpenCloverConnection(databaseName string) (*clover, error) {
    con, err := cloverdb.Open(databaseName)
    if err != nil {
        return nil, err
    }

    return &clover{db: con}, nil
}

func (c *clover) CloseConnection() error {
    return c.db.Close()
}
// package and other imports

func main() {

    // Init. connection
    db, err = storage.OpenCloverConnection("test-clover-db2")
    if err != nil {
        panic("Unable to open database connection!")
    }
        defer db.CloseConnection()
}

That's it ;) Using go versio 1.18 mod.go contains github.com/ostafen/clover v1.0.1 Operating system is Ubuntu 18.04.6 LTS

When running go run . it creates a new directory containing:

▶ ls -alth test-clover-db2 
total 32K
drwxrwxr-x  2 thiko thiko 4,0K Apr 24 11:11 .
-rw-rw-r--  1 thiko thiko 2,0G Apr 24 11:11 000001.vlog
-rw-rw-r--  1 thiko thiko 128M Apr 24 11:11 00001.mem
-rw-rw-r--  1 thiko thiko 1,0M Apr 24 11:11 DISCARD
-rw-------  1 thiko thiko   28 Apr 24 11:11 KEYREGISTRY
drwxrwxr-x 12 thiko thiko 4,0K Apr 24 11:11 ..
-rw-rw-r--  1 thiko thiko    6 Apr 24 11:11 LOCK
-rw-------  1 thiko thiko   16 Apr 24 11:11 MANIFEST
ostafen commented 2 years ago

Are you calling db.Close() on the clover db instance inside db.CloseConnection()?

thiko commented 2 years ago

Sorry, missed that part. I'll modify my original post to keep it together

ostafen commented 2 years ago

@thiko by doing some quick tests it turns out that the problem is related to not calling db.Close() properly. Consider this snippet:

db, err := c.Open("./db")
if err != nil {
     panic(err)
}

It produces this database:

-rw-rw-r-- 1 stefano stefano 2147483646 apr 24 11:32 000001.vlog
-rw-rw-r-- 1 stefano stefano  134217728 apr 24 11:32 00001.mem
-rw-rw-r-- 1 stefano stefano    1048576 apr 24 11:32 DISCARD
-rw------- 1 stefano stefano         28 apr 24 11:32 KEYREGISTRY
-rw-rw-r-- 1 stefano stefano          6 apr 24 11:32 LOCK
-rw------- 1 stefano stefano         16 apr 24 11:32 MANIFEST

But, if i correctly call db.Close():

db, err := c.Open("./db")
defer db.Close()
if err != nil {
     panic(err)
}

what I get is the following:

-rw-rw-r-- 1 stefano stefano      20 apr 24 11:29 000001.vlog
-rw-rw-r-- 1 stefano stefano 1048576 apr 24 11:29 DISCARD
-rw------- 1 stefano stefano      28 apr 24 11:29 KEYREGISTRY
-rw------- 1 stefano stefano      16 apr 24 11:29 MANIFEST
ostafen commented 2 years ago

@thiko, I run exactly the same snipped you posted. It should be fine. I'm getting the correct size since db.Close() is called.

-rw-rw-r-- 1 stefano stefano      20 apr 24 11:29 000001.vlog
-rw-rw-r-- 1 stefano stefano 1048576 apr 24 11:29 DISCARD
-rw------- 1 stefano stefano      28 apr 24 11:29 KEYREGISTRY
-rw------- 1 stefano stefano      16 apr 24 11:29 MANIFEST

If i remove the call to db.Close() I get the 2GB size

thiko commented 2 years ago

@ostafen you are right - the issue relates to a missing db.close() call. I wasn't aware about the fact, that defer is not called when killing the app using os signals (CTRL + C in fact). I'll handle the signal for a graceful shutdown.

My bad - sorry. Shall we close this issue?

ostafen commented 2 years ago

@thiko, okay thank you :=) Let's leave the issue open. Your problem of the initial 2GB size was due to db.Close() not being called. However, the problem related to the RunValueLogGC() is a separate problem and needs a fix. I'll work on it and release a fix by tomorrow :=)

thiko commented 2 years ago

Great - thanks for your help!

jsgm commented 2 years ago

Hello @thiko @ostafen,

Thanks for checking this. I was facing this issue and I can also confirm that the problem was related to not using db.Close. Now the database is just around 1 MB

ostafen commented 2 years ago

I made the title of the issue more self-explainatory in order to help people facing the same issue to quickly locate this

ostafen commented 2 years ago

With the last commit, I implemented a simple garbage collection strategy which basically calls RunValueLog() periodically in the background every 5 minutes. In the future, I will improve the GC strategy to run only during inactivity periods, but for now its does its job :=)