microstream-one / microstream

High-Performance Java-Native-Persistence. Store and load any Java Object Graph or Subgraphs partially, Relieved of Heavy-weight JPA. Microsecond Response Time. Ultra-High Throughput. Minimum of Latencies. Create Ultra-Fast In-Memory Database Applications & Microservices.
https://microstream.one/
Eclipse Public License 2.0

Calling `issueFullFileCheck()` does not seem to remove old unreferenced objects #655

Open andriikovalov opened 1 year ago

andriikovalov commented 1 year ago

Environment Details

Describe the bug

When I repeatedly replace the application root and store it, the old, unreachable data still seems to be in the storage after calling issueFullFileCheck() (judging from the storage size).

To Reproduce

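import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// MicroStream package names as of version 5 and later; adjust for older releases.
import one.microstream.storage.embedded.types.EmbeddedStorage;
import one.microstream.storage.embedded.types.EmbeddedStorageManager;

public class App {
    public static void main(String[] args) throws IOException {
        Path storagePath = Files.createTempDirectory("microstream");
        final EmbeddedStorageManager storageManager = EmbeddedStorage.start(storagePath);

        // Replace the root 50 times; only the last array stays reachable.
        for (int i = 0; i < 50; i++) {
            storageManager.setRoot(new byte[1_000_000]); // 1 MB
            storageManager.storeRoot();
        }

        storageManager.issueFullFileCheck();
        storageManager.shutdown();

        System.out.println("Storage size " + getSize(storagePath));  // Expected ~1 MB, actual ~50 MB
    }

    // Sums the sizes of all regular files below dir.
    public static long getSize(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) { // close the directory stream when done
            return paths.map(Path::toFile).filter(File::isFile).mapToLong(File::length).sum();
        }
    }
}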

Expected behavior

The storage shrinks so that it contains only the current root.

Additional context

I observe the same behaviour when I wrap my byte array into a root object, and repeatedly re-initialize the array and call storeRoot().

class Root {
    public byte[] data;
}

hg-ms commented 1 year ago

Hello, this is not a bug. In your example, the storage has no time to clean up old data. When data actually gets deleted depends on several factors, the most important of which are:

The example below should clean up better. It ensures that both the Java GC and the storage GC run, sets a very short object lifetime for the storage cache, and increases the time budget for housekeeping.

final EmbeddedStorageManager storageManager = EmbeddedStorage
    .start(Storage.ConfigurationBuilder()
        // very short cache lifetime: timeout 1000 ms, threshold 10
        .setEntityCacheEvaluator(Storage.EntityCacheEvaluator(1000, 10))
        // housekeeping every 100 ms with a time budget of 1 s (given in nanoseconds)
        .setHousekeepingController(Storage.HousekeepingController(100, 1_000_000_000))
        .setStorageFileProvider(Storage.FileProvider(storagePath)).createConfiguration());

for (int i = 0; i < 50; i++) {
    storageManager.setRoot(new byte[1_000_000]); // 1 MB
    storageManager.storeRoot();
}

System.gc();                                  // request a Java GC run
storageManager.issueFullGarbageCollection();  // storage GC: mark unreachable entities
storageManager.issueFullFileCheck();          // file cleanup: consolidate storage files

andriikovalov commented 1 year ago

Okay, thank you. I thought that explicitly calling housekeeping would delete unreachable data (as mentioned in #179), but in fact this is not guaranteed and cannot be enforced, right? Your snippet with "aggressive housekeeping" also gives me the same result (the storage has not shrunk).
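
Since the point at which data gets deleted is not fixed, a single measurement right after the call may be too early. A minimal sketch for observing the size over time instead of once, using only the JDK: the class StorageSizeProbe and the methods sizeOf and waitUntilAtMost are hypothetical names for illustration, not part of MicroStream, and the sketch assumes it is called while the storage is still running, i.e. before shutdown().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of MicroStream: polls a directory's total size.
public class StorageSizeProbe {

    // Sums the sizes of all regular files below dir.
    // A file deleted between listing and measuring counts as 0 bytes.
    public static long sizeOf(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    // Polls every pollMs milliseconds until the directory holds at most maxBytes,
    // or until timeoutMs milliseconds have passed.
    // Returns true if the size limit was reached in time, false on timeout.
    public static boolean waitUntilAtMost(Path dir, long maxBytes, long timeoutMs, long pollMs)
            throws IOException, InterruptedException {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (sizeOf(dir) <= maxBytes) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            Thread.sleep(pollMs);
        }
    }
}
```

In the reproduction above, this would be called as, for example, StorageSizeProbe.waitUntilAtMost(storagePath, 2_000_000, 60_000, 500) after issueFullFileCheck(). A false result only shows that the storage did not shrink within that window, not that it never will.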