oven-sh / bun

Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
https://bun.sh
Other
74.43k stars 2.78k forks source link

Memory is not getting freed after reading multiple files (Node:fs and Bun.File) #15020

Open DoKoB0512 opened 3 weeks ago

DoKoB0512 commented 3 weeks ago

What version of Bun is running?

1.1.34+5e5e7c60f

What platform is your computer?

Linux 6.8.0-45-generic x86_64 x86_64

What steps can reproduce the bug?

There is a memory leak issue when using node:fs readFile method and Bun.file().text() in a project that runs with Bun. The memory is not being released properly, leading to increased memory usage over time.

I have a folder with 10-15 dummy text files, all are in English language ranging from 1MB to 50MB, I loop over the paths and read the file using Node:fs and Bun.file().

I am using NestJS here as an HTTP endpoints for testing various scenarios of file reading. I am using 10.4.7 for NestJS and 23.1.0 for Node.

The machine has equipped with Intel Core i3-10105F - 4 cores, 8 threads with 16GB of RAM.

Here are the imports :

import { file, gc } from "bun";
import { readFile } from "fs/promises";

Here's the code for Node:fs :

@Get('node')
async nodeReadFiles() {
    for (let i = 0; i < this.filePaths.length; i++) {
        const nodeData = await readFile(this.filePaths[i], "utf-8");
    }
}

Here's the code for Bun.File() :

@Get('bun')
async bunReadFiles() {
    for (let i = 0; i < this.filePaths.length; i++) {
        const bunData = await file(this.filePaths[i], { type: "utf-8" }).text();
    }
}

I've also created a function to call GC manually for Bun :

@Get('gc')
useGC() {
    gc(true);
}

I am using nest build to build the project and then running it : bun run dist/main.js for Bun and node dist/main.js for Node

I used oha to quickly send multiple requests on each endpoint to simulate the heavy file-reading usage and monitor the memory consumption. oha -c 10 -z 30s http://localhost:port

I thought their might be some issue with NestJS, so I also created the same project with bun init to see if there's any difference, here's the code:

serve({
    port: 3000,
    async fetch(req) {
        const url = new URL(req.url);
        if (url.pathname === "/gc") {
            gc(true)
        } else {
            for (let i = 0; i < filePaths.length; i++) {
                const data = await file(filePaths[i], { type: 'utf-8' }).text();
            }
        }
        return new Response(`true`);
    },
});

I built with the above project using bun build ./index.ts --compile --target=bun --minify --outfile=bun-fs

What is the expected behavior?

Memory should be freed when using Bun.file().text() in a manner similar to how it is managed with node:fs.readFile(). Both methods should release memory properly after file reading operations, preventing memory leaks and excessive memory usage over time.

What do you see instead?

Bun stacks up the memory usage overtime and does not release it after sometime. Even after using the GC manually for bun, the memory consumption is very high compared to Node.

Running the project with node dist/main.js. Peak Memory usage - 2.01 GB. Average hovers between 300-600MB:

Image

Same Node process after 10 seconds, maybe GC is triggered :

Image

Running the same project with bun dist/main.js - Before the file-reading test :

Image

Running bun dist/main.js for Node:fs readfile Peak Memory usage - 1.23 GB. Average hovers between 700-950MB:

Image

Same bun process after some idle time (Nothing happens, same memory usage) :

Image

Same bun process after triggering the Bun's GC manually :

Image

Running bun dist/main.js for Bun.file(). Peak Memory usage - 1.11 GB. Average hovers between 850 MB - 1 GB :

Image

Same bun process after some idle time (Nothing happens, same memory usage) :

Image

Same bun process after triggering the Bun's GC manually (Shocking, I called the same GC endpoint multiple times just to make sure its working) :

Image

Additional information

I have also tried to change the code and started triggering the GC manually after each file read (I know, kind of a dumb thing to do) but the results are good for bun.

Here's the updated code for Node:fs :

@Get('node')
async nodeReadFiles() {
    for (let i = 0; i < this.filePaths.length; i++) {
        const nodeData = await readFile(this.filePaths[i], "utf-8");
        gc(true);
    }
}

Here's the updated code for Bun.File() :

@Get('bun')
async bunReadFiles() {
    for (let i = 0; i < this.filePaths.length; i++) {
        const bunData = await file(this.filePaths[i], { type: "utf-8" }).text();
        gc(true);
    }
}

Running bun dist/main.js for Node:fs readfile Peak Memory usage - 725 MB. Average hovers between 550-700MB:

Image

Running bun dist/main.js for Bun.file(). Peak Memory usage - 700 MB. Average hovers between 150 MB - 600 MB (I tried running this multiple times but got the same result, after the last file-read and gc call, the usage was still quite high) :

Image

Due to the high-usage from above, I triggered the Bun's GC manually (I only called the GC endpoint once and it dropped down to 170MB same as Node process) :

Image

I think Node GC can clean-up better because even with fs/promises, Node was using single thread to read the files :

Image

While bun uses all threads for both Node:fs and Bun.file :

Image

DoKoB0512 commented 3 weeks ago

Here's the memory usage for Native Bun implementation using bun init with the code and build flags that I've mentioned above.

Running the bun project with ./bun-fs - Before the file-reading test :

Image

Same project for Bun.file(). Peak Memory usage - 937 MB. Average hovers between 750 - 900 MB :

Image

Same bun process after some idle time (Nothing happens, same memory usage) :

Image

Same bun process after triggering the Bun's GC manually :

Image

Running the same project with gc(true) after each file-read, just like the above test. Peak Memory usage - 724 MB. Average hovers between 550 - 625 MB and dropped to ~160 MB for once :

Image

triggered the Bun's GC manually because I thought the mem-usage was quite high even after the test and it worked :

Image

Jarred-Sumner commented 3 weeks ago

Can you paste the complete code instead of pieces of it? The exact code is necessary to reproduce this

DoKoB0512 commented 3 weeks ago

I used the default NestJS project and removed the default code in the app.controller.ts

import { Controller, Get } from "@nestjs/common";
import { AppService } from "./app.service";
import { file, gc } from "bun";
import { readFile } from "fs/promises";

@Controller()
export class AppController {
    constructor(private readonly appService: AppService) { }

    private readonly filePaths = [] as const

    @Get('node')
    async nodeReadFiles() {
        for (let i = 0; i < this.filePaths.length; i++) {
            const nodeData = await readFile(this.filePaths[i], "utf-8");
            // gc(true);
        }
    }

    @Get('bun')
    async bunReadFiles() {
        for (let i = 0; i < this.filePaths.length; i++) {
            const bunData = await file(this.filePaths[i], { type: "utf-8" }).text();
            // gc(true);
        }
    }

    @Get("gc")
    freeMem() {
        gc(true);
    }
}

You can use this repo here, remove all the non-essential packages and files and update the app.controller.ts file

Jarred-Sumner commented 2 weeks ago

Can you make a repro which we can run the code directly and reproduce the leak? It’s probably straightforward to repro this based on your description for someone familiar with NestJS, but I am not very familiar with NestJS

DoKoB0512 commented 2 weeks ago

Can you make a repro which we can run the code directly and reproduce the leak? It’s probably straightforward to repro this based on your description for someone familiar with NestJS, but I am not very familiar with NestJS

Here's the repo to reproduce the issue : Nest-Bun-FS Note - I didn't add many files to the repo but you can duplicate those 500KB-5MB files many times to reproduce the same results.

Jarred-Sumner commented 2 weeks ago

Thanks for the reproduction, can confirm that the memory usage is higher than Node's.

The memory usage is not climbing indefinitely, which makes it hard to consider a memory leak. The object type counts (require("bun:jsc").heapStats()) support this conclusion - they don't change much between requests.

That being said, the memory usage is too high.

One thing that's notable here is the difference in memory usage when fs.promises.readFile returns a Buffer object instead of a string:

+      const nodeData = await readFile(this.filePaths[i]);
-      const nodeData = await readFile(this.filePaths[i], "utf-8");

With that diff on macOS arm64:

I think it's likely something to do with how we're creating strings on another thread using mimalloc and then probably taking awhile to free these strings. It will take some digging for us to address this.

Heap stats:

{
  objectTypeCounts: {
    string: 14880,
    Function: 8908,
    Structure: 7742,
    UnlinkedFunctionExecutable: 6786,
    FunctionExecutable: 6343,
    Object: 5101,
    JSLexicalEnvironment: 2551,
    Array: 2440,
    "Immutable Butterfly": 2283,
    SymbolTable: 1843,
    UnlinkedFunctionCodeBlock: 1745,
    StructureRareData: 1591,
    GetterSetter: 999,
    NativeExecutable: 906,
    StructureChain: 391,
    FunctionCodeBlock: 380,
    PropertyTable: 345,
    FunctionRareData: 343,
    RegExp: 245,
    symbol: 174,
    JSPropertyNameEnumerator: 169,
    AsyncFunction: 155,
    Map: 133,
    CustomGetterSetter: 117,
    DOMAttributeGetterSetter: 79,
    Uint8Array: 47,
    SparseArrayValueMap: 41,
    Set: 28,
    Promise: 26,
    WeakMap: 26,
    Module: 20,
    CallSite: 16,
    InternalPromise: 10,
    Arguments: 6,
    JSModuleEnvironment: 5,
    JSSourceCode: 5,
    ModuleRecord: 5,
    GeneratorFunction: 4,
    ModuleNamespaceObject: 4,
    ArrayBuffer: 3,
    FileSink: 3,
    Performance: 3,
    AsyncGeneratorFunction: 2,
    BigInt: 2,
    BufferList: 2,
    Callee: 2,
    Crypto: 2,
    DebugHTTPServer: 2,
    EventTarget: 2,
    Headers: 2,
    InternalFieldTuple: 2,
    Iterator: 2,
    JSAsyncGeneratorFunction: 2,
    NodeJSFS: 2,
    PerformanceEntry: 2,
    PerformanceMark: 2,
    PerformanceMeasure: 2,
    PerformanceObserverEntryList: 2,
    PerformanceObserver: 2,
    StringDecoder: 2,
    SubtleCrypto: 2,
    TemplateObjectDescriptor: 2,
    TextEncoder: 2,
    Timeout: 2,
    URLSearchParams: 2,
    URL: 2,
    "Array Iterator": 1,
    AsyncFromSyncIterator: 1,
    AsyncGenerator: 1,
    AsyncIterator: 1,
    Atomics: 1,
    BigInt64ArrayPrototype: 1,
    BigUint64ArrayPrototype: 1,
    Blob: 1,
    Boolean: 1,
    Buffer: 1,
    Bun: 1,
    CryptoHasher: 1,
    DataView: 1,
    Dirent: 1,
    EventEmitter: 1,
    File: 1,
    FinalizationRegistry: 1,
    Float16ArrayPrototype: 1,
    Float32ArrayPrototype: 1,
    Float64ArrayPrototype: 1,
    Generator: 1,
    GlobalObject: 1,
    Int16ArrayPrototype: 1,
    Int32ArrayPrototype: 1,
    Int8ArrayPrototype: 1,
    InternalModuleRegistry: 1,
    InternalPromisePrototype: 1,
    "Intl.DateTimeFormat": 1,
    Intl: 1,
    "Iterator Helper": 1,
    JSGlobalLexicalEnvironment: 1,
    JSGlobalProxy: 1,
    JSON: 1,
    "Map Iterator": 1,
    Math: 1,
    ModuleLoader: 1,
    ModuleProgramExecutable: 1,
    ModulePrototype: 1,
    NativeBrotli: 1,
    NativeZlib: 1,
    NextTickQueue: 1,
    Number: 1,
    ProcessBindingConstants: 1,
    Process: 1,
    ProgramExecutable: 1,
    Prototype: 1,
    ProxyObject: 1,
    Proxy: 1,
    Reflect: 1,
    "RegExp String Iterator": 1,
    Request: 1,
    ResolveMessage: 1,
    Response: 1,
    "Set Iterator": 1,
    ShadowRealm: 1,
    Stats: 1,
    "String Iterator": 1,
    String: 1,
    Symbol: 1,
    TextDecoder: 1,
    TextEncoderStreamEncoder: 1,
    Uint16ArrayPrototype: 1,
    Uint32ArrayPrototype: 1,
    Uint32Array: 1,
    Uint8ArrayPrototype: 1,
    Uint8ClampedArrayPrototype: 1,
    UnlinkedModuleProgramCodeBlock: 1,
    UnlinkedProgramCodeBlock: 1,
    WeakRef: 1,
    WeakSet: 1,
    WebAssembly: 1,
    console: 1,
    require: 1,
    resolve: 1,
  },
  protectedObjectTypeCounts: {
    Function: 23,
    UnlinkedFunctionExecutable: 4,
    DebugHTTPServer: 1,
    GlobalObject: 1,
    Promise: 1,
    Timeout: 1,
    UnlinkedModuleProgramCodeBlock: 1,
  },
  heapSize: 7729988,
  heapCapacity: 10498564,
  extraMemorySize: 2840836,
  objectCount: 70244,
  protectedObjectCount: 32,
  globalObjectCount: 1,
  protectedGlobalObjectCount: 1,
}
DoKoB0512 commented 2 weeks ago

The other thing that I've noticed is that the Bun's GC is kind of hit or miss sometimes. If we try to trigger it right after the test, it does not free up the memory like nodejs does.