treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data
https://docs.lakefs.io
Apache License 2.0

Fix OOM lakectl FS upload bug #8349

Closed idanovo closed 2 weeks ago

idanovo commented 2 weeks ago

Closes #8088

github-actions[bot] commented 2 weeks ago

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed
github-actions[bot] commented 2 weeks ago

E2E Test Results - Quickstart

11 passed
idanovo commented 2 weeks ago

@itaiad200 Yes, let me walk you through it:

In uploadObject/UploadPartObject we have this code:

    req, err := http.NewRequestWithContext(ctx, http.MethodPut, preSignURL, body)
        ...
    putResp, err := u.uploader.HTTPClient.Do(req)

The body we pass to NewRequestWithContext is a local.fileWrapper, which implements Seek but not Close, so http.NewRequestWithContext reaches this code:

    rc, ok := body.(io.ReadCloser)
    if !ok && body != nil {
        rc = io.NopCloser(body)
    }

So we end up with a request body that implements Close (via NopCloser) but no longer implements Seek.

Now, when we reach putResp, err := u.uploader.HTTPClient.Do(req), if our HTTP client is the retryable client, Do will eventually reach this code:

    case io.ReadSeeker:
         ...
    // Read all in so we can reset
    case io.Reader:
        buf, err := io.ReadAll(body)

Because our body no longer implements Seek but does implement Read, we fall through to the io.Reader case, call io.ReadAll, read the entire file into memory, and crash.

idanovo commented 2 weeks ago

> Thanks for the clear explanation. How was this tested?

@itaiad200 I built a docker image that runs lakectl fs upload, and tried to upload a 1GB file with the --memory=512m flag. First I verified that the bug reproduced when running against the master branch, and then verified that my change made the upload succeed.

arielshaqed commented 2 weeks ago

https://imgflip.com/i/99kzey Corgi meme: Wow / much memory / very progress / close() / very fix