Open andrijdavid opened 2 months ago
The client's upload mechanism, for both pre-signed URLs and direct uploads, buffers data in memory, which is not ideal for large files and triggers OOM errors when uploading big files.
We also got "lakectl" killed by the local host kernel, because it was trying to use more memory than was available (not using the "-p" option).
On a computer with 32GB of RAM (with 15GB already taken by other processes), we were finally able to commit 7.45GB binary files with "-p 1".
We think we will not be able to ingest larger binary files.
It seems that for binary files of 7GB, lakectl needs a little more than 2x the size of the binary file in memory per concurrent process requested (if you do not specify "-p", the default seems to be 25).
For example, if p=8 and the folder contains only 10GB binary files, we should expect "lakectl" to require 8 x 10 x 2 = 160GB of RAM to avoid being killed when trying to upload (commit) the folder.
Is that right? Are there options, or plans, to allow ingestion of large binary files (files larger than the computer's RAM)?
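As a rough sketch of the estimate above: the 2x factor is only what we observed, not a documented figure, and the function name is made up for illustration.

```python
def estimate_peak_memory_gb(parallelism, file_size_gb, overhead_factor=2.0):
    """Rough peak RAM estimate for uploading files of one size.

    overhead_factor=2.0 is the observed "a little more than 2x" rule of
    thumb from this report (an assumption, not a documented lakectl value).
    """
    return parallelism * file_size_gb * overhead_factor


# The example from the report: p=8 with 10GB files.
print(estimate_peak_memory_gb(8, 10))  # 160.0

# The run that worked for us: p=1 with 7.45GB files.
print(estimate_peak_memory_gb(1, 7.45))

# The apparent default of p=25 with 10GB files.
print(estimate_peak_memory_gb(25, 10))
```

With p=1 and 7.45GB files this gives roughly 15GB, which is consistent with the upload only just fitting into the ~17GB we had free.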
Hi @andrijdavid,
A couple of questions:
1) What's the max size of each object in the directories you are trying to upload?
2) Do you get the same error when uploading a single file?
3) Do you get the same error when running with --pre-sign=false?
4) What OS do you use?
Sorry to bother you, but I want to understand the exact issue you faced, as there are many options.
@andrijdavid @dvnicolasdh Thanks for reporting this issue. I think we found the cause: it's related to a bug in the go-retryablehttp package we use. It reads files into memory instead of streaming them.
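To illustrate the difference, here is a language-neutral sketch in Python. It is not the go-retryablehttp code; the function names and the tiny chunk size are assumptions made for the demo. A client that wants to be able to replay a request body on retry can read it fully into memory first, while a streaming client only holds one chunk at a time.

```python
import io

CHUNK_SIZE = 4  # deliberately tiny so the demo is easy to follow


def send_buffered(body, sink):
    """Read the whole body into memory, then send it (replayable on retry)."""
    data = body.read()  # the entire payload is resident in memory here
    sink.write(data)
    return len(data)  # peak number of payload bytes held at once


def send_streaming(body, sink, chunk_size=CHUNK_SIZE):
    """Forward the body chunk by chunk; only one chunk is held at a time."""
    peak = 0
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        peak = max(peak, len(chunk))
        sink.write(chunk)
    return peak


payload = b"0123456789"  # stands in for a multi-GB file
buffered_sink, streaming_sink = io.BytesIO(), io.BytesIO()

buffered_peak = send_buffered(io.BytesIO(payload), buffered_sink)
streaming_peak = send_streaming(io.BytesIO(payload), streaming_sink)

print("buffered peak bytes:", buffered_peak)
print("streaming peak bytes:", streaming_peak)
```

Both paths deliver the same bytes, but the buffered path's memory use grows with the file size, while the streaming path's stays bounded by the chunk size.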
As a temporary workaround, until we release a new version with a fix, you can set lakectl not to use the retryable client by: 1) Adding this to your lakectl.yaml file
server:
  retries:
    enabled: false
Or
2) Running lakectl with this env var LAKECTL_SERVER_RETRIES_ENABLED=false
Can you please try this and let me know if it solved your issue?
What happened?
The lakectl fs upload command causes an Out-Of-Memory (OOM) error, resulting in the process being killed by the kernel or freezing the OS during the upload of large files.
Environment:
Steps to Reproduce:
lakectl fs upload --source . --recursive "lakefs://${LAKEFS_REPO_NAME}/${DEFAULT_BRANCH}/" --pre-sign -p 8
Reducing p doesn't solve the issue.
Expected behavior
File uploaded successfully
lakeFS version
1.31.1
How lakeFS is installed
GCP
Affected clients
All
Relevant log output
Contact details
No response