lovell / sharp

High performance Node.js image processing, the fastest module to resize JPEG, PNG, WebP, AVIF and TIFF images. Uses the libvips library.
https://sharp.pixelplumbing.com
Apache License 2.0
29.01k stars 1.29k forks

Memory leak on Linux #1803

Closed Cow258 closed 5 years ago

Cow258 commented 5 years ago

What is the output of running npx envinfo --binaries --languages --system --utilities?

System:
  OS: Linux 4.15 Ubuntu 18.04.2 LTS (Bionic Beaver)
  CPU: (1) x64 Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
  Memory: 470.87 MB / 983.91 MB
  Container: Yes
  Shell: 4.4.20 - /bin/bash
Binaries:
  Node: 12.6.0 - /usr/bin/node
  Yarn: 1.15.2 - /usr/bin/yarn
  npm: 6.9.0 - /usr/bin/npm
Utilities:
  Git: 2.17.1 - /usr/bin/git
Languages:
  Bash: 4.4.20 - /bin/bash
  Perl: 5.26.1 - /usr/bin/perl
  Python: 2.7.15+ - /usr/bin/python

Instance information: AWS t2.micro (1 vCPU, 1 GB RAM), Ubuntu 18.04 LTS (Linux)

What are the steps to reproduce?

All of the following code paths leak memory:

Using Stream

await (new Promise((resolve, reject) => { // no await inside, so a plain executor avoids swallowed rejections
    try {
        let sharp = require('sharp');
        let maxPx = 2500;
        let rs = fs.createReadStream(oldPath);
        let ws = fs.createWriteStream(newPath);
        let transformer = sharp()
            .resize(maxPx, maxPx, { fit: sharp.fit.inside, withoutEnlargement: true })
            .rotate()
            .jpeg({ quality: 75 });
        ws.on('finish', () => { sharp = null; rs = null; ws = null; resolve(true); });
        ws.on('error', reject);
        rs.pipe(transformer).pipe(ws);
    } catch (error) { reject(error); }
}));
await Promise.all([
    (async () => { 
        let sharp = require('sharp');
        let info = await sharp(newPath).metadata();
        fileW = info.width;
        fileH = info.height;
    })(),
    (async () => { 
        let stat = await xfs.stat(newPath);
        fileSize = stat.size;
    })()
]);

Using Buffer to Buffer

let maxPx = 2500;
let buff = await xfs.readFile(oldPath);
let { data, info } = await sharp(buff)
    .resize(maxPx, maxPx, { fit: sharp.fit.inside, withoutEnlargement: true })
    .rotate()
    .jpeg({ quality: 75 })
    .toBuffer({ resolveWithObject: true });
await xfs.writeFile(newPath, data);
fileSize = info.size;
fileW = info.width;
fileH = info.height;

Using File to Buffer

let maxPx = 2500;
let { data, info } = await sharp(oldPath)
    .resize(maxPx, maxPx, { fit: sharp.fit.inside, withoutEnlargement: true })
    .rotate()
    .jpeg({ quality: 75 })
    .toBuffer({ resolveWithObject: true });
await xfs.writeFile(newPath, data);
fileSize = info.size;
fileW = info.width;
fileH = info.height;
data = null; info = null;

Using File to File

let maxPx = 2500;
await sharp(oldPath)
    .resize(maxPx, maxPx, { fit: sharp.fit.inside, withoutEnlargement: true })
    .rotate()
    .jpeg({ quality: 75 })
    .toFile(newPath);
let info = await sharp(newPath).metadata();
let stat = await xfs.stat(newPath);
fileSize = stat.size;
fileW = info.width;
fileH = info.height;

About the issue

The larger the uploaded file, the more serious the memory leak. When I upload 20 images of about 5 MB each, memory usage climbs to 400-500 MB.

When I call process.memoryUsage(), RSS is about 350-450 MB, and neither global.gc(); nor sharp.cache(false); has any effect.

This issue happens on Node.js versions 8, 10, 11 and 12. I tried each of them, but the issue is still there.

I tried global.gc();, sharp.cache(false); and sharp.concurrency(1);, and also tried LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1, but the issue is still there.

This issue matters more when the server is not restarted for a long time. I tried all of the code above with no issue on Windows and macOS; on Linux, memory usage keeps increasing over time as images are uploaded.
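To help tell a V8 heap leak apart from native memory retained by the allocator, here is a minimal monitoring sketch (labels and placement are arbitrary): a leak in JavaScript objects grows heapUsed, while allocator retention grows rss with a flat heap.

```javascript
// Log V8 heap usage vs. process RSS. A leak in JavaScript objects shows up
// as growing heapUsed; memory held back by the native allocator shows up as
// RSS growing while heapUsed stays flat.
const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(1);

function logMemory(label) {
    const { rss, heapUsed, external } = process.memoryUsage();
    console.log(`${label}: rss=${toMB(rss)}MB heapUsed=${toMB(heapUsed)}MB external=${toMB(external)}MB`);
}

logMemory('baseline');
// ...process an image batch here...
logMemory('after batch');
```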

I tried switching to gm and saw no memory leak, but gm is much slower than sharp.

Please help me to resolve this issue. Thank you.

Here is the full code:

const xSQL = require('../lib/xSQL');
const xIMG = require('../lib/xIMG');
const xS3 = require('../lib/xS3');
const xCm = require('../lib/xCm');
const formidable = require('formidable');
const extend = require('extend');
const util = require('util');
const fs = require('fs');
const xfs = require('async-file');
const sharp = require('sharp');
const gm = require('gm');
const path = require('path');
// sharp.concurrency(1);
// sharp.cache(false);
// sharp.cache({ files: 2 });
// sharp.simd(false);

var fileIndex = Date.now();

async function xForm(req, res, next) {
    if ( /^application\/json/.test(req.headers['content-type'])) { next(); return true; }

    let result = { file: {} };

    if (!(await xfs.exists(__base + 'upload'))) await xfs.mkdir(__base + 'upload');
    if (!(await xfs.exists(__base + 'upload/temp'))) await xfs.mkdir(__base + 'upload/temp');

    let post = {};
    let form = new formidable.IncomingForm();
    form.encoding = 'utf-8';
    form.uploadDir = 'upload';
    form.keepExtensions = true;
    form.maxFields = 0;
    form.maxFieldsSize = 50 * 1024 * 1024;

    form.on('field', function (field, value) { 
        if (form.type == 'multipart') {  
            if (field in post) { 
                if (util.isArray(post[field]) === false)
                    post[field] = [post[field]];
                post[field].push(value);
                return true;
            }
        }
        post[field] = value;
    });
    form.on('file', async function (name, file) {
        try {
            if (file.size == 0) { await xfs.unlink(__base + file.path); return true; }
        } catch (err) { console.error('xForm ERROR! => ', err); }
    });
    form.parse(req, async function (err, fields, files) {
        if (err) { console.error('xForm ERROR! => ', err); }
        if (files) {
            for (let name in files) {
                let file = files[name];
                if (file.size == 0) continue;
                // Determine the correct file extension
                let extName = '';
                switch (file.type) {
                    case 'image/pjpeg':
                    case 'image/jpeg':
                        extName = 'jpg';
                        break;
                    case 'image/png':
                    case 'image/x-png':
                        extName = 'jpg';
                        break;
                    case 'application/pdf':
                        extName = 'pdf';
                        break;
                }
                // Generate a new random file name
                fileIndex++;
                let hash = xCm.md5(fileIndex.toString());
                let fileName = `${hash}.${extName}`;
                let oldPath = path.resolve(file.path);
                let newPath = path.resolve(`./upload/${fileName}`);
                let fileSize = 0;
                let fileW = 0;
                let fileH = 0;
                // File validation
                if (extName.length == 0) { await ErrorHandle('Only pdf, png and jpg formats are supported'); continue; }
                if (file.size > (50 * 1024 * 1024)) { await ErrorHandle('Only image files smaller than 50 MB are supported'); continue; }

                if (extName == 'pdf') {
                    try {
                        await xfs.rename(oldPath, newPath);

                        let SQL = new xSQL(); let r = null;
                        r = await SQL.xAdd('attinfo', { 'name': file.name, 'dir': fileName, 'type': 0, size: file.size });
                        if (r.err) { console.log(r.err); throw r.err; }
                        do { r = await SQL.xRead('attinfo', 'dir', fileName); }
                        while (r.eof);
                        result.file = r.row[0];

                    } catch (error) { await ErrorHandle(error); continue; }
                } else {
                    let maxPx = 2500;
                    let buff = await xfs.readFile(oldPath);
                    let { data, info } = await sharp(buff)
                        .resize(maxPx, maxPx, { fit: sharp.fit.inside, withoutEnlargement: true })
                        .rotate()
                        .jpeg({ quality: 75 })
                        .toBuffer({ resolveWithObject: true });
                    await xfs.writeFile(newPath, data);
                    fileSize = info.size;
                    fileW = info.width;
                    fileH = info.height;
                    data = null; info = null;
                    buff = null; // Buffer#length is read-only; dropping the reference is enough
                }

                let s3 = new xS3();
                try { await s3.Upload(fileName, newPath); }
                catch (error) { await ErrorHandle(error); continue; }
                finally { s3 = null; }

                try { await xfs.unlink(oldPath); }
                catch (error) { console.log(error); }

                let SQL = new xSQL();
                let r = await SQL.xAdd('attinfo', {
                    name: file.name,
                    dir: fileName,
                    type: 0,
                    w: fileW,
                    h: fileH,
                    size: fileSize
                });
                if (r.err) { await ErrorHandle(r.err); continue; }
                result.file = {
                    err: false,
                    errMsg: '',
                    name: file.name,
                    num: r.insertId,
                    dir: fileName
                };

                async function ErrorHandle(err) {
                    try { await xfs.unlink(oldPath); }
                    catch (error) { console.error('xForm ERROR! => ', error, err); }
                    finally {
                        console.error('xForm ERROR! => ', err);
                        result.file.err = true;
                        result.file.errMsg = err;
                    }
                }
            }
        }
        Complete();
    });

    function Complete() {
        req.body = post;
        req.file = result.file;
        next();
    }

}

module.exports = xForm;
lovell commented 5 years ago

Hi, did you see and read all of the comments on #955?

Cow258 commented 5 years ago

I have read them all about 20-30 times and tried everything, but the issue is still here.

Cow258 commented 5 years ago

(screenshot) Memory easily reaches 500 MB after uploading 16 images of 6 MB each.

Cow258 commented 5 years ago

(screenshots attached)

Cow258 commented 5 years ago
(screenshot attached)
Cow258 commented 5 years ago

pm2 config

module.exports = {
    apps: [{
        name          : 'novel',
        script        : './app.js',
        watch         : false,
        //node_args     : '--max_semi_space_size=2 --max_old_space_size=128',
        env           : { 'LD_PRELOAD': '/usr/lib/x86_64-linux-gnu/libjemalloc.so.1', 'NODE_ENV': 'production' },
        env_production: { 'LD_PRELOAD': '/usr/lib/x86_64-linux-gnu/libjemalloc.so.1', 'NODE_ENV': 'production' },
        wait_ready    : true,
        listen_timeout: 2000,
        kill_timeout  : 2000,
        exec_mode     : 'cluster',
        instances     : 1
    }]
};
Cow258 commented 5 years ago

I just tried the suggestion from #1041: LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 node app.js. Memory usage stays below 150 MB with no memory leak issue.

But when using pm2 to start app.js, the memory issue is still there.

lovell commented 5 years ago

The LD_PRELOAD environment variable must be set before the Node process is started, so probably needs to be configured long before pm2 is involved.

Remember: this is not a memory leak, you are seeing the effects of freed, possibly fragmented memory being held by some memory allocators rather than returned to the OS.

Cow258 commented 5 years ago

I have set LD_PRELOAD before starting pm2, ever since system boot, but the issue is still there. Maybe the issue is in pm2, because LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 node app.js keeps memory usage below 150 MB with no issue. In any case, I'm sure the LD_PRELOAD environment variable was set before the Node process was started.

lovell commented 5 years ago

Thanks, it sounds like you'll need to ask the pm2 maintainers about this - good luck!

Cow258 commented 5 years ago

Finally, I found the problem! This issue only occurs in pm2 cluster mode: fork mode loads libjemalloc correctly, but somehow cluster mode does not.

xiaoblack163 commented 4 years ago

@Cow258 Add 'max_memory_restart' to limit memory?

Cow258 commented 4 years ago

> @Cow258 Add 'max_memory_restart' to limit memory?

If a restart happens while an image is being processed, that causes data loss. So far, pm2 fork mode does not leak memory.

crobinson42 commented 4 years ago

@Cow258 Thanks for documenting your investigation, this has helped me with a similar concern. I'm using pm2 in cluster mode and I notice memory starts low and climbs up to the max available. The frustrating part is not being able to use cluster mode with pm2 in this situation, though this only appears to happen when using the sharp package on a Node.js instance/cluster.

crobinson42 commented 4 years ago

@Cow258 also, the way the pm2 docs explain the max_memory_restart option in cluster mode is that it's a graceful reload, so you shouldn't have data loss. I'd be curious to know if you experienced data loss when using this option: https://pm2.keymetrics.io/docs/usage/memory-limit/

Cow258 commented 4 years ago

> @Cow258 also, the way the pm2 docs explain the max_memory_restart option in cluster mode is that it's a graceful reload, so you shouldn't have data loss. I'd be curious to know if you experienced data loss when using this option: https://pm2.keymetrics.io/docs/usage/memory-limit/

You are correct that a graceful reload avoids data loss, but if a user uploads 100+ images to the server, it may fail to restart in time, run out of memory, and then hang.

I'm using the AWS free tier and transfer data to a new account each year, so I have an EC2 instance with 1 vCPU and 1 GB RAM, which means memory is limited. It also runs about three Node.js web servers that use sharp. All of them now run in pm2 fork mode without any memory leak problem.

(screenshot attached)

After 32 days of uptime, RAM usage is still below 150 MB.

I will keep asking the pm2 maintainers about this.

Cow258 commented 4 years ago

An update on this issue:

https://github.com/Unitech/pm2/issues/4375#issuecomment-652366370

We can use cluster mode with LD_PRELOAD now! The LD_PRELOAD environment variable must be set before pm2 starts, so:

Add LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 to /etc/environment, also run export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1, then sudo reboot.

Cluster mode now works with LD_PRELOAD set, and there is no issue when processing many large images with sharp.

Cow258 commented 3 years ago

Update

https://github.com/Unitech/pm2/issues/4375#issuecomment-898658080

Somehow I started seeing the memory problem again these past few weeks, so I switched to jemalloc 5.2.1 to fix it. Here is a short tutorial.

Install the latest jemalloc (5.2.1) on Ubuntu 18.04

Refer: https://gist.github.com/diginfo/be7347e6e6c4f05375c51bca90f220e8

sudo apt-get -y install autoconf libxslt-dev xsltproc docbook-xsl
git clone https://github.com/jemalloc/jemalloc.git
cd jemalloc
autoconf
./configure
make dist
sudo make install

You can find jemalloc at /usr/local/lib/libjemalloc.so.2

sudo su
vi /etc/environment

# Then add this line
LD_PRELOAD=/usr/local/lib/libjemalloc.so.2

Then

export LD_PRELOAD=/usr/local/lib/libjemalloc.so.2

Then, optionally, reboot: sudo reboot

lovell commented 3 years ago

A reminder to those visiting this page, as I see a number of recently-linked issues: this is not a memory leak, you are seeing the effects of freed, possibly fragmented memory being held by some memory allocators rather than returned to the OS.

Make sure you're using the latest version of sharp (v0.29.0 as of writing), which will detect the memory allocator in use and adjust concurrency accordingly.

https://sharp.pixelplumbing.com/install#linux-memory-allocator

Cow258 commented 2 years ago

Update

I faced this problem again within a week, and it took me two days to solve. After seeing this comment, I got the answer: https://github.com/nodejs/help/issues/1518#issuecomment-997221787

Somehow, the LD_PRELOAD environment variable may not take effect. You need to modify /etc/ld.so.preload to solve this problem:

# Note: sudo does not apply to a >> redirection, so use tee -a instead

# make install version
echo "/usr/local/lib/libjemalloc.so.2" | sudo tee -a /etc/ld.so.preload

# apt-get version
echo "/usr/lib/x86_64-linux-gnu/libjemalloc.so" | sudo tee -a /etc/ld.so.preload

Then restart all node processes to use jemalloc for allocation:

# If you are using pm2
pm2 kill
pm2 resurrect

Then check the PID of your running node process and plug it into the command below to verify it is using jemalloc:

ps aux | grep node
cat /proc/<PID>/smaps | grep jemalloc
anhiao commented 1 year ago

> Somehow, the LD_PRELOAD environment variable may not work. You need to modify /etc/ld.so.preload to solve this problem [...]

Thank you for the solution, but I still have to warn people here: this solution may cause errors in other frameworks. When I configured it and then used puppeteer, it reported "ProtocolError: Protocol error (Page.navigate): Target closed.". While troubleshooting, I found that this conflicts with the current solution.

lovell commented 1 year ago

@anhiao Thanks for the warning, this problem with puppeteer is unrelated to sharp, and probably also unrelated to jemalloc.

As an aside, there is plenty of discussion and there are possible solutions on the puppeteer repo, e.g. https://github.com/puppeteer/puppeteer/issues/1947. Make sure you're providing enough RAM for peak memory usage, which may be higher or lower under a different allocator, and understand puppeteer's memory-related configuration, e.g. --disable-dev-shm-usage. Further questions about puppeteer should be directed to the puppeteer repo.