libvips / pyvips

Python binding for libvips using cffi
MIT License

Memory leak when using pyvips to decode a HEIC image #298

Open Mhuang77 opened 2 years ago

Mhuang77 commented 2 years ago

HEIC test image (GitHub does not support attaching HEIC files, so it is packed in a zip):

mmo_killer.zip

Python code

import pyvips

def pyvips_decoder(image_data):
    # Decode the HEIC bytes, then re-encode them as a JPEG at quality 95
    buf_in = pyvips.Image.new_from_buffer(image_data, "", access="sequential")
    return buf_in.write_to_buffer(".jpeg", Q=95)

To reproduce the issue:

Run the container with a fixed memory limit, e.g. 300 MB:

docker run -m 300m

After pyvips_decoder has been called several times, the process is OOM-killed.
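`docker run -m 300m` is enforced through the container's memory cgroup. To confirm from inside the container which limit the Python process is actually running under, a small sketch could read the cgroup limit file directly. The file paths are an assumption (standard cgroup v2 and v1 mounts inside a Docker container), and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed locations of the memory limit file: cgroup v2 first, then cgroup v1.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def parse_limit(text: str) -> Optional[int]:
    """Turn the contents of a cgroup limit file into bytes (None = unlimited)."""
    value = text.strip()
    if value == "max":  # cgroup v2 spelling of "no limit"
        return None
    limit = int(value)
    # cgroup v1 reports a huge sentinel (close to 2**63) when there is no limit.
    if limit >= 2**60:
        return None
    return limit


def container_memory_limit() -> Optional[int]:
    """Return the first limit found in the candidate files, or None."""
    for path in CGROUP_LIMIT_FILES:
        if path.exists():
            return parse_limit(path.read_text())
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    if limit is None:
        print("no cgroup memory limit found")
    else:
        print(f"cgroup memory limit: {limit / (1024 * 1024):.0f} MB")
```

With `-m 300m`, the limit file should contain 314572800 (300 × 1024 × 1024 bytes).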

Dockerfile for the image:

FROM ubuntu:focal
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update
RUN apt-get install -y \
  software-properties-common \
  build-essential \
  autoconf \
  automake \
  cmake \
  libtool \
  nasm \
  unzip \
  wget \
  git \
  pkg-config \
  gtk-doc-tools \
  gobject-introspection \
  less \
  vim

# base lib
RUN apt-get install -y \
  glib-2.0-dev \
  libheif-dev \
  libexpat-dev \
  librsvg2-dev \
  libpng-dev \
  libgif-dev \
  libjpeg-dev \
  libtiff-dev \
  libexif-dev \
  liblcms2-dev \
  liborc-dev \
  python3-opencv

# openjpeg
ARG OPENJPEG_VERSION=2.4.0
ARG OPENJPEG_URL=https://github.com/uclouvain/openjpeg/archive

RUN wget ${OPENJPEG_URL}/v${OPENJPEG_VERSION}.tar.gz \
  && tar xf v${OPENJPEG_VERSION}.tar.gz \
  && cd openjpeg-${OPENJPEG_VERSION} \
  && mkdir build \
  && cd build \
  && cmake .. \
  && make \
  && make install

# libvips
ARG VIPS_VERSION=8.11.2
ARG VIPS_URL=https://github.com/libvips/libvips/releases/download

WORKDIR /usr/local/src

RUN wget ${VIPS_URL}/v${VIPS_VERSION}/vips-${VIPS_VERSION}.tar.gz \
    && tar xzf vips-${VIPS_VERSION}.tar.gz \
    && cd vips-${VIPS_VERSION} \
    && ./configure \
    && make V=0 \
    && make install \
    && ldconfig

# pyvips Pillow opencv-python
RUN apt-get install -y \
  python3-pip
RUN pip3 install pyvips Pillow opencv-python

jcupitt commented 2 years ago

Hi @Mhuang77,

I tried this test program:

#!/usr/bin/python3

import os
import psutil
import sys

import pyvips

def pyvips_decoder(image_data):
    buf_in = pyvips.Image.new_from_buffer(image_data, "", access="sequential")
    return buf_in.write_to_buffer(".jpeg", Q=95)

image_data = open(sys.argv[1], "rb").read()

process = psutil.Process(os.getpid())
print(f"iteration, process size (MB)")
for i in range(10000):
    print(f"{i}, {process.memory_info().rss / (1024 * 1024):.2f}")
    new_image = pyvips_decoder(image_data)
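psutil is the only third-party dependency in this test besides pyvips. On Linux, a stdlib-only sketch can read the resident page count from `/proc/self/statm` instead. This assumes a Linux `/proc` filesystem, and the helper name `current_rss_mb` is hypothetical.

```python
import os


def current_rss_mb() -> float:
    """Resident set size of this process in MB, read from /proc (Linux only)."""
    with open("/proc/self/statm") as f:
        # Fields are: size resident shared text lib data dt, all counted in pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)


if __name__ == "__main__":
    print(f"{current_rss_mb():.2f}")
```

It could stand in for `process.memory_info().rss / (1024 * 1024)` in the loop above.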

I see:

$ ./buffer-loadsave.py ~/pics/mmo_killer.heic 
iteration, process size (MB)
0, 40.72
1, 101.52
2, 154.52
3, 198.06
4, 233.20
...

So memory use stabilises after a while. I graphed the CSV:

[Plot: process size (MB) against iteration]

So I think you're seeing an interaction between memory fragmentation and the Python GC, not a leak. I would try a different malloc implementation, such as jemalloc.
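One way to try jemalloc without rebuilding anything is to preload it into a fresh process with `LD_PRELOAD`. The dynamic loader only reads that variable at process start, so setting it in `os.environ` of an already-running interpreter does not change its allocator; the test script has to be re-launched as a child. A minimal sketch, assuming Ubuntu's `libjemalloc2` package is installed (the library path below is an assumption for x86-64 focal) and using a hypothetical helper name `env_with_preload`:

```python
import os
import subprocess
import sys
from typing import Mapping, Optional

# Assumed path: where Ubuntu's libjemalloc2 package puts the library on x86-64.
# Check with `dpkg -L libjemalloc2` on other distros or architectures.
JEMALLOC = "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"


def env_with_preload(lib: str, base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with `lib` prepended to LD_PRELOAD."""
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = f"{lib}:{existing}" if existing else lib
    return env


if __name__ == "__main__":
    # Hypothetical usage: python3 run_with_jemalloc.py buffer-loadsave.py mmo_killer.heic
    # Re-runs the given script under the same interpreter with jemalloc preloaded.
    cmd = [sys.executable, *sys.argv[1:]]
    subprocess.run(cmd, env=env_with_preload(JEMALLOC), check=True)
```

Comparing the CSV from this run against the one from a plain run would show whether the allocator changes where memory use levels off.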

jcupitt commented 2 years ago

Here's another test program:

/* Compile with: 
 *      gcc -g -Wall buffer.c `pkg-config vips --cflags --libs`
 */

#include <vips/vips.h>

int
main( int argc, char **argv )
{
        gchar *buf;
        gsize len;
        int i;

        if( VIPS_INIT( argv[0] ) )
                vips_error_exit( NULL );

        if( !g_file_get_contents( argv[1], &buf, &len, NULL ) )
                vips_error_exit( NULL );

        for( i = 0; i < 10; i++ ) {
                VipsImage *image;
                void *new_buf;
                size_t new_len;

                printf( "loop %d ...\n", i );

                if( !(image = vips_image_new_from_buffer( buf, len, "",
                        "access", VIPS_ACCESS_SEQUENTIAL,
                        NULL )) )
                        vips_error_exit( NULL );

                if( vips_image_write_to_buffer( image, 
                        ".jpg", &new_buf, &new_len,
                        "Q", 95,
                        NULL ) ) 
                        vips_error_exit( NULL );

                g_object_unref( image );
                g_free( new_buf );
        }

        g_free( buf );

        vips_shutdown();

        return( 0 );
}

I ran like this:

$ gcc -g -Wall buffer.c `pkg-config vips --cflags --libs`
$ valgrind --leak-check=yes ./a.out ~/pics/mmo_killer.heic 
==1444545== Memcheck, a memory error detector
==1444545== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1444545== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==1444545== Command: ./a.out /home/john/pics/mmo_killer.heic
==1444545== 
loop 0 ...
loop 1 ...
loop 2 ...
loop 3 ...
...
loop 9 ...
... [snip stuff around thread pool creation]
==1444545== 
==1444545== LEAK SUMMARY:
==1444545==    definitely lost: 0 bytes in 0 blocks
==1444545==    indirectly lost: 0 bytes in 0 blocks
==1444545==      possibly lost: 15,840 bytes in 33 blocks
==1444545==    still reachable: 191,971 bytes in 2,274 blocks
==1444545==         suppressed: 576,779 bytes in 8,293 blocks
==1444545== Reachable blocks (those to which a pointer was found) are not shown.
==1444545== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1444545== 
==1444545== For lists of detected and suppressed errors, rerun with: -s
==1444545== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

So again, no leak detected.
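The verdict rests on the "definitely lost" and "indirectly lost" lines being zero: those are blocks with no pointer left to them. "Possibly lost" blocks still have a pointer into their interior, and "still reachable" blocks a pointer to their start. To check this mechanically across many runs, a sketch could parse the LEAK SUMMARY lines; the regex and the function names `leak_summary` and `has_real_leak` are hypothetical, written against the output format shown above.

```python
import re

# Matches lines such as "==1444545==    definitely lost: 0 bytes in 0 blocks".
LEAK_LINE = re.compile(
    r"==\d+==\s+(definitely lost|indirectly lost|possibly lost|still reachable|suppressed):"
    r"\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def leak_summary(valgrind_output: str) -> dict:
    """Map each LEAK SUMMARY category to a (bytes, blocks) tuple."""
    summary = {}
    for m in LEAK_LINE.finditer(valgrind_output):
        kind, nbytes, nblocks = m.groups()
        # Valgrind prints thousands separators, e.g. "15,840".
        summary[kind] = (int(nbytes.replace(",", "")), int(nblocks.replace(",", "")))
    return summary


def has_real_leak(summary: dict) -> bool:
    """True if any bytes are reported as definitely or indirectly lost."""
    return any(summary.get(k, (0, 0))[0] > 0 for k in ("definitely lost", "indirectly lost"))
```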

jcupitt commented 2 years ago

Final test program: I tried the Python version with the operation cache disabled:

#!/usr/bin/python3

import os
import psutil
import sys

import pyvips

def pyvips_decoder(image_data):
    buf_in = pyvips.Image.new_from_buffer(image_data, "", access="sequential")
    return buf_in.write_to_buffer(".jpeg", Q=95)

image_data = open(sys.argv[1], "rb").read()

pyvips.cache_set_max(0)

process = psutil.Process(os.getpid())
print(f"iteration, process size (MB)")
for i in range(10000):
    print(f"{i}, {process.memory_info().rss / (1024 * 1024):.2f}")
    new_image = pyvips_decoder(image_data)

And I see:

[Plot: process size (MB) against iteration, with the operation cache disabled]

You can see it stabilises much more quickly, and to a lower level.
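To put numbers on "stabilises much more quickly, and to a lower level" rather than reading it off the plots, one could parse the CSV each run prints and find where RSS settles. A minimal sketch: the plateau is taken as the mean of the last `window` samples, and the settle point is the first iteration after which every sample stays within `tolerance_mb` of it. Both thresholds and the function names are arbitrary assumptions.

```python
import csv
import io


def load_trace(csv_text: str) -> list:
    """Parse the 'iteration, process size (MB)' CSV the test script prints."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    next(rows)  # skip the header line
    return [float(row[1]) for row in rows if row]


def settle_point(sizes: list, window: int = 50, tolerance_mb: float = 5.0) -> tuple:
    """Return (first iteration that stays near the plateau, plateau in MB)."""
    if not sizes:
        raise ValueError("empty trace")
    tail = sizes[-window:]
    plateau = sum(tail) / len(tail)
    settled = len(sizes)  # stays len(sizes) if even the last sample is off-plateau
    # Walk backwards until a sample falls outside the band around the plateau.
    for i in range(len(sizes) - 1, -1, -1):
        if abs(sizes[i] - plateau) > tolerance_mb:
            break
        settled = i
    return settled, plateau
```

Calling `settle_point` on the CSV from the default run and on the one from the `cache_set_max(0)` run would give two (iteration, plateau) pairs to compare directly; a smaller iteration and a lower plateau for the second run would match what the plots show.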