python-pillow / Pillow

Python Imaging Library (Fork)
https://python-pillow.org
Other
12.3k stars 2.23k forks source link

Segfault on Alpine Linux Docker #1935

Closed blaggacao closed 7 years ago

blaggacao commented 8 years ago

What did you do?

Pillow is embedded into the Odoo library. I was trying to load a data file which referenced a png, when suddenly a segfault terminated my container.

What did you expect to happen?

The processing of the png would complete successfully.

What actually happened?

Putting on DEBUG mode, we can see the following output.

odoo    | 2016-05-30 08:29:20,832 7 INFO test openerp.modules.loading: loading base/base_data.xml
odoo    | 2016-05-30 08:29:21,050 7 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:29:21,052 7 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo    | 2016-05-30 08:29:21,052 7 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo    | 2016-05-30 08:29:21,053 7 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo    | 2016-05-30 08:29:21,053 7 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo    | 2016-05-30 08:29:21,054 7 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo    | 2016-05-30 08:29:21,054 7 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo    | 2016-05-30 08:29:21,087 7 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:29:21,088 7 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo    | 2016-05-30 08:29:21,088 7 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo    | 2016-05-30 08:29:21,088 7 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo    | 2016-05-30 08:29:21,093 7 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo    | 2016-05-30 08:29:21,094 7 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo    | 2016-05-30 08:29:21,094 7 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo    | 2016-05-30 08:29:21,275 7 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:29:21,276 7 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo    | 2016-05-30 08:29:21,276 7 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo    | 2016-05-30 08:29:21,276 7 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo    | 2016-05-30 08:29:21,277 7 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo    | 2016-05-30 08:29:21,277 7 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo    | 2016-05-30 08:29:21,278 7 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo    | 2016-05-30 08:29:21,338 7 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:29:21,338 7 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo    | 2016-05-30 08:29:21,339 7 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo    | 2016-05-30 08:29:21,339 7 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo    | 2016-05-30 08:29:21,339 7 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo    | 2016-05-30 08:29:21,340 7 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo    | 2016-05-30 08:29:21,340 7 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo    | 2016-05-30 08:29:21,384 7 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:29:21,385 7 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo    | 2016-05-30 08:29:21,385 7 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo    | 2016-05-30 08:29:21,385 7 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo    | 2016-05-30 08:29:21,386 7 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo    | 2016-05-30 08:29:21,386 7 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo    | 2016-05-30 08:29:21,386 7 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo    | 2016-05-30 08:29:21,563 7 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:29:21,563 7 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo    | 2016-05-30 08:29:21,564 7 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo    | 2016-05-30 08:29:21,564 7 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo    | 2016-05-30 08:29:21,564 7 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo    | 2016-05-30 08:29:21,564 7 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo    | 2016-05-30 08:29:21,565 7 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo    | Segmentation fault

What versions of Pillow and Python are you using?

Pillow 3.2.0 and (probably through) 2.7.0 Python 2.7.11 Alpine Linux 3.3

blaggacao commented 8 years ago

After resuming the container, we can see also resuming the loading with the following output:

odoo    | 2016-05-30 08:34:52,329 6 INFO test openerp.modules.loading: loading base/base_data.xml
odoo    | 2016-05-30 08:34:52,618 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:34:52,619 6 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo    | 2016-05-30 08:34:52,621 6 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo    | 2016-05-30 08:34:52,621 6 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo    | 2016-05-30 08:34:52,622 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo    | 2016-05-30 08:34:52,623 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo    | 2016-05-30 08:34:52,624 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo    | 2016-05-30 08:34:52,659 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:34:52,661 6 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo    | 2016-05-30 08:34:52,662 6 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo    | 2016-05-30 08:34:52,665 6 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo    | 2016-05-30 08:34:52,670 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo    | 2016-05-30 08:34:52,674 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo    | 2016-05-30 08:34:52,675 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo    | 2016-05-30 08:34:52,857 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:34:52,857 6 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo    | 2016-05-30 08:34:52,857 6 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo    | 2016-05-30 08:34:52,858 6 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo    | 2016-05-30 08:34:52,858 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo    | 2016-05-30 08:34:52,866 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo    | 2016-05-30 08:34:52,867 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo    | 2016-05-30 08:34:52,937 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:34:52,938 6 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo    | 2016-05-30 08:34:52,940 6 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo    | 2016-05-30 08:34:52,945 6 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo    | 2016-05-30 08:34:52,946 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo    | 2016-05-30 08:34:52,946 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo    | 2016-05-30 08:34:52,947 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo    | 2016-05-30 08:34:52,984 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:34:52,985 6 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo    | 2016-05-30 08:34:52,986 6 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo    | 2016-05-30 08:34:52,991 6 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo    | 2016-05-30 08:34:52,991 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo    | 2016-05-30 08:34:52,992 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo    | 2016-05-30 08:34:52,992 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo    | 2016-05-30 08:34:53,242 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:34:53,242 6 DEBUG test PIL.PngImagePlugin: STREAM bKGD 41 6
odoo    | 2016-05-30 08:34:53,243 6 DEBUG test PIL.PngImagePlugin: bKGD 41 6 (unknown)
odoo    | 2016-05-30 08:34:53,245 6 DEBUG test PIL.PngImagePlugin: STREAM pHYs 59 9
odoo    | 2016-05-30 08:34:53,246 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
odoo    | 2016-05-30 08:34:53,247 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
odoo    | 2016-05-30 08:34:53,247 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
odoo    | 2016-05-30 08:34:53,652 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:34:53,653 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 41 3136
odoo    | 2016-05-30 08:34:53,680 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:34:53,681 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 41 3136
odoo    | 2016-05-30 08:34:53,836 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:34:53,836 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 41 3136
odoo    | 2016-05-30 08:34:53,858 6 DEBUG test PIL.PngImagePlugin: STREAM IHDR 16 13
odoo    | 2016-05-30 08:34:53,858 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 41 3136
wiredfool commented 8 years ago

What were you doing with the pillow library, and what file were you using?

blaggacao commented 8 years ago

Hi Eric, I'm still trying to narrow the problem down, as I'm really not even sure, if this is the right repo down the stack... Any hint's you can probably give me to help hunting this down, I would very much appreciate...

main_partner-image public_user-image

both seem fail

blaggacao commented 8 years ago

On a normal ubuntu install, it works well, however, alpine is kind of the standard in docker security thanks to pax/Grsecurity...

wiredfool commented 8 years ago

Segfaults are actually pretty easy to pin down, more or less.

1) Run it under gdb. e.g.:

gdb python
...
r my_test_.py
[things go boom]
bt
[big long stack trace where things went wrong]

Fedora is the best out of the box for gdb, although there are macros that you can put in for other linuicies. That can give you a python level back trace from within gdb, as well as being able to inspect python objects. The unified view is really useful.

If you can, it helps to run a version of python with debug symbols, and compile with optimizations off. (on ubuntu, it's by installing python-dbg and compiling and running pillow against that)

2) Once you have a backtrace of any form, you'll at least be able to trace down which c extension it's from. Hopefully, you can narrow it down and figure out which image is causing the problem, then we can have a very simple test case to dive into and figure out what's going wrong.

blaggacao commented 8 years ago

Hm, I wasn't able yet to get python with debug symbols, as I think running on Alpine Linux is relevant. Yet, the gdb output is not quite useful:

2016-05-30 10:58:16,468 6 DEBUG test PIL.PngImagePlugin: STREAM tIME 80 7
2016-05-30 10:58:16,468 6 DEBUG test PIL.PngImagePlugin: tIME 80 7 (unknown)
2016-05-30 10:58:16,470 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
During startup program terminated with signal SIGSEGV, Segmentation fault.
(gdb)
(gdb) bt
No stack.
(gdb)
wiredfool commented 8 years ago

That's odd. If you're interpreting a png, then you should be well through the startup.

Try strace? It's going to have a ton of output, but you should be able to tell what's tried to execute and fail there.

blaggacao commented 8 years ago

Neither...

2016-05-30 10:58:16,470 6 DEBUG test PIL.PngImagePlugin: STREAM IDAT 99 7809
During startup program terminated with signal SIGSEGV, Segmentation fault.
(gdb)
(gdb) bt
No stack.
(gdb) strace
No default breakpoint address now.
blaggacao commented 8 years ago

Can this be related to PaX ? https://en.wikibooks.org/wiki/Grsecurity/Appendix/PaX_Flags eg.: https://pax.grsecurity.net/docs/mprotect.txt

I've seen paxctl -cm in order to disable mprotect on some nodejs dockers beeing in connection with segfaults...

I already tried paxctl -cm on some libraries (all of PIL/*.so and /lib/zlib.so) to no avail.

wiredfool commented 8 years ago

There's nothing in Pillow that would need executable memory maps, and at any rate, png decoding would not trigger memory mapping in our code. That's not to say that somewhere deep inside python the image file isn't getting memory mapped in, but it's not something that we request. (and even at that, it wouldn't require execute privs).

wiredfool commented 8 years ago

To narrow this down, you could build just enough of a docker image to run pillow, and attempt to open and load the images in question. If they succeed, then you at least know that it's somewhere else. And if they fail, it's going to be a smaller test case.

blaggacao commented 8 years ago

In fact I had already done so, but I couldn't really draw a conclusion...

FROM alpine
USER root

# Part of my stack, no idea if maybe relevant.
RUN set -x \
    && apk add --no-cache su-exec \
    && chmod 500 $(which su-exec)

RUN apk --no-cache add python \
                       build-base \
                       python-dev \
                       py-pip \
                       # Pillow image libraries
                       jpeg-dev \
                       zlib-dev 
RUN LIBRARY_PATH=/lib:/usr/lib /bin/sh -c "pip install Pillow==3.2.0"

# --- User Creation ---
RUN addgroup -S -g 9001 otheruser && \
    adduser -S -u 9001 -G otheruser -h /home -s /sbin/nologin otheruser && \
    chown -R otheruser:otheruser /home
RUN apk --no-cache add gdb
ADD https://raw.githubusercontent.com/odoo/odoo/d2f0dba64c7afb3a9de49e7c3d46f74542f8bbbd/openerp/addons/base/static/img/main_partner-image.png /testimage.png
ADD https://gist.githubusercontent.com/blaggacao/aee86f3d8687efefb8186c9ffc19ac9b/raw/04b14f2407294c563e4b646162c754f0bc8fb00d/run.py /run.py
RUN echo "#!/bin/sh" >> /running && \
    echo "gdb -ex r --args python -v run.py" >> /running
RUN chmod +x /running

CMD ["/running"]

With this for running: https://gist.githubusercontent.com/blaggacao/aee86f3d8687efefb8186c9ffc19ac9b/raw/04b14f2407294c563e4b646162c754f0bc8fb00d/run.py

I'll sleep over this... If you are interested in this error, maybe you can quickly build /run this dockerfile to see if you can find something out of this...

HEadache, something broke this test case again :smile_cat:

wiredfool commented 8 years ago

I'm not sure what's up with that test script, especially the round trip through base64.

But, both of these test scripts work, without crashing:

from PIL import Image
im = Image.open('testimage.png')
im.load()

and:

import cStringIO as StringIO
from PIL import Image
with open('/testimage.png', 'rb') as f:
    data = f.read()
image_stream = StringIO.StringIO(data)
image = Image.open(image_stream)
image.load()

The load() at the end is important, since pillow does lazy loading of the image -- open only hits the python layer and gets the image metadata. If you don't actually access the image pixels, then it never loads them and you never hit the c layer for decompression.

So based on this, I'd say that either you're doing something else with the image that's causing issues, or it's failing in an entirely different place.

blaggacao commented 8 years ago

The roundtrip was because I tried to mimic the original code. Well, thanks a lot for your help! Indeed, it also works on my docker for windows beta. So at least I can exclude this possible entropy as a cause for now...

wiredfool commented 8 years ago

The base64 should be a noop, and if it's not, then that's going to be a completely separate problem.

In general though, Pillow shouldn't segfault, and I'll try to track them down given some sort of reproducible test case.

blaggacao commented 8 years ago

True. OK, thanks again. I'll keep on digging...

blaggacao commented 8 years ago

Finally, with valgrind, I was able to get at least some error message: https://gist.github.com/blaggacao/1df4f9dffb0e515aaf6decc6e67ff2d4#file-valgrind-log-L2354

It seems that like something "deep inside python", based on the logs, can there be anything said from a Pillow point of view, such as "definitely not related to pillow", "related but not responsible" or something alike?

Effectively it seems to be triggered by PaX flags and probably by the simple image loading, as this passed without problem in the test. I still couldn't figure out how to get a useful stack trace in order to see which code is calling but on a best guess basis, the program implements also image manipulation at some point https://github.com/odoo/odoo/blob/9.0/openerp/tools/image.py#L10

wiredfool commented 8 years ago

The trace and activity from valgrind make it look like its a gc pass that's segfaulting. Its possible that it's related to the pax flags, but the underlying cause of that would be a mismatch in incref/decref in an extension, so that there is a use after freeing.

The other thing that I noticed is that the error is at least 3 seconds after the logging about pngs, so it's unlikely that that is the direct cause. It would fit with a gc issue, but that just means tha the crash is far removed from the source of the problem.

blaggacao commented 8 years ago

:dizzy: I think this is closable, thank you for you help! I'm trying to get help from the alpine linux mainainers now...

blaggacao commented 8 years ago

This get's ever stranger, on the test docker we made, the first images passes and a second one also. NOW, in the actual code the second image fails, effectively near the Pillow interaction, I'll try to narrow this down further and eventually reopen...

blaggacao commented 8 years ago

I don't know exactly what I did, probably something out of (or a combination of):

EDIT I double checked and definitely it's the privileged flag which made the interactive gdb session working. Seems that gdb needs some kind of kernel capabilities. Not that far a catch... The ulimit -c unlimited in conjunction with the core_patten then would be responsible for generating the dump file, which is not created when the process is properly caught by gdb. END EDIT

But, I now got a backtrace for the situation, maybe you can have a look at it? https://gist.github.com/blaggacao/c081658dea0e7b95cb006a1b1673786f

Without knowing about coredump analysis at all, I would guess, the last item is the culprit (musl), but yeah, things are never that easy :smile:

wiredfool commented 8 years ago

Do you have an actual core file? Or a docker file that can replicate this?

The backtrace looks like pure python, it's a bunch of frames and PyObject_Call. The bit about musl looks like the entry point for python, so I think that's a red herring.

(aside, I'm not seeing how to get symbols for the apks, nor are breakpoints working, nor the python tweaks to gdb to let me see the python objects. Alpine seems to be something of a step backwards in terms of debugability)

blaggacao commented 8 years ago

I can managed to dump a core file off the Docker in VM, https://filebin.net/9svhknfet7ofut0j I can draft a dockerfile, which should be able to replicate, however it would be our full blown deployment stack. Thanks for your help!

blaggacao commented 8 years ago

I found a list of differences between muslibc and glibc, might be of interest, yet maybe not directly related: http://wiki.musl-libc.org/wiki/Functional_differences_from_glibc#Name_Resolver_.2F_DNS

wiredfool commented 8 years ago

I wasn't able to get much more out of the core dump, since I can't find any symbols packages for the python from the .apk. I'm pretty sure that symbols are required for the py-bt command.

I'm not sure what to tell you at this point, other than to try to narrow down what causes it. That may require switching to a python build that has symbols, or running under strace (which is blocked by something) or really verbose logging to narrow down the trigger.

blaggacao commented 8 years ago

OK, I'm really grateful for your help as I have never ever touched these topics and don't have that much real options due to lacks in knowledge. I've tried to build a python with symbols, yet objdump -t tells me "no symbols found" so I'm confused again...

I've tried it this way, I'll try to check back with the alpine community.

# # --- Build Debug Liraries for GDB Debugging ---
# ENV CFLAGS='-Wall -O0 -g'
# RUN apk add --update alpine-sdk \
#   && adduser -G abuild -g "Alpine Package Builder" -s /bin/ash -D builder \
#   && echo "builder ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers \
#   && mkdir /packages \
#   && chown builder:abuild /packages \
#   && git clone --depth 1 git://dev.alpinelinux.org/aports \
#   && abuild-keygen -a -i \
#   && cd aports/main/python
# USER builder
# RUN cp -r aports/main/python /packages/python \
#   && cd /packages/python \
#   && abuild-keygen -a -i \
#   && abuild checksum \
#   && abuild -r \
#   && abuild -i python \
#   && abuild -i python-dev
# USER root
wiredfool commented 8 years ago

The symbols might be in a central location like /usr/lib/debug or /usr/share/gdb.

blaggacao commented 8 years ago

I think, I'm lucky: http://pkgs.alpinelinux.org/package/edge/main/x86_64/python-dbg Will make another attempt with this.

blaggacao commented 8 years ago

Well steps are pretty small ones, I'm constantly running into this, google doesn't say a useful word about this...

Starting program: /usr/lib/debug/usr/bin/python2.7.debug
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
Warning:
Cannot insert breakpoint -2.
Cannot access memory at address 0x65d
blaggacao commented 8 years ago

Here we can see how the same program is acting on ubuntu:

2016-06-06 06:09:45,065 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM pHYs 59 9     
2016-06-06 06:09:45,066 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM tIME 80 7     
2016-06-06 06:09:45,066 8 DEBUG testubuntu PIL.PngImagePlugin: tIME 80 7 (unknown)  
2016-06-06 06:09:45,066 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IDAT 99 7809  
2016-06-06 06:09:45,343 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IHDR 16 13    
2016-06-06 06:09:45,344 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IDAT 41 3136  
2016-06-06 06:09:45,358 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IHDR 16 13    
2016-06-06 06:09:45,359 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IDAT 41 3136  
2016-06-06 06:09:45,488 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IHDR 16 13    
2016-06-06 06:09:45,489 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IDAT 41 3136  
2016-06-06 06:09:45,502 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IHDR 16 13    
2016-06-06 06:09:45,503 8 DEBUG testubuntu PIL.PngImagePlugin: STREAM IDAT 41 3136  
2016-06-06 06:09:45,634 8 INFO testubuntu openerp.modules.loading: loading base/res/

It has exactly the same DEBUG messages, so it is happening AFTER the image is loaded... (You probably knew that already :D)

yajo commented 7 years ago

Have you tried with something like https://hub.docker.com/r/frolvlad/alpine-glibc/?

cc @pedrobaeza

lasley commented 7 years ago

I just tested on alpine-glibc & it works fine. As such - I do not believe that this falls within scope of Alpine maintainers, and does belong here. This is most certainly an issue while compiling Pillow against muslc.

pedrobaeza commented 7 years ago

So what do you suggest?

lasley commented 7 years ago

I'm going to play around with some compile options today & see if I can maybe circumvent this. I do feel that this issue should be reopened though, or we should create a new one here.

lasley commented 7 years ago

As an update, (obviously) no luck on randomly tweaking compile options. The next step is to create an isolated test case outside of Odoo that can replicate this issue.

@blaggacao - Where exactly did you see this issue? I'm trying to find a common denominator between the image load failing in my section of code with your's, other than Odoo itself & the fact that the fail is on Pillow.

wiredfool commented 7 years ago

@lasley What code are you seeing fail? FWIW -- our test suite is now (as of this week) running against alpine/musl, and it's succeeding. e.g.: https://travis-ci.org/python-pillow/Pillow/builds/196550625

The docker image is pythonpillow/alpine, and it's defined here: https://github.com/python-pillow/docker-images/tree/master/alpine

lasley commented 7 years ago

@wiredfool - I'm working on pulling that out right now. All of us that are experiencing this issue are deep in the core of an ERP, so isolating the case is like pulling a needle from the haystack. So far the haystack is narrowed to Pillow calls, but we still could be dealing with something lower in the lib stack.

In my case, the segfault is occurring during dynamic creation of a few images that look to be in PNG format. The code is dense though, so I'm trying to find more examples in more dense code to maybe find the common point.

Once I am able to do that, I'll get us a fail case in the existing tests. Am I correctly understanding that the Docker images are just running the Tests folder in Pillow?

wiredfool commented 7 years ago

Yep, the docker images are running the test suite.

blaggacao commented 7 years ago

@lasley From https://github.com/python-pillow/Pillow/issues/1935#issuecomment-222529469 on, things got (a bit) more enlightened. I'd be happy to get back to this topic with some help as back then I couldn't crack it... My main impediment was that I could not get gdb to work on alpine, because back then there were no debugging symbols, but already back then the next beta version showed to have debugging symbols, but we were far away from switching, so I dropped. It might be worthwhile having a thorough look at the above analysis path...

lasley commented 7 years ago

@blaggacao - can you point me to the code that was causing this issue for you? I'm trying to narrow down variables.

You are correct that debug works in Alpine now, but IMO the first step is an isolated test case that consistently fails.

blaggacao commented 7 years ago

It was (!) at that time happening for those images https://github.com/python-pillow/Pillow/issues/1935#issuecomment-222448769 in the base modul, and here is the line that indicates this from the debug logs above: INFO test openerp.modules.loading: loading base/base_data.xml

lasley commented 7 years ago

So your install was segfaulting immediately upon db creation with demo data?

blaggacao commented 7 years ago

Correct.

yajo commented 7 years ago

In my case, when I debugged step by step, I could not reproduce it!

lasley commented 7 years ago

Yeah I'm having similar difficulties. It seems that some gears have shifted somewhere low too, because we both were able to get past the DB creation that @blaggacao was not able to. We're both working on Alpine 3.5, and @blaggacao was in 3.4.

wiredfool commented 7 years ago

So looking back at the valgrind trace I'm seeing that there are at least two other shared libraries that are being called - psycopg and lxml. I wonder if they're possibly the culprit, either alone or in combination with pillow loading images?

If you've actually got a reproducible case, it may be time to dump a bunch of logging into the image class, either by wrapping it in a facade object or modifying it directly.

yajo commented 7 years ago

The case that I can reproduce is not very straight-forward now that I moved my image to Debian, but I will summarize to let it be clear:

  1. Download alpine tag of the repo. It's how it was before Debian.
  2. docker build --tag tecnativa/odoo-base:10.0 .
  3. Clone the scaffolding.
  4. docker-compose -f setup-devel.yaml up
  5. chown -R $USER:$USER . (I'll have to work on getting rid of this, for now you'll need it).
  6. docker-compose -f devel.yaml up
  7. Log in http://localhost:10069
  8. Create a new database.
  9. Install account_accountant addon on it.
  10. Once installed, click on the "Accounting" menu.

Then you'll see on the terminal that Odoo died with exit code 139.

I narrowed it down to know that removing this line makes it not die:

<span t-field="object.country_id.image" t-options='{"widget": "image", "class": "country_flag"}'/>

This is why I suspected the culprit was Pillow (all other database/XML operations worked just fine).

You can debug by adding import wdb;wdb.set_trace() to any Python file or <t t-debug="wdb"/> to any Qweb (XML) file and visiting http://localhost:1984. Keep in mind that if you debug XML files, you'll hit https://github.com/Kozea/wdb/issues/92, which makes it even funnier 🤣

Sorry, this is such a hard to reproduce case, but that's all I got 😞

wiredfool commented 7 years ago

Can you explain step 8?

lasley commented 7 years ago

This is kind of where it gets deep into the ERP. Once you go to the page in step 7, there's a "Create Database" button. For step 9, there's an "Apps" section that installs the culprit.

My replication steps are the same as @Yajo's, and I too am having trouble isolating how to bring the fail case out of the monolith.

I also got it working by removing the same line, so I dug a bit further into the code & I'm almost positive that the piece of code actually causing this issue is odoo.tools.image_resize_image.

Naturally I cannot replicate by simply running the code though, which is both intriguing and maddening. There's a lot of conditionals though, and I definitely haven't tested all the edges yet.

wiredfool commented 7 years ago

So, this appears to be the (or at least, a) request that's killing it OMM, as saved to a curl request

curl 'http://localhost:10069/web/dataset/call_kw/web.planner/render' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Referer: http://localhost:10069/web' -H 'Origin: http://localhost:10069' -H 'X-Requested-With: XMLHttpRequest' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.51 Safari/537.36' -H 'Content-Type: application/json' --data-binary '{"jsonrpc":"2.0","method":"call","params":{"model":"web.planner","method":"render","args":[441,"planner_account"],"kwargs":{"context":{"lang":"en_US","tz":false,"uid":1}}},"id":843142426}' --compressed

(I'm doing this by replaying the XHR, but curl works too.) The click on the Accounting menu makes two successful calls before this, version_info and load_needaction, then there's load_views which fails on the initial call, but replaying works, and then this.

After getting the reproduction, I've injected logging into Pillow using this script, which is imported prior to odoo in oodo-bin:

#!/usr/bin/env python

from PIL import Image, ImageFile, ImageEnhance, ImageDraw, JpegImagePlugin
from functools import wraps

import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

def logger_wrap(f, fname):
    @wraps(f)
    def wrapper(*args, **kwargs):
        logger.debug ("Enter %s" %fname)
        try:
            return f(*args, **kwargs)
        finally:
            logger.debug ("Exit %s" %fname)
    return wrapper

function_type = type(logger_wrap)
class_type = type(Image.Image)

def decorate(obj):
    for name in dir(obj):
        if '__' in name: continue
        #print obj, name
        elt = getattr(obj, name) 
        if type(elt) == class_type:
            decorate(elt)            
        if type(elt) == function_type:
            setattr(obj, name, logger_wrap(elt, "%s:%s" %(obj.__name__, name)))

for module in (Image, ImageFile, ImageEnhance, JpegImagePlugin):
    decorate(module)

oodo-bin:

#!/usr/bin/env python

# set server timezone in UTC before time module imported
__import__('os').environ['TZ'] = 'UTC'
__import__('pkg_resources').declare_namespace('odoo.addons')

import inject_logger
import odoo

if __name__ == "__main__":
    odoo.cli.main()

The logging that I'm getting from that replay is this:

$ docker-compose -f devel.yaml up
Starting scaffolding_db_1
Starting scaffolding_wdb_1
Starting scaffolding_inbox_1
Starting scaffolding_smtp_1
Starting scaffolding_odoo_1
Attaching to scaffolding_wdb_1, scaffolding_db_1, scaffolding_smtp_1, scaffolding_inbox_1, scaffolding_odoo_1
db_1     | LOG:  database system was shut down at 2017-02-02 18:56:44 UTC
db_1     | LOG:  MultiXact member wraparound protections are now enabled
db_1     | LOG:  autovacuum launcher started
db_1     | LOG:  database system is ready to accept connections
smtp_1   | smtp-sink: name_mask: all
smtp_1   | smtp-sink: trying... [0.0.0.0]:25
inbox_1  | <7>haproxy-systemd-wrapper: executing /usr/local/sbin/haproxy -p /run/haproxy.pid -f /usr/local/etc/haproxy/haproxy.cfg -Ds 
inbox_1  | [ALERT] 032/185707 (8) : parsing [/usr/local/etc/haproxy/haproxy.cfg:8] : 'server stupid' : could not resolve address 'inbox.example.com'.
inbox_1  | [ALERT] 032/185707 (8) : Failed to initialize server(s) addr.
inbox_1  | <5>haproxy-systemd-wrapper: exit, haproxy RC=1
odoo_1   | NFO 18:57:1486061828 Executing contents of /opt/odoo/custom/entrypoint.d
odoo_1   | INFO 18:57:1486061828 Executing contents of /opt/odoo/common/entrypoint.d
odoo_1   | INFO 18:57:1486061828 Waiting until postgres is listening at db...
odoo_1   | + exec odoo --dev all
scaffolding_inbox_1 exited with code 1
odoo_1   | 2017-02-02 18:57:09,401 1 INFO ? odoo: Odoo version 10.0
odoo_1   | 2017-02-02 18:57:09,401 1 INFO ? odoo: Using configuration file at /opt/odoo/auto/odoo.conf
odoo_1   | 2017-02-02 18:57:09,401 1 INFO ? odoo: addons paths: ['/var/lib/odoo/addons/10.0', u'/opt/odoo/custom/src/private', u'/opt/odoo/auto/addons', u'/opt/odoo/custom/src/odoo/addons', '/opt/odoo/custom/src/odoo/odoo/addons']
odoo_1   | 2017-02-02 18:57:09,402 1 INFO ? odoo: database: odoo@db:default
odoo_1   | 2017-02-02 18:57:09,435 1 WARNING ? odoo.service.server: 'watchdog' module not installed. Code autoreload feature is disabled
odoo_1   | 2017-02-02 18:57:09,436 1 INFO ? odoo.service.server: HTTP service (werkzeug) running on 0.0.0.0:8069
odoo_1   | 2017-02-02 18:57:09,451 1 INFO devel odoo.modules.loading: loading 1 modules...
odoo_1   | 2017-02-02 18:57:09,718 1 INFO devel odoo.modules.loading: 1 modules loaded in 0.27s, 0 queries
odoo_1   | 2017-02-02 18:57:09,732 1 INFO devel odoo.modules.loading: loading 12 modules...
odoo_1   | 2017-02-02 18:57:09,789 1 INFO devel odoo.modules.loading: 12 modules loaded in 0.06s, 0 queries
odoo_1   | 2017-02-02 18:57:09,929 1 INFO devel odoo.modules.loading: Modules loaded.
odoo_1   | 2017-02-02 18:57:21,110 1 INFO ? odoo.addons.bus.models.bus: Bus.loop listen imbus on db postgres
odoo_1   | 2017-02-02 18:57:21,717 1 DEBUG ? inject_logger: Enter PIL.Image:new
odoo_1   | 2017-02-02 18:57:21,717 1 DEBUG ? inject_logger: Enter PIL.Image:_check_size
odoo_1   | 2017-02-02 18:57:21,717 1 DEBUG ? inject_logger: Exit PIL.Image:_check_size
odoo_1   | 2017-02-02 18:57:21,717 1 DEBUG ? inject_logger: Enter PIL.Image:isStringType
odoo_1   | 2017-02-02 18:57:21,717 1 DEBUG ? inject_logger: Exit PIL.Image:isStringType
odoo_1   | 2017-02-02 18:57:21,718 1 DEBUG ? inject_logger: Exit PIL.Image:new
odoo_1   | 2017-02-02 18:57:21,718 1 DEBUG ? inject_logger: Enter PIL.Image:_getdecoder
odoo_1   | 2017-02-02 18:57:21,718 1 DEBUG ? inject_logger: Exit PIL.Image:_getdecoder
odoo_1   | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Enter PIL.Image:register_open
odoo_1   | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Exit PIL.Image:register_open
odoo_1   | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Enter PIL.Image:register_save
odoo_1   | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Exit PIL.Image:register_save
odoo_1   | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Enter PIL.Image:register_extension
odoo_1   | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Exit PIL.Image:register_extension
odoo_1   | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Enter PIL.Image:register_mime
odoo_1   | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Exit PIL.Image:register_mime
odoo_1   | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Enter PIL.Image:isPath
odoo_1   | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Exit PIL.Image:isPath
odoo_1   | 2017-02-02 18:57:21,719 1 DEBUG ? inject_logger: Enter PIL.Image:preinit
odoo_1   | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Enter PIL.Image:register_open
odoo_1   | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Exit PIL.Image:register_open
odoo_1   | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Enter PIL.Image:register_save
odoo_1   | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Exit PIL.Image:register_save
odoo_1   | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Enter PIL.Image:register_extension
odoo_1   | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Exit PIL.Image:register_extension
odoo_1   | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Enter PIL.Image:register_mime
odoo_1   | 2017-02-02 18:57:21,720 1 DEBUG ? inject_logger: Exit PIL.Image:register_mime
odoo_1   | 2017-02-02 18:57:21,721 1 DEBUG ? inject_logger: Enter PIL.Image:register_open
odoo_1   | 2017-02-02 18:57:21,721 1 DEBUG ? inject_logger: Exit PIL.Image:register_open
odoo_1   | 2017-02-02 18:57:21,721 1 DEBUG ? inject_logger: Enter PIL.Image:register_save
odoo_1   | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Exit PIL.Image:register_save
odoo_1   | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Enter PIL.Image:register_save_all
odoo_1   | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Exit PIL.Image:register_save_all
odoo_1   | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Enter PIL.Image:register_extension
odoo_1   | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Exit PIL.Image:register_extension
odoo_1   | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Enter PIL.Image:register_mime
odoo_1   | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Exit PIL.Image:register_mime
odoo_1   | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Enter PIL.Image:register_open
odoo_1   | 2017-02-02 18:57:21,722 1 DEBUG ? inject_logger: Exit PIL.Image:register_open
odoo_1   | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.Image:register_save
odoo_1   | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Exit PIL.Image:register_save
odoo_1   | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.Image:register_extension
odoo_1   | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Exit PIL.Image:register_extension
odoo_1   | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.Image:register_extension
odoo_1   | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Exit PIL.Image:register_extension
odoo_1   | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.Image:register_extension
odoo_1   | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Exit PIL.Image:register_extension
odoo_1   | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Exit PIL.Image:preinit
odoo_1   | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.ImageFile:_save
odoo_1   | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.ImageFile:_tilesort
odoo_1   | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Exit PIL.ImageFile:_tilesort
odoo_1   | 2017-02-02 18:57:21,723 1 DEBUG ? inject_logger: Enter PIL.Image:_getencoder
odoo_1   | 2017-02-02 18:57:21,724 1 DEBUG ? inject_logger: Exit PIL.Image:_getencoder
odoo_1   | 2017-02-02 18:57:21,725 1 DEBUG ? inject_logger: Exit PIL.ImageFile:_save
odoo_1   | 2017-02-02 18:57:21,728 1 INFO ? odoo.addons.report.models.report: Will use the Wkhtmltopdf binary at /usr/local/bin/wkhtmltopdf
odoo_1   | 2017-02-02 18:57:22,137 1 INFO ? odoo.http: HTTP Configuring static files
odoo_1   | 2017-02-02 18:57:22,161 1 INFO test odoo.modules.loading: loading 1 modules...
odoo_1   | 2017-02-02 18:57:22,186 1 INFO test odoo.modules.loading: 1 modules loaded in 0.03s, 0 queries
odoo_1   | 2017-02-02 18:57:22,211 1 INFO test odoo.modules.loading: loading 28 modules...
odoo_1   | 2017-02-02 18:57:22,290 1 INFO test odoo.modules.loading: 28 modules loaded in 0.08s, 0 queries
odoo_1   | 2017-02-02 18:57:22,598 1 INFO test odoo.modules.loading: Modules loaded.
odoo_1   | 2017-02-02 18:57:22,606 1 INFO test odoo.addons.base.ir.ir_http: Generating routing map
db_1     | LOG:  unexpected EOF on client connection with an open transaction

So, it looks like it's somewhere else, in odoo.addons.base.ir.ir_http, or therabouts. It did appear to be deterministic, failing there three times on docker-compose runs. The fourth time the call succeeded, returned data, then died prior to any additional output when I replayed the XHR again. So, mostly reproducable, but not completely.