rwcarlsen / goexif

Decode embedded EXIF meta data from image files.
BSD 2-Clause "Simplified" License

Out of memory errors #20

Closed: dpup closed this issue 9 years ago

dpup commented 10 years ago

We're seeing OOMs quite frequently in production but haven't yet isolated a root cause. I was wondering if you had any thoughts, since the crash always originates in the exif code (stack trace below).

Images are always < 5 MB on disk and are JPG or JPEG.

Thanks in advance.

runtime.throw(0xbd8337)
        /home/ec2-user/infra/packages/internal/go/go/src/pkg/runtime/panic.c:464 +0x69 fp=0x7f9ca16dde88
runtime.SysMap(0xc51fc40000, 0x82700000, 0xbeadd8)
        /home/ec2-user/infra/packages/internal/go/go/src/pkg/runtime/mem_linux.c:131 +0xfe fp=0x7f9ca16ddeb8
runtime.MHeap_SysAlloc(0xbf4d20, 0x82700000)
        /home/ec2-user/infra/packages/internal/go/go/src/pkg/runtime/malloc.goc:473 +0x10a fp=0x7f9ca16ddef8
MHeap_Grow(0xbf4d20, 0x82700)
        /home/ec2-user/infra/packages/internal/go/go/src/pkg/runtime/mheap.c:241 +0x5d fp=0x7f9ca16ddf38
MHeap_AllocLocked(0xbf4d20, 0x826fe, 0x0)
        /home/ec2-user/infra/packages/internal/go/go/src/pkg/runtime/mheap.c:126 +0x305 fp=0x7f9ca16ddf78
runtime.MHeap_Alloc(0xbf4d20, 0x826fe, 0x100000000, 0x1)
        /home/ec2-user/infra/packages/internal/go/go/src/pkg/runtime/mheap.c:95 +0x7b fp=0x7f9ca16ddfa0
runtime.mallocgc(0x826fd370, 0x74fd61, 0x0)
        /home/ec2-user/infra/packages/internal/go/go/src/pkg/runtime/malloc.goc:89 +0x484 fp=0x7f9ca16de010
cnew(0x74fd60, 0x104dfa6e, 0xc200000001)
        /home/ec2-user/infra/packages/internal/go/go/src/pkg/runtime/malloc.goc:718 +0xc1 fp=0x7f9ca16de030
runtime.cnewarray(0x74fd60, 0x104dfa6e)
        /home/ec2-user/infra/packages/internal/go/go/src/pkg/runtime/malloc.goc:731 +0x3a fp=0x7f9ca16de050
makeslice1(0x6c3c40, 0xd0b2ebf, 0x104dfa6e, 0x7f9ca16de108)
        /home/ec2-user/infra/packages/internal/go/go/src/pkg/runtime/slice.c:57 +0x4d fp=0x7f9ca16de068
growslice1(0x6c3c40, 0xc4995a0000, 0xd0b2ebf, 0xd0b2ebf, 0xd0b2ec0, ...)
        /home/ec2-user/infra/packages/internal/go/go/src/pkg/runtime/slice.c:113 +0x58 fp=0x7f9ca16de098
runtime.growslice(0x6c3c40, 0xc4995a0000, 0xd0b2ebf, 0xd0b2ebf, 0x1, ...)
        /home/ec2-user/infra/packages/internal/go/go/src/pkg/runtime/slice.c:80 +0x9d fp=0x7f9ca16de0e0
github.com/rwcarlsen/goexif/tiff.Decode(0x7f9cabfe1268, 0xc2108aef30, 0x0, 0x0, 0x0)
        /media/ephemeral0/var-local/posadero/jenkins-workspace/GoMiro_1_Compile/src/github.com/rwcarlsen/goexif/tiff/tiff.go:87 +0x806 fp=0x7f9ca16de210
github.com/rwcarlsen/goexif/exif.Decode(0x7f9cac000a30, 0xc2115a5280, 0x892000, 0x7f9cabff89e8, 0xc21e867d20)
        /media/ephemeral0/var-local/posadero/jenkins-workspace/GoMiro_1_Compile/src/github.com/rwcarlsen/goexif/exif/exif.go:163 +0xb59 fp=0x7f9ca16de410
<snip>
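
The hex arguments in the trace can be decoded to see how large the failing allocation was. The sketch below assumes that the first argument of `runtime.mallocgc` is the requested size in bytes, that the second and third arguments of `makeslice1` are the old length and the new capacity, and that this is how the Go runtime of that era laid out these calls; none of that is stated in the thread. Under those assumptions, the slice already held about 218.8 million elements and the runtime was asking for roughly 2 GiB more. The implied element size of 8 bytes is consistent with a slice of pointers on a 64-bit machine.

```go
package main

import "fmt"

// Hex values copied from the stack trace above. How they map onto
// length, capacity and byte size is an assumption, not a documented fact.
const (
	oldLen    uint64 = 0xd0b2ebf  // growslice1 / makeslice1: current length
	newCap    uint64 = 0x104dfa6e // makeslice1 / cnewarray: requested capacity
	allocSize uint64 = 0x826fd370 // mallocgc: requested bytes
)

// elemSize returns the per-element size implied by allocating allocBytes
// for capElems elements, or 0 if the numbers do not divide evenly.
func elemSize(allocBytes, capElems uint64) uint64 {
	if capElems == 0 || allocBytes%capElems != 0 {
		return 0
	}
	return allocBytes / capElems
}

func main() {
	fmt.Printf("elements already in slice: %d\n", oldLen)
	fmt.Printf("requested capacity:        %d\n", newCap)
	fmt.Printf("bytes requested:           %d (%.2f GiB)\n",
		allocSize, float64(allocSize)/(1<<30))
	fmt.Printf("implied element size:      %d bytes\n", elemSize(allocSize, newCap))
}
```

The requested capacity equals the old length plus a quarter of it, which looks like an ordinary `append` growth step on a slice that had already grown far beyond anything a file under 5 MB should produce.
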
rwcarlsen commented 10 years ago

I actually recently received an email from another user with a similar issue. It seems that pictures from certain cameras/phones contain corrupt TIFF data structures. Specifically, they have corrupt offsets in their IFDs, which the tiff package doesn't handle very safely. I think the other user, bdotdub, has a fix for dealing with this sort of problem that might work for you. You can check out his fork at https://github.com/bdotdub/goexif.

Eventually, when research doesn't have me tied up, I'll get around to making the tiff reading more robust. I want it to be able to safely recognize and return an error for corrupt tiff structures. I'd also like to make the tiff (and exif) packages return a best effort decode even when this sort of error occurs. Pull requests are always welcome :-)

Also, if you could provide a problem-photo or two, that would be helpful. If you'd rather not post them here, you can also email them to me at rwcarlsen at gmail dot com.

dpup commented 10 years ago

Thanks for the reply.

I've added better logging to see if we can identify the problem image. Lots of concurrent requests and the OOM killing everything instantly meant it wasn't possible to track down.

Will add images when/if I identify them.

dpup commented 10 years ago

I integrated the patch from bdotdub/goexif but that doesn't appear to have helped. The OOMs are coming from the append in tiff.go here:

        // load the dir
        d, offset, err = DecodeDir(buf, t.Order)
        if err != nil {
            return nil, err
        }
        t.Dirs = append(t.Dirs, d)
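
One plausible reading, assuming the snippet above runs inside a loop that follows each directory's next-IFD offset until it reaches zero (the surrounding code is not shown in this thread), is that a corrupt offset chain never terminates and the slice grows until memory runs out. A minimal sketch of guarding such a loop with a directory limit and cycle detection follows. The names `walkDirs`, `maxDirs`, `errTooManyDirs`, and `errDirCycle`, and the limit of 64, are made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// maxDirs is an arbitrary, hypothetical cap on directories per file.
const maxDirs = 64

var (
	errTooManyDirs = errors.New("tiff: too many IFDs")
	errDirCycle    = errors.New("tiff: IFD offsets form a cycle")
)

// walkDirs follows next-IFD offsets starting at first until an offset of 0.
// next stands in for decoding one directory: given a directory's offset it
// returns the offset of the following one. The walk stops with an error if
// an offset repeats or if more than maxDirs directories are visited.
func walkDirs(first uint32, next func(offset uint32) (uint32, error)) ([]uint32, error) {
	seen := make(map[uint32]bool)
	var offsets []uint32
	for off := first; off != 0; {
		if seen[off] {
			return offsets, errDirCycle
		}
		if len(offsets) >= maxDirs {
			return offsets, errTooManyDirs
		}
		seen[off] = true
		offsets = append(offsets, off)
		n, err := next(off)
		if err != nil {
			return offsets, err
		}
		off = n
	}
	return offsets, nil
}

func main() {
	// Simulated corrupt file: the directory at 8 points to 100,
	// which points back to 8.
	links := map[uint32]uint32{8: 100, 100: 8}
	offsets, err := walkDirs(8, func(off uint32) (uint32, error) {
		return links[off], nil
	})
	fmt.Printf("visited %v, err: %v\n", offsets, err)
}
```

Either guard alone would bound memory use; the cycle check gives a more specific error, while the cap also covers long non-repeating chains.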

Still haven't found a correlation between specific images and OOMs, which implies that the problem is reasonably common and only causes an issue when we're already close to the red line. The attached image appeared to be the culprit in two crashes:

[Attached image: 0 yde2mc75hx3itq5u]

dpup commented 10 years ago

Scratch that. I think that must be a coincidence, because that image triggers an EOF while reading jpeg_APP1.