rtyler / py-yajl

py-yajl provides Python bindings for the Yajl JSON encoder/decoder library
http://rtyler.github.com/py-yajl

0.3.3 segfaults after 150 invocations of dumps(tuple(data)) #11

Closed indigoviolet closed 14 years ago

indigoviolet commented 14 years ago

Any idea why this might happen?

    > cat test.py
    #!/usr/bin/env python

    import yajl
    import sys

    for i, l in enumerate(sys.stdin):
        l = l.rstrip('\n').split('\t')
        print i
        d = yajl.dumps(tuple(l))

Doesn't seem to depend on the input data.

Removing the tuple() call makes it work just fine; 0.3.1 also works fine (except that it doesn't handle tuples).

I can give you more details about my machine, etc., if you need them.

rtyler commented 14 years ago

I actually saw the subject line of this in my inbox on my phone before I got to my computer, and I already knew what it was. When I take the elements out of the tuple and place them into a list, the PyTuple_GetItem call returns a borrowed reference, so I'm not properly incrementing the refcount.
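To illustrate the bookkeeping involved, here is a minimal pure-Python sketch. It is not py-yajl's code: the helper name `refcount_delta_when_listed` is made up for this example, and it assumes CPython, where `sys.getrefcount` is available. It shows that a list slot is expected to hold one reference of its own to each element. A C extension that copies a borrowed reference from `PyTuple_GetItem` into a list with a reference-stealing call (such as `PyList_SetItem`) without calling `Py_INCREF` first would leave that count one short.

```python
import sys


def refcount_delta_when_listed(item):
    """Return how many extra references exist once `item` also sits in a list.

    Hypothetical helper for illustration only; assumes CPython.
    """
    source = (item,)                 # the tuple owns one reference to item
    before = sys.getrefcount(item)
    copied = list(source)            # CPython takes a new reference for the list slot
    after = sys.getrefcount(item)
    del copied                       # releasing the list gives that reference back
    return after - before


# Use a fresh object; small ints, None, etc. may be immortal on recent CPython
# versions and would not show a change in their reference count.
delta = refcount_delta_when_listed(object())
```

When the list is later destroyed it drops one reference per element, so an element whose count was never raised can be freed while the tuple still points at it. That would be consistent with a crash that shows up only after a number of calls.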

rtyler commented 14 years ago

Hmm, do you have a sample file and script to recreate this fault? I'm not able to reproduce it myself.

indigoviolet commented 14 years ago

Sure. Here's a test script: http://dl.dropbox.com/u/772095/test.py and a data file: http://dl.dropbox.com/u/772095/data.gz

I did a fresh git clone, ran python setup.py build, and then ran the following from the build/lib.linux-x86_64-2.6 directory:

    > zcat data.gz | wc
    10000   24208  178555

    > zcat data.gz | ./test.py

Output:

    ...
    445 ["Admirals de Norfolk"]
    446 ["Adolf Busch"]
    447 ["Adolfo Bioy Casares"]
    zsh: broken pipe         zcat data.gz |
    zsh: segmentation fault  ./test.py

rtyler commented 14 years ago

Is this data "free" enough to place in the test_data directory to maintain stability after I get the fix in?

indigoviolet commented 14 years ago

This is just a list of titles from a Wikipedia dump, which is under Creative Commons and should be fine to use for tests. If you're still concerned, I can generate some random text; I believe that will have the same behavior.
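As a rough sketch of how such random input could be generated and fed through an encoder in a loop: the names `random_rows` and `exercise_dumps` are hypothetical, the row shape (one to three uppercase fields) is an assumption, and `json.dumps` stands in for the encoder so the snippet runs without py-yajl installed.

```python
import json
import random
import string


def random_rows(count, seed=0, max_fields=3, max_len=32):
    """Build `count` tuples of random uppercase strings (assumed row shape)."""
    rng = random.Random(seed)        # seeded so a failing run can be repeated
    rows = []
    for _ in range(count):
        fields = rng.randint(1, max_fields)
        rows.append(tuple(
            "".join(rng.choice(string.ascii_uppercase)
                    for _ in range(rng.randint(1, max_len)))
            for _ in range(fields)
        ))
    return rows


def exercise_dumps(dumps, rows):
    """Encode every tuple with `dumps` and return the number of calls made."""
    calls = 0
    for row in rows:
        dumps(row)
        calls += 1
    return calls


# json.dumps is only a stand-in here; to exercise the binding itself one would
# pass yajl.dumps instead, e.g. exercise_dumps(yajl.dumps, random_rows(10000)).
completed = exercise_dumps(json.dumps, random_rows(1000))
```

Because a segfault kills the interpreter rather than raising an exception, a check like this would most likely need to run in a separate process if it were turned into a regression test.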

indigoviolet commented 14 years ago

Here's some random data http://dl.dropbox.com/u/772095/random.data.gz that has the same behavior:

    > zcat random.data.gz | ./test.py
    ...
    444 ["UMKZQHGMWRLNUEOADWLMURBTMXWOVZWW"]
    445 ["MVQIYOTRNKIWHQTANSW"]
    446 ["HDNHQAZYAOJ"]
    447 ["FADZVGVVYDHWIGBIBRG"]
    zsh: broken pipe         zcat random.data.gz |
    zsh: segmentation fault  ./test.py

rtyler commented 14 years ago

Should be resolved with SHA: 836ea060ac8075ffdc6754c3ef9d643db45ed36b