ledatelescope / bifrost

A stream processing framework for high-throughput applications.
BSD 3-Clause "New" or "Revised" License

Segfault in Python 3 #147

Closed: telegraphic closed this issue 2 years ago

telegraphic commented 3 years ago

I'm not sure whether these are two separate issues or one interrelated issue.

Part 1

I was trying to get bifrost working in Python 3.7, and ran into PEP 479, which changes how StopIteration is handled in a generator: https://www.python.org/dev/peps/pep-0479/

Basically, a StopIteration raised inside a generator body now gets converted into a RuntimeError. My workaround was to raise a new exception, EndOfDataStop, instead, and to catch it in any code that uses yield or explicitly catches StopIteration. This seems to work, although I'm not sure it is the optimal approach.
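A minimal, bifrost-free sketch of the behaviour described above, for anyone who wants to reproduce it in isolation. The reader functions and the drain helper are hypothetical names made up for illustration; EndOfDataStop here is a stand-in with the same name as the workaround exception, not the code from the actual commit.

```python
class EndOfDataStop(Exception):
    """Stand-in for the custom end-of-data exception (illustrative only)."""


def read_next(source):
    # Signals exhaustion by letting StopIteration escape from next().
    return next(source)


def broken_reader(items):
    source = iter(items)
    while True:
        # Once `source` is exhausted, StopIteration is raised inside this
        # generator body. On Python 3.7+ (PEP 479) it is converted into
        # RuntimeError instead of silently ending the generator.
        yield read_next(source)


def patched_reader(items):
    source = iter(items)
    while True:
        try:
            item = read_next(source)
        except StopIteration:
            # Translate into a distinct exception so PEP 479 does not apply.
            raise EndOfDataStop from None
        yield item


def drain(reader):
    # Consumers must now catch EndOfDataStop explicitly.
    out = []
    try:
        for item in reader:
            out.append(item)
    except EndOfDataStop:
        pass
    return out


if __name__ == "__main__":
    print(drain(patched_reader([1, 2, 3])))
    try:
        list(broken_reader([1, 2]))
    except RuntimeError as exc:
        print("RuntimeError:", exc)
```

PEP 479 itself suggests a plain return inside the generator in place of raising StopIteration, which may be simpler than a custom exception wherever the end-of-data condition is detected inside the generator itself.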

Part 2

After fixing the StopIteration issue, the code now crashes on pipeline shutdown with this error:

*** glibc detected *** python: double free or corruption (fasttop): 0x00007f5600006460 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x3d1ce75dee)[0x7fd5d35a7dee]
/lib64/libc.so.6(+0x3d1ce78c3d)[0x7fd5d35aac3d]
/home/dprice/install/bifrost/lib/libbifrost.so.0(+0x5b84d)[0x7fd5ca1f684d]
/home/dprice/install/bifrost/lib/libbifrost.so.0(+0x5376c)[0x7fd5ca1ee76c]
/home/dprice/install/bifrost/lib/libbifrost.so.0(bfRingDestroy+0x13)[0x7fd5ca1dea33]
/home/dprice/mpy36/lib/python3.6/lib-dynload/../../libffi.so.7(+0x69dd)[0x7fd5d3cf59dd]
/home/dprice/mpy36/lib/python3.6/lib-dynload/../../libffi.so.7(+0x6067)[0x7fd5d3cf5067]
/home/dprice/mpy36/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce)[0x7fd5d3b2eede]
/home/dprice/mpy36/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x13915)[0x7fd5d3b2f915]
python(_PyObject_FastCallDict+0x8b)[0x7fd5d3e18bfb]
python(+0x19abae)[0x7fd5d3ea0bae]
python(_PyEval_EvalFrameDefault+0x30a)[0x7fd5d3ec325a]
python(+0x194c1b)[0x7fd5d3e9ac1b]
python(+0x19ab35)[0x7fd5d3ea0b35]
python(_PyEval_EvalFrameDefault+0x30a)[0x7fd5d3ec325a]
python(+0x194c1b)[0x7fd5d3e9ac1b]
python(+0x19ab35)[0x7fd5d3ea0b35]
python(_PyEval_EvalFrameDefault+0x30a)[0x7fd5d3ec325a]
python(_PyFunction_FastCallDict+0x11b)[0x7fd5d3e9b28b]
python(_PyObject_FastCallDict+0x26f)[0x7fd5d3e18ddf]
python(_PyObject_Call_Prepend+0x63)[0x7fd5d3e1d873]
python(_PyObject_FastCallDict+0x8b)[0x7fd5d3e18bfb]
python(+0x16d922)[0x7fd5d3e73922]
python(+0x1691ce)[0x7fd5d3e6f1ce]
python(_PyGC_CollectNoFail+0x2a)[0x7fd5d3ef707a]
python(PyImport_Cleanup+0x39e)[0x7fd5d3eac04e]
python(Py_FinalizeEx+0x61)[0x7fd5d3f16061]
python(Py_Main+0x35e)[0x7fd5d3f203ae]
python(main+0xee)[0x7fd5d3dea43e]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fd5d3550d1d]
python(+0x1c3d0b)[0x7fd5d3ec9d0b]

As far as I can tell, this error occurs when two bf.views are applied one after the other. For example, this runs fine:

b_cpu = bf.blocks.read_dada_file([args.filename,], hdr_callback, gulp_nframe=1, core=0)
b_cpu = bf.views.merge_axes(b_cpu, 'station', 'pol', label='station')
bf.get_default_pipeline().run()

But this doesn't:

b_cpu = bf.blocks.read_dada_file([args.filename,], hdr_callback, gulp_nframe=1, core=0)
b_cpu = bf.views.reinterpret_axis(b_cpu, 'freq', label='freq', units='')
b_cpu = bf.views.merge_axes(b_cpu, 'station', 'pol', label='station')
bf.get_default_pipeline().run()

Any ideas? This fails on both Python 3.7 and Python 3.6. Even after I reverted my StopIteration changes, Python 3.6 still fails.

mlaures commented 3 years ago

I just started using Bifrost in 3.7 and ran into the same issue with the yield. Instead of changing the read functions in the code, I used the other functions to provide the same functionality in my own code.

I used ReadSequence.increment() to move to the next sequence and ReadSequence.acquire() to get each span, both inside a try-except block. The ReadSequence does not need to be destroyed until the end of the ring is reached, at which point I call ReadSequence.close(). Each ReadSpan returned by ReadSequence.acquire() needs to be released with ReadSpan.release() before moving on to the next span.
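The loop structure described above might look roughly like the sketch below. It does not import bifrost: StandInSequence and StandInSpan are hypothetical stand-ins that only mirror the method names mentioned in this comment (acquire, increment, close, release). Their signatures, and the EndOfData exception they raise, are assumptions made for illustration and will differ from the real classes.

```python
class EndOfData(Exception):
    """Hypothetical end-of-data signal; the real exception type may differ."""


class StandInSpan:
    """Stand-in for a read span: holds data until released."""

    def __init__(self, data):
        self.data = data
        self.released = False

    def release(self):
        self.released = True


class StandInSequence:
    """Stand-in for a read sequence iterating over a list of sequences."""

    def __init__(self, sequences):
        self._sequences = sequences
        self._seq = 0
        self._pos = 0
        self.closed = False

    def acquire(self):
        current = self._sequences[self._seq]
        if self._pos >= len(current):
            raise EndOfData("end of sequence")
        span = StandInSpan(current[self._pos])
        self._pos += 1
        return span

    def increment(self):
        if self._seq + 1 >= len(self._sequences):
            raise EndOfData("end of ring")
        self._seq += 1
        self._pos = 0

    def close(self):
        self.closed = True


def read_all(seq):
    """Explicit acquire/release loop that avoids generators entirely."""
    out = []
    try:
        while True:
            while True:
                try:
                    span = seq.acquire()
                except EndOfData:
                    break  # no more spans in this sequence
                try:
                    out.append(span.data)
                finally:
                    span.release()  # release every span, even on error
            try:
                seq.increment()
            except EndOfData:
                break  # no more sequences in the ring
    finally:
        seq.close()  # close only once the end of the ring is reached
    return out


if __name__ == "__main__":
    print(read_all(StandInSequence([[1, 2], [3]])))
```

Because nothing here is a generator, PEP 479 never comes into play; the cost is that release and close have to be called by hand, which is why they sit in finally blocks.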

I haven't yet played around with views, so I don't have observations on that issue.

jaycedowell commented 3 years ago

@telegraphic Do you have a branch that I can look at for how you implemented EndOfDataStop?

telegraphic commented 3 years ago

It was very basic: https://github.com/telegraphic/bifrost/commit/0a354fd95e81ab2539c907b7d6911aed9a4715ca

jaycedowell commented 3 years ago

I haven't been able to reproduce this with the tests that are run through Travis or Jenkins. @telegraphic Do the two code snippets in your original post trigger the original StopIteration problem? If so, can you point me to a dada file to try?

jaycedowell commented 3 years ago

I was able to find an old LEDA64-NM dada file (from 2014!) to test on. My next stumbling block is a function for hdr_callback.

telegraphic commented 3 years ago

Hey Jayce, let me put together a failing test and input data and get back to you on this one. The hdr_callback is a bit of effort - it's needed to populate the bifrost _tensor from the DADA header (the dada header format is not standardised enough to do this without a helping hand).
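For orientation only, a header callback of this kind might look something like the sketch below. Everything specific in it is an assumption: the DADA keys (NCHAN, NSTATION, NPOL, NBIT, CFREQ, BW, TSAMP), their units, the axis order, the dtype string and the frequency convention are all illustrative and would need to be checked against the actual file and the bifrost documentation. Only the axis labels 'freq', 'station' and 'pol' come from the snippets earlier in this thread.

```python
def hdr_callback(dada_hdr):
    """Hypothetical conversion from a DADA header dict to a bifrost-style header.

    All DADA key names and units below are assumptions for illustration.
    """
    nchan = int(dada_hdr["NCHAN"])
    nstation = int(dada_hdr["NSTATION"])
    npol = int(dada_hdr["NPOL"])
    nbit = int(dada_hdr["NBIT"])
    cfreq = float(dada_hdr["CFREQ"])  # assumed: centre frequency in MHz
    bw = float(dada_hdr["BW"])        # assumed: total bandwidth in MHz
    tsamp = float(dada_hdr["TSAMP"])  # assumed: sample time in seconds

    df = bw / nchan                   # channel width
    f0 = cfreq - bw / 2.0             # assumed: band starts at the lower edge

    return {
        "_tensor": {
            "dtype": "ci%d" % nbit,   # assumed: complex integer samples
            "shape": [-1, nchan, nstation, npol],  # -1 marks the frame axis
            "labels": ["time", "freq", "station", "pol"],
            "scales": [[0.0, tsamp], [f0, df], None, None],
            "units": ["s", "MHz", None, None],
        },
    }


if __name__ == "__main__":
    example = {"NCHAN": "4", "NSTATION": "2", "NPOL": "2", "NBIT": "8",
               "CFREQ": "50.0", "BW": "20.0", "TSAMP": "0.001"}
    print(hdr_callback(example))
```

The point of the sketch is only to show why a helper is needed: the axis layout and scales have to be derived by hand from whichever keys a particular DADA writer happens to use.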

jaycedowell commented 2 years ago

Closing with the release of v0.10.0.