ledatelescope / bifrost

A stream processing framework for high-throughput applications.
BSD 3-Clause "New" or "Revised" License

Crash in pipeline with 2x branches #86

Status: Closed (telegraphic closed this issue 7 years ago)

telegraphic commented 7 years ago

I'm getting a crash when I try to run a pipeline with two FFTs with different windows:

https://github.com/telegraphic/bunyip/blob/master/mb_multi_fft_test.py

This file will generate test data for you to play with too.

The crash:

map.cpp:335 Condition failed: broadcast_shapes(narg, args, mutable_shape, &ndim)
map.cpp:335 error 7: BF_STATUS_INVALID_SHAPE
  File "./mb_multi_fft_test.py", line 179, in <module>
    b_lowres    = blocks.accumulate(b_lowres, n_int_lowres)
  File "build/bdist.linux-x86_64/egg/bifrost/blocks/accumulate.py", line 103, in accumulate
    return AccumulateBlock(iring, nframe, dtype, *args, **kwargs)
  File "build/bdist.linux-x86_64/egg/bifrost/blocks/accumulate.py", line 46, in __init__
    *args, **kwargs)
  File "build/bdist.linux-x86_64/egg/bifrost/pipeline.py", line 540, in __init__
    super(TransformBlock, self).__init__([iring], *args, **kwargs)
  File "build/bdist.linux-x86_64/egg/bifrost/pipeline.py", line 422, in __init__
    super(MultiTransformBlock, self).__init__(irings_, *args, **kwargs)
  File "build/bdist.linux-x86_64/egg/bifrost/pipeline.py", line 289, in __init__
    self.init_trace = ''.join(traceback.format_stack())

...

Exception in thread Pipeline_0/BlockScope_16/AccumulateBlock_1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "build/bdist.linux-x86_64/egg/bifrost/pipeline.py", line 308, in run
    self.main(active_orings)
  File "build/bdist.linux-x86_64/egg/bifrost/pipeline.py", line 498, in main
    ostrides = self._on_data(ispans, ospans)
  File "build/bdist.linux-x86_64/egg/bifrost/pipeline.py", line 572, in _on_data
    nframe_commit = self.on_data(ispans[0], ospans[0])
  File "build/bdist.linux-x86_64/egg/bifrost/blocks/accumulate.py", line 68, in on_data
    bf.map("b = beta * b + (b_type)a", a=idata, b=odata, beta=beta)
  File "build/bdist.linux-x86_64/egg/bifrost/map.py", line 110, in map
    func_string))
  File "build/bdist.linux-x86_64/egg/bifrost/libbifrost.py", line 154, in _check
    raise RuntimeError(status_str)
RuntimeError: BF_STATUS_INVALID_SHAPE

From block instantiated here:
  File "./mb_multi_fft_test.py", line 170, in <module>
    b_lowres      = blocks.copy(b_lowres, space='system', gulp_nframe=n_gulp_lowres)
  File "build/bdist.linux-x86_64/egg/bifrost/blocks/copy.py", line 71, in copy
    return CopyBlock(iring, space, *args, **kwargs)
  File "build/bdist.linux-x86_64/egg/bifrost/blocks/copy.py", line 38, in __init__
    super(CopyBlock, self).__init__(iring, *args, **kwargs)
  File "build/bdist.linux-x86_64/egg/bifrost/pipeline.py", line 540, in __init__
    super(TransformBlock, self).__init__([iring], *args, **kwargs)
  File "build/bdist.linux-x86_64/egg/bifrost/pipeline.py", line 422, in __init__
    super(MultiTransformBlock, self).__init__(irings_, *args, **kwargs)
  File "build/bdist.linux-x86_64/egg/bifrost/pipeline.py", line 289, in __init__
    self.init_trace = ''.join(traceback.format_stack())

Exception in thread Pipeline_0/BlockScope_12/CopyBlock_3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "build/bdist.linux-x86_64/egg/bifrost/pipeline.py", line 308, in run
    self.main(active_orings)
  File "build/bdist.linux-x86_64/egg/bifrost/pipeline.py", line 491, in main
    ospans = self.reserve_spans(ospan_stack, oseqs, ispans)
  File "build/bdist.linux-x86_64/egg/bifrost/pipeline.py", line 330, in reserve_spans
    igulp_nframes = [span.nframe for span in ispans]
  File "build/bdist.linux-x86_64/egg/bifrost/ring2.py", line 356, in nframe
    assert(size_bytes % self.tensor['frame_nbyte'] == 0)
AssertionError
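
As a rough illustration of the two checks that fail in the traces above, here is a plain-Python sketch. It is not Bifrost's implementation: `broadcast_shape` and `span_is_whole_frames` are hypothetical helper names, and the sketch assumes NumPy-style broadcasting rules, which may differ in detail from what `broadcast_shapes` in `map.cpp` does. The second helper just restates the assertion from `ring2.py`.

```python
def broadcast_shape(*shapes):
    """Return the common shape under NumPy-style broadcasting rules.

    Raises ValueError if the shapes are incompatible. This is an
    illustrative stand-in, not Bifrost's own broadcast check.
    """
    ndim = max(len(s) for s in shapes)
    # Left-pad shorter shapes with 1s so all have the same rank.
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    result = []
    for dims in zip(*padded):
        # Along each axis, every size must be 1 or equal to one common size.
        sizes = set(d for d in dims if d != 1)
        if len(sizes) > 1:
            raise ValueError("incompatible shapes: %r" % (shapes,))
        result.append(sizes.pop() if sizes else 1)
    return tuple(result)


def span_is_whole_frames(size_bytes, frame_nbyte):
    """Mirror of the assertion in the trace: a span must hold whole frames."""
    return size_bytes % frame_nbyte == 0


# Compatible: the size-1 axis stretches to match.
ok_shape = broadcast_shape((4, 1, 8), (4, 3, 8))

# Incompatible: the last axes (8 vs 16) differ and neither is 1.
try:
    broadcast_shape((4, 8), (4, 16))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

If an input and output array disagree along an axis in this way, an element-wise expression such as the `b = beta * b + (b_type)a` call in the trace has no well-defined shape to operate on.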

What I am trying to do here is test a gpuspec pipeline for Parkes. Each compute node will receive 12.5 MHz of bandwidth from 13 beams, and we want to make three products, e.g.

My approach so far is to rearrange the data before copying it over to the GPU, so that the fft_window length matches one of the axes. This script doesn't produce the Hz-resolution product, as that one is the hardest to fit into GPU RAM.
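
The idea of making the FFT length match an axis can be sketched in plain NumPy. This is only an illustration of the reshaping step, not the script linked above and not Bifrost code: `windowed_fft`, the sample count, and the two FFT lengths are made-up values chosen for the example.

```python
import numpy as np


def windowed_fft(timeseries, fft_len):
    """Reshape a 1-D time series to (nframe, fft_len) and FFT the last axis.

    Trailing samples that do not fill a whole frame are dropped.
    """
    nframe = timeseries.size // fft_len
    if nframe == 0:
        raise ValueError("time series shorter than one FFT window")
    frames = timeseries[:nframe * fft_len].reshape(nframe, fft_len)
    return np.fft.fft(frames, axis=-1)


# Hypothetical test input: a complex tone at 1/8 cycle per sample.
n = np.arange(4096)
x = np.exp(2j * np.pi * 0.125 * n)

# Two branches from the same input, each with its own FFT length.
coarse = windowed_fft(x, 64)    # shape (64, 64): more spectra, fewer channels
fine = windowed_fft(x, 1024)    # shape (4, 1024): fewer spectra, more channels
```

Both branches consume the same samples; only the split between the time axis and the channel axis changes, which is why each branch needs a gulp size that is a whole multiple of its own FFT length.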

benbarsdell commented 7 years ago

Does this happen every time, or only sometimes? It looks a bit like what 4b15120cd1bfff5c638e5c6238876e7f04cc42fc fixed.

telegraphic commented 7 years ago

It happened only sometimes, and 4b15120cd1bfff5c638e5c6238876e7f04cc42fc did indeed fix it (I needed to re-make)!