lacker / seticore

A high-performance implementation of some core SETI algorithms that can be included in other programs.
MIT License

Hits writer fails when num_coarse_channels > 1 #24

Closed radonnachie closed 1 year ago

radonnachie commented 1 year ago

I get this error whenever the number of coarse-channels in the filterbank-metadata is > 1.

BLADE [INFO]  | [M::DEDOPPLER] num_timesteps: 16
BLADE [INFO]  | [M::DEDOPPLER] num_channels: 262144
BLADE [INFO]  | [M::DEDOPPLER] coarse_channel_size: 131072
BLADE [INFO]  | [M::DEDOPPLER] num_coarse_channels: 2
BLADE [INFO]  | [M::DEDOPPLER] Dimensions [A, F, T, P]: [2, 262144, 16, 1] -> N/A
BLADE [INFO]  | [M::DEDOPPLER] Coarse Channel Rate: 131072
BLADE [INFO]  | [M::DEDOPPLER] Channel Bandwidth: -7.62939453125 Hz
BLADE [INFO]  | [M::DEDOPPLER] Channel Timespan: 0.131072 s
.
.
.
BLADE [DEBUG] | [M::DEDOPPLER] Hit: coarse channel = 0, index = 196612, snr = 91.99948, drift rate = 27.16358 (-7 bins)
terminate called after throwing an instance of 'kj::ExceptionImpl'
[Detaching after vfork from child process 3163127]
  what():  capnp/layout.c++:1188: failed: tried to allocate list with too many elements
stack: 7ffff7d7de75 7ffff7d86306 7ffff7d7df84 7ffff7d70357 7ffff7ca461e 7ffff7ca418b 7ffff7d67b28 7ffff7d65aa3 7ffff7c928da 7ffff7b6339f 5555556684f7 555555650549 55555568e186 555555682f40 55555567248b 55555567194e 5555556672c4 5555556528bd 555555637c41 5555555df3f7 5555555df79b 5555555e0fae 5555555a6b4c 7ffff7149082 5555555a555d
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here
    ??:0: returning here

Thread 1 "blade-cli" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7147859 in __GI_abort () at abort.c:79
#2  0x00007ffff73d0911 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff73dc38c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff73dc3f7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff73dc6a9 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7db2a41 in kj::ExceptionCallback::RootExceptionCallback::onFatalException (this=0x555557a3e6f0, exception=...)
    at ../subprojects/seticore/subprojects/capnproto/c++/src/kj/exception.c++:1044
#7  0x00007ffff7db0fd1 in kj::throwFatalException (exception=..., ignoreCount=1)
    at ../subprojects/seticore/subprojects/capnproto/c++/src/kj/exception.c++:1131
#8  0x00007ffff7dab019 in kj::_::Debug::Fault::fatal (this=0x7fffffffbcc0) at ../subprojects/seticore/subprojects/capnproto/c++/src/kj/debug.c++:371
#9  0x00007ffff7d7de76 in capnp::_::WireHelpers::initListPointer(capnp::_::WirePointer*, capnp::_::SegmentBuilder*, capnp::_::CapTableBuilder*, unsigned int, capnp::ElementSize, capnp::_::BuilderArena*)::{lambda()#1}::operator()() const (__closure=0x7fffffffbd80)
    at ../subprojects/seticore/subprojects/capnproto/c++/src/capnp/layout.c++:1188
#10 0x00007ffff7d86307 in capnp::assertMaxBits<29u, unsigned int, capnp::_::WireHelpers::initListPointer(capnp::_::WirePointer*, capnp::_::SegmentBuilder*, capnp::_::CapTableBuilder*, unsigned int, capnp::ElementSize, capnp::_::BuilderArena*)::{lambda()#1}>(capnp::_::WireHelpers::initListPointer(capnp::_::WirePointer*, capnp::_::SegmentBuilder*, capnp::_::CapTableBuilder*, unsigned int, capnp::ElementSize, capnp::_::BuilderArena*)::{lambda()#1}, capnp::_::WireHelpers::initListPointer(capnp::_::WirePointer*, capnp::_::SegmentBuilder*, capnp::_::CapTableBuilder*, unsigned int, capnp::ElementSize, capnp::_::BuilderArena*)::{lambda()#1}&&) (value=4293919392, func=...) at ../subprojects/seticore/subprojects/capnproto/c++/src/capnp/common.h:647
#11 0x00007ffff7d7df85 in capnp::_::WireHelpers::initListPointer (ref=0x555557fd65d8, segment=0x7fffffffc108, capTable=0x7fffffffc0e0, 
    elementCount=4293919392, elementSize=capnp::ElementSize::FOUR_BYTES, orphanArena=0x0)
    at ../subprojects/seticore/subprojects/capnproto/c++/src/capnp/layout.c++:1187
#12 0x00007ffff7d70358 in capnp::_::PointerBuilder::initList (this=0x7fffffffbe60, elementSize=capnp::ElementSize::FOUR_BYTES, elementCount=4293919392)
    at ../subprojects/seticore/subprojects/capnproto/c++/src/capnp/layout.c++:2528
#13 0x00007ffff7ca461f in capnp::List<float, (capnp::Kind)0>::initPointer (builder=..., size=4293919392)
    at ../subprojects/seticore/subprojects/capnproto/c++/src/capnp/list.h:181
#14 0x00007ffff7ca418c in capnp::_::PointerHelpers<capnp::List<float, (capnp::Kind)0>, (capnp::Kind)6>::init (builder=..., size=4293919392)
    at ../subprojects/seticore/subprojects/capnproto/c++/src/capnp/pointer-helpers.h:92
#15 0x00007ffff7d67b29 in Filterbank::Builder::initData (this=0x7fffffffbfe0, size=4293919392) at ../subprojects/seticore/hit.capnp.h:768
#16 0x00007ffff7d65aa4 in HitFileWriter::recordHit (this=0x555556f46f30, dedoppler_hit=..., input=0x7ffed6000000)
    at ../subprojects/seticore/hit_file_writer.cpp:78
#17 0x00007ffff7c928db in Blade::Modules::Seticore::Dedoppler::process (this=0x555556f42180, stream=@0x555556f3f6a8: 0x555556f3f670)
    at ../src/modules/seticore/dedoppler.cc:116
#18 0x00007ffff7b633a0 in Blade::Pipeline::compute (this=0x555556f3f690) at ../src/pipeline.cc:103
#19 0x00005555556684f8 in Blade::Plan::Compute<Blade::Pipelines::Generic::ModeS<(Blade::Pipelines::Generic::HitsFormat)2> > (pipeline=...)
    at ../include/blade/plan.hh:69
#20 0x000055555565054a in Blade::CLI::Telescopes::ATA::ModeBS<std::complex<signed char>, float>(Blade::CLI::Config const&)::{lambda(auto:1&)#4}::operator()<Blade::Pipelines::Generic::ModeS<(Blade::Pipelines::HitsFormat)2> >(Blade::Pipelines::Generic::ModeS<(Blade::Pipelines::HitsFormat)2>&) const (
    this=0x555557a407b0, worker=...) at ../apps/blade-cli/include/blade-cli/telescopes/ata/mode_bs.hh:335
#21 0x000055555568e187 in std::__invoke_impl<unsigned long, Blade::CLI::Telescopes::ATA::ModeBS<std::complex<signed char>, float>(Blade::CLI::Config const&)::{lambda(auto:1&)#4}&, Blade::Pipelines::Generic::ModeS<(Blade::Pipelines::HitsFormat)2>&>(std::__invoke_other, Blade::CLI::Telescopes::ATA::ModeBS<std::complex<signed char>, float>(Blade::CLI::Config const&)::{lambda(auto:1&)#4}&, Blade::Pipelines::Generic::ModeS<(Blade::Pipelines::HitsFormat)2>&) (__f=...)
    at /usr/include/c++/10/bits/invoke.h:60
#22 0x0000555555682f41 in std::__invoke_r<unsigned long const, Blade::CLI::Telescopes::ATA::ModeBS<std::complex<signed char>, float>(Blade::CLI::Config const&)::{lambda(auto:1&)#4}&, Blade::Pipelines::Generic::ModeS<(Blade::Pipelines::HitsFormat)2>&>(std::enable_if&&, (Blade::CLI::Telescopes::ATA::ModeBS<std::complex<signed char>, float>(Blade::CLI::Config const&)::{lambda(auto:1&)#4}&)...) (__fn=...) at /usr/include/c++/10/bits/invoke.h:113
#23 0x000055555567248c in std::_Function_handler<unsigned long const (Blade::Pipelines::Generic::ModeS<(Blade::Pipelines::Generic::HitsFormat)2>&), Blade::CLI::Telescopes::ATA::ModeBS<std::complex<signed char>, float>(Blade::CLI::Config const&)::{lambda(auto:1&)#4}>::_M_invoke(std::_Any_data const&, Blade::Pipelines::Generic::ModeS<(Blade::Pipelines::Generic::HitsFormat)2>&) (__functor=..., __args#0=...) at /usr/include/c++/10/bits/std_function.h:291
#24 0x000055555567194f in std::function<unsigned long const (Blade::Pipelines::Generic::ModeS<(Blade::Pipelines::Generic::HitsFormat)2>&)>::operator()(Blade::Pipelines::Generic::ModeS<(Blade::Pipelines::Generic::HitsFormat)2>&) const (this=0x7fffffffd020, __args#0=...)
    at /usr/include/c++/10/bits/std_function.h:622
#25 0x00005555556672c5 in Blade::Runner<Blade::Pipelines::Generic::ModeS<(Blade::Pipelines::Generic::HitsFormat)2> >::enqueue(std::function<unsigned long const (Blade::Pipelines::Generic::ModeS<(Blade::Pipelines::Generic::HitsFormat)2>&)> const&) (this=0x555556f26300, jobFunc=...) at ../include/blade/runner.hh:87
#26 0x00005555556528be in Blade::CLI::Telescopes::ATA::ModeBS<std::complex<signed char>, float> (config=...)
    at ../apps/blade-cli/include/blade-cli/telescopes/ata/mode_bs.hh:311
#27 0x0000555555637c42 in Blade::CLI::Telescopes::ATA::Setup (config=...) at ../apps/blade-cli/src/telescopes/ata.cc:28
#28 0x00005555555df3f8 in Blade::CLI::SetupTelescope (config=...) at ../apps/blade-cli/src/base.cc:12
#29 0x00005555555df79c in Blade::CLI::SetupProcessingPipeline (config=...) at ../apps/blade-cli/src/base.cc:31
#30 0x00005555555e0faf in Blade::CLI::Start (argc=20, argv=0x7fffffffdbe8) at ../apps/blade-cli/src/base.cc:129
#31 0x00005555555a6b4d in main (argc=20, argv=0x7fffffffdbe8) at ../apps/blade-cli/main.cc:5

radonnachie commented 1 year ago

I ran BLADE on 1 coarse channel at a time but misreported the metadata: I set coarse_channel_size to half its real value, and therefore num_coarse_channels to 2 instead of 1. I saw the following. Note that the failure happens only once a hit's index is past 131072.

BLADE [INFO]  | [M::DEDOPPLER] num_timesteps: 16
BLADE [INFO]  | [M::DEDOPPLER] num_channels: 262144
BLADE [INFO]  | [M::DEDOPPLER] coarse_channel_size: 131072
BLADE [INFO]  | [M::DEDOPPLER] num_coarse_channels: 2
BLADE [INFO]  | [M::DEDOPPLER] Dimensions [A, F, T, P]: [2, 262144, 16, 1] -> N/A
BLADE [INFO]  | [M::DEDOPPLER] Coarse Channel Rate: 262144
BLADE [INFO]  | [M::DEDOPPLER] Channel Bandwidth: -3.814697265625 Hz
BLADE [INFO]  | [M::DEDOPPLER] Channel Timespan: 0.262144 s
.
.
.
BLADE [DEBUG] | [M::DEDOPPLER] Hit: coarse channel = 0, index = 131047, snr = 96.33874, drift rate = 4.85064 (-5 bins)
BLADE [DEBUG] | [M::DEDOPPLER] Hit: coarse channel = 0, index = 131047, snr = 96.33854, drift rate = 4.85064 (-5 bins)
BLADE [DEBUG] | [M::DEDOPPLER] Hit: coarse channel = 0, index = 131156, snr = 75.57085, drift rate = 2.91038 (-3 bins)
terminate called after throwing an instance of 'kj::ExceptionImpl'
[Detaching after vfork from child process 3164492]
  what():  capnp/layout.c++:1188: failed: tried to allocate list with too many elements
stack: 7ffff7d7de79 7ffff7d8630a 7ffff7d7df88 7ffff7d7035b 7ffff7ca4622 7ffff7ca418f 7ffff7d67b2c 7ffff7d65aa7 7ffff7c928de 7ffff7b6339f 5555556684f7 555555650549 55555568e186 555555682f40 55555567248b 55555567194e 5555556672c4 5555556528bd 555555637c41 5555555df3f7 5555555df79b 5555555e0fae 5555555a6b4c 7ffff7149082 5555555a555d
    ??:0: returning here

radonnachie commented 1 year ago

This is also the case in the first run, where there really were 2 coarse channels: a hit that is not in the first coarse channel leads to an out-of-bounds index.
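
The `elementCount=4293919392` in the backtrace is consistent with a negative size wrapping around when it is converted to an unsigned 32-bit integer. Below is a minimal sketch of that arithmetic. It assumes, without confirmation from seticore's source, that the writer clamps the extracted window to the end of the coarse channel described by the metadata. `windowColumns`, `listSize`, and the 46-column margin are hypothetical, chosen only so that the numbers match the log.

```cpp
#include <cassert>
#include <cstdint>

// Largest list length the failing check accepts: the backtrace shows
// capnp::assertMaxBits<29u>, so the count has to fit in 29 bits.
constexpr std::uint32_t kMaxListElements = (1u << 29) - 1;

// Hypothetical helper: number of columns in a window that starts at
// firstColumn and is clamped to the end of a single coarse channel.
// The result is negative when firstColumn lies past that channel.
constexpr std::int64_t windowColumns(std::int64_t firstColumn,
                                     std::int64_t coarseChannelSize) {
    return coarseChannelSize - firstColumn;
}

// Element count as an unsigned 32-bit value, which is the type that
// initData(size) receives in the backtrace. A negative product wraps
// around modulo 2^32.
inline std::uint32_t listSize(std::int64_t columns, std::int64_t numTimesteps) {
    assert(numTimesteps > 0);
    return static_cast<std::uint32_t>(columns * numTimesteps);
}
```

With the values from the first log (hit index 196612, coarse_channel_size 131072, 16 timesteps), a window starting 46 columns before the hit has 131072 - 196566 = -65494 columns, and -65494 × 16 wraps to 4293919392, which is far above the 29-bit limit.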

lacker commented 1 year ago

Hmm, this sounds like there's some confusion about what the various parameters mean. What function in seticore are you calling that leads to this error, and what are the arguments you're calling it with? The "search" function expects to only be called on precisely one coarse channel at a time, and the hit file writer expects to be called with the same filterbank metadata that the search function was called with.

radonnachie commented 1 year ago

Oh, that sounds like it conflicts with how I'm using it. For starters, I don't search only 1 coarse channel at a time. I expected the search to work off num_channels, without distinguishing coarse-channel boundaries.

Furthermore, the hits-file writer only takes a float*, not a buffer. So it feels like everything is in place for these 2 to be agnostic to coarse-channel boundaries... I'm sure we'd only need to tweak a few lines...

lacker commented 1 year ago

The typical normalization to calculate SNR is per coarse channel, i.e. over the entire search input, so if you call dedoppler search on multiple coarse channels at once, it will normalize differently. I'm not sure if that's a big problem though.
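
To illustrate why the normalization window matters, here is a small sketch. It assumes SNR is computed as (power - mean) / standard deviation over the search input. That formula is an assumption for illustration, and seticore's actual statistic may differ. `NoiseStats`, `noiseStats`, and `snr` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct NoiseStats {
    double mean;
    double stddev;
};

// Mean and population standard deviation of values[begin, end).
inline NoiseStats noiseStats(const std::vector<float>& values,
                             std::size_t begin, std::size_t end) {
    assert(begin < end && end <= values.size());
    const double n = static_cast<double>(end - begin);
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) sum += values[i];
    const double mean = sum / n;
    double squares = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
        const double d = values[i] - mean;
        squares += d * d;
    }
    return {mean, std::sqrt(squares / n)};
}

// Assumed SNR definition: distance from the mean in standard deviations.
inline double snr(double power, const NoiseStats& stats) {
    return (power - stats.mean) / stats.stddev;
}
```

If two coarse channels have different noise levels, the same power value scores differently depending on whether the statistics come from its own coarse channel or from both together, so hits near the threshold can appear or disappear.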

radonnachie commented 1 year ago

Assuming the single-coarse-channel restriction, I still have the hits writer taking the full (multi-beam) data buffer. It works (no segfaults), but I wonder if some of the odd hits data I am seeing is due to my different implementation.

Instead of what I currently have:

    this->output.hits.clear();
    const auto inputDims = this->input.buf.dims();
    const auto beamByteStride = this->input.buf.size() / inputDims.numberOfAspects();

    BL_CHECK(Memory::Copy(this->buf, this->input.buf, stream));

    const auto skipLastBeam = this->config.lastBeamIsIncoherent && (!this->config.searchIncoherentBeam);
    const auto beamsToSearch = inputDims.numberOfAspects() - (skipLastBeam ? 1 : 0);

    for (U64 beam = 0; beam < beamsToSearch; beam++) {
        FilterbankBuffer filterbankBuffer = FilterbankBuffer(
            inputDims.numberOfTimeSamples(),
            inputDims.numberOfFrequencyChannels(),
            this->input.buf.data() + beam*beamByteStride
        );
        dedopplerer.search(
            filterbankBuffer,
            this->metadata,
            beam,
            this->input.coarseFrequencyChannelOffset[0],
            this->config.maximumDriftRate,
            this->config.minimumDriftRate,
            this->config.snrThreshold,
            &this->output.hits
        );
    }

    if (this->config.lastBeamIsIncoherent) {
        FilterbankBuffer filterbankBuffer = FilterbankBuffer(
            inputDims.numberOfTimeSamples(),
            inputDims.numberOfFrequencyChannels(),
            this->input.buf.data() + (inputDims.numberOfAspects()-1)*beamByteStride
        );

        dedopplerer.addIncoherentPower(filterbankBuffer, this->output.hits);
    }

    BL_CUDA_CHECK(cudaStreamSynchronize(stream), [&]{
        BL_FATAL("Failed to synchronize stream: {}", err);
    });

    for (const DedopplerHit& hit : this->output.hits) {
        hit_recorder->recordHit(hit, this->buf.data());
    }

    return Result::SUCCESS;

Would you say the following is better (more in line with the expected implementation)?

    this->output.hits.clear();
    const auto inputDims = this->input.buf.dims();
    const auto beamByteStride = this->input.buf.size() / inputDims.numberOfAspects();

    BL_CHECK(Memory::Copy(this->buf, this->input.buf, stream));

    const auto skipLastBeam = this->config.lastBeamIsIncoherent && (!this->config.searchIncoherentBeam);
    const auto beamsToSearch = inputDims.numberOfAspects() - (skipLastBeam ? 1 : 0);

    std::vector<DedopplerHit> beamhits;
    FilterbankBuffer incohFilterbankBuffer = FilterbankBuffer(
        inputDims.numberOfTimeSamples(),
        inputDims.numberOfFrequencyChannels(),
        this->input.buf.data() + (inputDims.numberOfAspects()-1)*beamByteStride
    );

    for (U64 beam = 0; beam < beamsToSearch; beam++) {
        FilterbankBuffer beamFilterbankBuffer = FilterbankBuffer(
            inputDims.numberOfTimeSamples(),
            inputDims.numberOfFrequencyChannels(),
            this->input.buf.data() + beam*beamByteStride
        );
        dedopplerer.search(
            beamFilterbankBuffer,
            this->metadata,
            beam,
            this->input.coarseFrequencyChannelOffset[0],
            this->config.maximumDriftRate,
            this->config.minimumDriftRate,
            this->config.snrThreshold,
            &beamhits
        );

        if (this->config.lastBeamIsIncoherent) {   
            dedopplerer.addIncoherentPower(incohFilterbankBuffer, beamhits);
        }
        for (const DedopplerHit& hit : beamhits) {
            hit_recorder->recordHit(hit, beamFilterbankBuffer.data);
            this->output.hits.push_back(hit);
        }
        beamhits.clear();
    }

    BL_CUDA_CHECK(cudaStreamSynchronize(stream), [&]{
        BL_FATAL("Failed to synchronize stream: {}", err);
    });

    return Result::SUCCESS;

To handle coarse channels, I would just nest another for loop... 🤷
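
A sketch of what that extra loop would have to do per coarse channel, assuming a row-major layout of num_timesteps rows by num_channels columns. That layout is an assumption about the buffer and is not confirmed in this thread. Under it, a coarse channel is a column range, so its rows are not adjacent in memory and have to be copied into a contiguous buffer before they can stand alone as one search input. `copyCoarseChannel` is a hypothetical helper, not a seticore or BLADE function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copy the columns of one coarse channel out of a row-major
// [numTimesteps][numChannels] array into a contiguous
// [numTimesteps][coarseChannelSize] array.
inline std::vector<float> copyCoarseChannel(const float* data,
                                            std::size_t numTimesteps,
                                            std::size_t numChannels,
                                            std::size_t coarseChannelSize,
                                            std::size_t coarseChannel) {
    assert(coarseChannelSize > 0 && numChannels % coarseChannelSize == 0);
    assert(coarseChannel < numChannels / coarseChannelSize);

    std::vector<float> out(numTimesteps * coarseChannelSize);
    const std::size_t firstColumn = coarseChannel * coarseChannelSize;
    for (std::size_t t = 0; t < numTimesteps; ++t) {
        // Row t of the source begins at t * numChannels; skip ahead to
        // the first column that belongs to this coarse channel.
        const float* src = data + t * numChannels + firstColumn;
        std::copy(src, src + coarseChannelSize,
                  out.data() + t * coarseChannelSize);
    }
    return out;
}
```

The search and the hit writer would then both be given the same contiguous copy, along with metadata describing exactly one coarse channel.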

radonnachie commented 1 year ago

The above segfaults in the hit-recorder.

lacker commented 1 year ago

Is beamFilterbankBuffer a contiguous array? I'm a little curious that you're adding something named "byteStride" to what is a float pointer, right? The usage of the hit file writer is pretty simple: you just pass it the same input that the search function got. Sample usage here is pretty straightforward to understand - https://github.com/lacker/seticore/blob/c2c10e521d7d25233a17d02e677aa05c861c0a12/run_dedoppler.cpp#L43
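
The stride concern can be made concrete: adding `n` to a `float*` advances it by `n * sizeof(float)` bytes, so a stride counted in bytes overshoots by a factor of `sizeof(float)` if it is added directly. The thread does not say whether `this->input.buf.size()` returns bytes or elements, so the sketch below only shows the two conventions side by side. `beamStart` and `elementsFromBytes` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>

// Start of one beam when the stride is counted in float elements.
// Pointer arithmetic on float* already scales by sizeof(float).
inline const float* beamStart(const float* base, std::size_t beam,
                              std::size_t elementsPerBeam) {
    return base + beam * elementsPerBeam;
}

// Convert a stride counted in bytes into one counted in float elements,
// which is the unit that float* arithmetic expects.
inline std::size_t elementsFromBytes(std::size_t byteStride) {
    assert(byteStride % sizeof(float) == 0);
    return byteStride / sizeof(float);
}
```

If the stride is in bytes, `beamStart(base, beam, elementsFromBytes(stride))` lands on the intended beam, whereas `base + beam * stride` would point well past it for every beam after the first.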

lacker commented 1 year ago

Closing as inactive