lacker / seticore

A high-performance implementation of some core SETI algorithms that can be included in other programs.
MIT License

Hits file incoherentPower is always zero #29

Open david-macmahon opened 1 year ago

david-macmahon commented 1 year ago

It seems like Signals in Hits have zero incoherentPower even though Signals in the corresponding Stamps have non-zero incoherentPower. Is it possible that the Hits are being output before the incoherentPower field is computed? It's possible that I'm reading the file incorrectly, but since the stamps have non-zero values I don't think I am doing anything wrong on the reading side.

A recent example can be found in the hits and stamps files of 20230620T123329Z-20230618-0010 at MeerKAT.

lacker commented 1 year ago

Yes, that's right, the hits files don't have incoherent power. I updated the comment to at least document this behavior - https://github.com/lacker/seticore/commit/b17d54b549b68e4b69646f3f197f214e84b335eb

The problem is the way the data flow works: we don't have access to all this data at the same time for all hits. We only have hit.filterbank.data while we are processing the current beam, and we only figure out incoherent power while we are processing the incoherent beam. Stamp extraction uses a slower process that does have access to all this stuff. We might be able to batch the hits up in memory to add incoherent power, but it isn't as straightforward as just adding a line of code.

david-macmahon commented 1 year ago

I see, thanks. I guess even if we computed the incoherent beam first we'd still need to compute the incoherent power along the "drift path" of each hit, so you'd have to keep the incoherent beam data handy. That would be useful information to have for each hit if it could be managed.
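
As a rough illustration of what "incoherent power along the drift path" would involve, here is a minimal sketch. It assumes the incoherent beam for one coarse channel is available as a row-major time-by-frequency array, and that a hit is described by a starting bin and a total drift in bins over the whole time span. The `Spectrogram` struct and `powerAlongDriftPath` function are hypothetical names for illustration, not seticore's actual types or API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layout: row-major spectrogram, data[t * num_channels + f].
struct Spectrogram {
  int num_timesteps;
  int num_channels;
  std::vector<float> data;
};

// Sums power along a straight-line drift path that starts at start_index in
// the first timestep and ends drift_steps bins away in the last timestep.
// Samples where the path leaves the band are skipped.
float powerAlongDriftPath(const Spectrogram& s, int start_index,
                          int drift_steps) {
  assert(s.num_timesteps > 0 && s.num_channels > 0);
  assert(s.data.size() ==
         static_cast<std::size_t>(s.num_timesteps) * s.num_channels);
  float total = 0.0f;
  for (int t = 0; t < s.num_timesteps; ++t) {
    // Fraction of the way through the observation, in [0, 1].
    double frac = s.num_timesteps > 1
                      ? static_cast<double>(t) / (s.num_timesteps - 1)
                      : 0.0;
    int f = start_index + static_cast<int>(std::lround(frac * drift_steps));
    if (f < 0 || f >= s.num_channels) continue;  // path left the band
    total += s.data[static_cast<std::size_t>(t) * s.num_channels + f];
  }
  return total;
}
```

The point of the sketch is that every timestep of the incoherent beam near the hit is needed, which is why the incoherent data would have to stay available until the hit is known.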

lacker commented 1 year ago

We don't even have the entire incoherent beam calculated at the same time... there are three nested loops, like:

for each sub-band:
  for each beam:
    for each coarse channel:

see: https://github.com/lacker/seticore/blob/master/beamforming_pipeline.cpp#L204

Most of the incoherent beam is never even copied over to the CPU; we only grab the little bits of data that we need.

Since we have most of our spare capacity on the CPU side, I am guessing it would be easier to keep the hits in memory and only output them once the incoherent beam data is available, since we are already pushing that data through the GPU -> CPU bottleneck. We don't really have any limit on the number of hits, though, so it seems like in some cases that would run out of memory, and we'd have to either handle that in some way or drop some of the hits before outputting them. I'm not really sure if that would be an extremely rare case or not.
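
One way the "keep hits in memory, but with a limit" idea could look is sketched below. This is only a sketch under assumptions: `PendingHit` and `BoundedHitBuffer` are hypothetical names rather than existing seticore types, and dropping the lowest-SNR hit when the buffer is full is just one possible policy.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for a hit that is waiting for its incoherent power.
struct PendingHit {
  int index;
  int drift_steps;
  float snr;
  float incoherent_power;
};

// Holds at most `capacity` hits. When full, the weakest hit by SNR is dropped
// (assumed policy), so memory use stays bounded however many hits arrive.
class BoundedHitBuffer {
 public:
  explicit BoundedHitBuffer(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Returns true if the hit was stored without dropping anything, false if
  // either this hit or a weaker stored hit had to be dropped.
  bool add(const PendingHit& hit) {
    if (heap_.size() < capacity_) {
      heap_.push(hit);
      return true;
    }
    ++dropped_;
    if (hit.snr > heap_.top().snr) {
      heap_.pop();  // evict the weakest stored hit
      heap_.push(hit);
    }
    return false;
  }

  // Removes and returns all stored hits, weakest first, so the caller can
  // fill in incoherent_power and write them out.
  std::vector<PendingHit> drain() {
    std::vector<PendingHit> out;
    while (!heap_.empty()) {
      out.push_back(heap_.top());
      heap_.pop();
    }
    return out;
  }

  std::size_t dropped() const { return dropped_; }

 private:
  // Orders the heap so that top() is the hit with the lowest SNR.
  struct WeakestFirst {
    bool operator()(const PendingHit& a, const PendingHit& b) const {
      return a.snr > b.snr;
    }
  };
  std::size_t capacity_;
  std::size_t dropped_ = 0;
  std::priority_queue<PendingHit, std::vector<PendingHit>, WeakestFirst> heap_;
};
```

Tracking the `dropped()` count would at least show how often the limit is reached in practice, which speaks to the open question of whether that case is rare.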