ratt-ru / tricolour

Holds an offline, MS direct version of the SDP online flagger.

Chunking up large scans in memory #36

Open sjperkins opened 5 years ago

sjperkins commented 5 years ago

As it stands with the MS format, I think there are two general approaches to creating in-memory windows, plus a third path for well-behaved MSs:

  1. Create the full resolution window for the scan in memory and pack chunks of the MS into the scan. Then parallelise over the window baselines.
  2. Read the entire MS scan into memory, pack per-baseline windows and flag each window separately.
  3. The previous options assume a fairly general MS. There is an optimal path if we know that the MS is well-behaved, by which I mean:

    • TIME increases monotonically
    • all baselines are present for each unique TIME
    • baselines are ordered the same way for each TIME value

      Then it should be possible to interleave baseline reads from different timesteps and avoid holding an entire scan in memory.
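A minimal sketch of how the three conditions could be checked, and of why they make interleaved reads possible. This is not tricolour code: `is_well_behaved` and `baseline_rows` are hypothetical helpers, the inputs are plain sequences standing in for the TIME, ANTENNA1 and ANTENNA2 columns of one scan, and "all baselines present" is approximated by treating the first timestep's baseline list as the reference.

```python
from itertools import groupby


def is_well_behaved(time, ant1, ant2):
    """Return True if the rows satisfy the three conditions above.

    Assumption: the baselines seen in the first timestep define the
    complete, reference set of baselines for the scan.
    """
    rows = zip(time, zip(ant1, ant2))
    # Group consecutive rows that share a TIME value.
    groups = [(t, [bl for _, bl in grp])
              for t, grp in groupby(rows, key=lambda r: r[0])]
    if not groups:
        return True

    # TIME increases monotonically: no TIME value reappears later or goes back.
    times = [t for t, _ in groups]
    if any(later <= earlier for earlier, later in zip(times, times[1:])):
        return False

    # Every timestep lists the same baselines, once each, in the same order.
    reference = groups[0][1]
    if len(set(reference)) != len(reference):
        return False
    return all(bls == reference for _, bls in groups)


def baseline_rows(baseline_index, nbl, ntime):
    """Row numbers holding one baseline in a well-behaved scan.

    With nbl baselines per timestep in a fixed order, baseline b sits at
    row t * nbl + b for timestep t, so its rows form a strided range.
    """
    return range(baseline_index, nbl * ntime, nbl)


# Hypothetical scan: 2 timesteps, 3 baselines.
time = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
ant1 = [0, 0, 1, 0, 0, 1]
ant2 = [1, 2, 2, 1, 2, 2]
ok = is_well_behaved(time, ant1, ant2)
rows_for_second_baseline = list(baseline_rows(1, nbl=3, ntime=2))
```

When the check passes, the rows for any baseline are known from arithmetic alone, so a reader can fetch a few baselines across all timesteps without loading the rest of the scan.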

(1) is the current approach. With either (1) or (2), we need an entire scan's worth of memory (in either MS or window format) in order to perform the packing.
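To illustrate the shape of approach (1), here is a rough sketch using nested lists rather than tricolour's actual arrays. The function `pack_chunk`, the index dictionaries and the toy data are all hypothetical. It assumes the unique TIME values and baselines of the scan are known before packing starts.

```python
def pack_chunk(window, time_index, bl_index, times, ant1, ant2, values):
    """Scatter one chunk of MS rows into a (baseline, time) window in place."""
    for t, a1, a2, v in zip(times, ant1, ant2, values):
        window[bl_index[(a1, a2)]][time_index[t]] = v


# Hypothetical scan: 2 timesteps, 3 baselines, one value per row.
times = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
ant1 = [0, 0, 1, 0, 0, 1]
ant2 = [1, 2, 2, 1, 2, 2]
values = [10, 11, 12, 20, 21, 22]

# Map each unique TIME and baseline to its position in the window.
time_index = {t: i for i, t in enumerate(sorted(set(times)))}
bl_index = {bl: i for i, bl in enumerate(sorted(set(zip(ant1, ant2))))}

# The full-resolution window is allocated up front, before any chunk arrives.
window = [[None] * len(time_index) for _ in bl_index]

chunk_size = 4  # deliberately not aligned with the timestep boundary
for start in range(0, len(times), chunk_size):
    end = start + chunk_size
    pack_chunk(window, time_index, bl_index,
               times[start:end], ant1[start:end], ant2[start:end],
               values[start:end])
```

Only one chunk of MS rows is live at a time here, but the window itself already spans the whole scan, which is the memory cost described above.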

(2) might actually result in smaller chunks, as we can chunk over both MS rows and per-baseline windows. But we'd still need all MS rows for the scan in memory, as we don't know in advance which rows contribute to a window.
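A small sketch of approach (2), again with hypothetical helper names and toy data rather than anything from tricolour. It groups row numbers by baseline and then packs each baseline's window separately.

```python
from collections import defaultdict


def rows_per_baseline(ant1, ant2):
    """Map each (ANTENNA1, ANTENNA2) pair to the row numbers where it occurs."""
    groups = defaultdict(list)
    for row, baseline in enumerate(zip(ant1, ant2)):
        groups[baseline].append(row)
    return dict(groups)


def pack_baseline_window(rows, values):
    """Gather the values for one baseline into its own window, in row order."""
    return [values[r] for r in rows]


# Hypothetical scan already read into memory: 2 timesteps, 3 baselines.
ant1 = [0, 0, 1, 0, 0, 1]
ant2 = [1, 2, 2, 1, 2, 2]
values = [10, 11, 12, 20, 21, 22]

groups = rows_per_baseline(ant1, ant2)
windows = {bl: pack_baseline_window(rows, values)
           for bl, rows in groups.items()}
```

The rows for a single baseline are scattered across the scan (rows 1 and 4 for baseline (0, 2) in this toy data), which is why all rows have to be available before any one window can be completed.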