lofar-astron / DP3

DP3: streaming processing pipeline for radio interferometric data
GNU General Public License v3.0

Only the last <timewindow> time slots of flagging statistics are recorded #334

Open flomertens opened 2 years ago

flomertens commented 2 years ago

I use the following DPPP parset to pre-process NenuFAR data:

numthreads = 48

msin =
msin.datacolumn = DATA
msin.startchan = 2
msin.nchan = 60

msin.weightcolumn = WEIGHT
msout.writefullresflag  = false
msout.overwrite = true
msout.storagemanager = 'dysco'
msout.storagemanager.normalization = 'RF'

steps = [flag,avg]

flag.type = aoflagger
flag.memoryperc = 30
flag.strategy = /home/fmertens/pre_processing/Nenufar64C1S.lua

avg.type = average
avg.freqstep = 5
avg.timestep = 4

Processing works fine, but I just realized that only the last time slots of flagging statistics are written to the QUALITY table. The output of one run is:

 copying info and subtables ...
Finished preparing output MS
MSReader
  input MS:       /data_NRI/nenufar-nri/ES00/2022/01/20220114_055900_20220114_103000_CASA_TRACKING_COSMIC_DAWN/L0/SB192.MS
  band            0
  startchan:      2  (2)
  nchan:          60  (60)
  ncorrelations:  4
  nbaselines:     3240
  first time:     2022/01/14/06:00:11
  last time:      2022/01/14/10:29:49
  ntimes:         16073
  time interval:  1.00663
  DATA column:    DATA
  WEIGHT column:  WEIGHT
  autoweight:     false
AOFlaggerStep flag.
  strategy:       /home/fmertens/pre_processing/Nenufar64C1S.lua
  timewindow:     5358
  overlap:        54
  keepstatistics: 1
  autocorr:       1
  max memory used 37.2 GB
Averager avg.
  freqstep:       5  timestep:       4
  minpoints:      1
  minperc:        0
MSWriter msout.
  output MS:      /data_NRI/nenufar-nri/ES00/2022/01/20220114_055900_20220114_103000_CASA_TRACKING_COSMIC_DAWN/L1/SB192.MS
  nchan:          12
  ncorrelations:  4
  nbaselines:     3240
  ntimes:         4019
  time interval:  4.02653
  DATA column:    DATA
  WEIGHT column:  WEIGHT_SPECTRUM
  Compressed:     yes
  Data bitrate:   10
  Weight bitrate: 12
  Dysco mode:     RF TruncatedGaussian(2.5)

Processing 16073 time slots ...

0%....10....20....30....40....50....60....70....80....90....100%
Finishing processing ...

NaN/infinite data flagged in reader
===================================

Percentage of flagged visibilities detected per correlation:
  [0,0,0,0] out of 3124591200 visibilities   [0%, 0%, 0%, 0%]
0 missing time slots were inserted

Flags set by AOFlaggerStep flag.
===========================

Percentage of visibilities flagged per baseline (antenna pair):
<flag stats>

Percentage of flagged visibilities detected per correlation:
  [106540252,106540252,106540252,106540252] out of 3124591200 visibilities   [3%, 3%, 3%, 3%]

Total NDPPP time    2550.22 real     30715.2 user      839.35 system
   37.6% MSReader
   45.8% AOFlaggerStep flag.
            3.6% of it spent in shuffling data
           64.4% of it spent in calculating flags
           27.4% of it spent in making quality statistics
   13.0% Averager avg.
    3.6% MSWriter msout.

  5357 time slots to finish in AOFlaggerStep ...

As you can see, the timewindow (5358) is smaller than ntimes (16073). And indeed, when I inspect the QUALITY tables with aoqplot, the time statistics only start at around 9am, although the observation starts at 06:00. Am I missing a parameter in the parset file, maybe?
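As a rough cross-check of the 9am figure, the start of the last time window can be estimated from the values in the log. This is only a sketch: it assumes the windows are consecutive and ignores the 54-slot overlap, which may not be exactly how AOFlaggerStep buffers data.

```python
from datetime import datetime, timedelta

# Values copied from the log output above.
ntimes = 16073
timewindow = 5358
interval_s = 1.00663
first_time = datetime(2022, 1, 14, 6, 0, 11)

# Assumption: consecutive, non-overlapping windows (the overlap is ignored).
n_windows = -(-ntimes // timewindow)          # ceiling division
last_window_start = (n_windows - 1) * timewindow
last_window_len = ntimes - last_window_start  # slots in the final window
last_window_time = first_time + timedelta(seconds=last_window_start * interval_s)

print(n_windows, last_window_len, last_window_time)
```

Under these assumptions there are 3 windows, the last one holds 5357 slots (matching the "5357 time slots to finish" line in the log), and it starts just before 09:00, which is where the statistics in aoqplot begin.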

aroffringa commented 2 years ago

I think you're right. This looks like a bug that was introduced in 74ad5ad6e, #260. I'll prepare a fix...

aroffringa commented 2 years ago

Should be solved after https://git.astron.nl/RD/DP3/-/merge_requests/616 is merged.

It happens when not all data fits in memory at once. In that case, the statistics indeed only reflect the last chunk that fit in memory. Sorry for this; I hope you didn't lose statistics from high-resolution data because of it.
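The effect can be illustrated with a toy example. This is not DP3's code and not the change made in the merge request; the function names and the data layout (a list of chunks, each a list of `(time_slot, flags)` pairs) are made up for illustration. It only shows how replacing the statistics per chunk loses everything but the last chunk, while accumulating keeps all time slots.

```python
def stats_overwritten(chunks):
    """Hypothetical buggy pattern: statistics are replaced for every chunk."""
    stats = {}
    for chunk in chunks:
        stats = {slot: sum(flags) for slot, flags in chunk}
    return stats


def stats_accumulated(chunks):
    """Hypothetical intended pattern: statistics are collected over all chunks."""
    stats = {}
    for chunk in chunks:
        for slot, flags in chunk:
            stats[slot] = stats.get(slot, 0) + sum(flags)
    return stats


# Two chunks of two time slots each; True means "flagged".
chunks = [
    [(0, [True, False]), (1, [True, True])],
    [(2, [False, False]), (3, [False, True])],
]

print(stats_overwritten(chunks))   # only time slots 2 and 3 survive
print(stats_accumulated(chunks))   # all four time slots are present
```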