lofar-astron / DP3

DP3: streaming processing pipeline for radio interferometric data
GNU General Public License v3.0

OOM for simple ddecal #337

Closed Joshuaalbert closed 2 years ago

Joshuaalbert commented 2 years ago

I'm running a ddecal step on a measurement set with a single timeslot and get an OOM in DPPP 4.1.

Questions

  1. Is ddecal still well supported?
  2. Why would I get an OOM with a 500GB MS on a machine with 1TB of RAM?
  3. Do you have advice on how to overcome this OOM?

The command:

singularity exec -B /fastpool/albert /fastpool/albert/envs/lofar_sksp_ddf.simg DPPP \
numthreads=1 \
msin=W-8000-newcoords.ms \
msin.datacolumn=DATA \
msout=. \
steps=[ddecal] \
ddecal.type=ddecal \
ddecal.maxiter=100 \
ddecal.propagatesolutions=True \
ddecal.solint=1 \
ddecal.nchan=1 \
ddecal.h5parm=solutions.h5 \
ddecal.sourcedb=single-source.sourcedb

The MS is 508GB. There is 1TB RAM. The sky model is a single point source:

# (Name, Type, Patch, Ra, Dec, I, ReferenceFrequency='150.e6', SpectralIndex) = format

A, POINT, , 23:58:15.43335658, +35.14.14.96586417, 1.0

Parameters printed out before crashing with OOM in the logs:

MSReader
  input MS:       /fastpool/albert/root/ionosphere_single_dir_1/W-8000-newcoords.ms
  band            0
  startchan:      0  (0)
  nchan:          8000  (0)
  ncorrelations:  4
  nbaselines:     2096128
  ntimes:         1
  time interval:  60
  DATA column:    DATA
  WEIGHT column:  WEIGHT_SPECTRUM
  autoweight:     false
DDECal ddecal.
  H5Parm:              solutions.h5
  solint:              1
  nchan:               1
  directions:          [[A]]
  use model column:    false
  tolerance:           0.0001
  max iter:            100
  flag unconverged:    false
     diverged only:    false
  propagate solutions: true
       converged only: false
  detect stalling:     true
  step size:           0.2
  mode (constraints):  complexgain
  approximate fitter:  false
  only predict:        false
  subtract model:      false
Predict ddecal.
  sourcedb:           single-source.sourcedb
   number of patches: 1
   number of sources: 1
   all unpolarized:   true
  apply beam:         false
  operation:          replace
  threads:            1
MSUpdater msout.
  MS:             /fastpool/albert/root/ionosphere_single_dir_1/W-8000-newcoords.ms
  datacolumn:     DATA
  weightcolumn    WEIGHT_SPECTRUM
  Compressed:     no

  flush:          0
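
For scale, the MSReader values above already give a rough size for the single timeslot. The sketch below is only back-of-the-envelope arithmetic: it assumes visibilities are stored as complex64 (8 bytes each) and weights as float32, and the idea that a data buffer and a model buffer are held in memory at the same time is a hypothetical, not something confirmed in this thread about DPPP 4.1.

```python
import math

# Values copied from the MSReader log above.
n_baselines = 2_096_128
n_channels = 8_000
n_correlations = 4
n_times = 1

# Assumptions (not stated in the log): complex64 visibilities, float32 weights.
BYTES_PER_VISIBILITY = 8
BYTES_PER_WEIGHT = 4


def to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


n_visibilities = n_baselines * n_channels * n_correlations * n_times
data_bytes = n_visibilities * BYTES_PER_VISIBILITY
weight_bytes = n_visibilities * BYTES_PER_WEIGHT

# Number of antennas if the baselines are cross-correlations only:
# solve n * (n - 1) / 2 == n_baselines for n.
n_antennas = (1 + math.isqrt(1 + 8 * n_baselines)) // 2

# Hypothetical scenario: one data buffer plus one model buffer of equal size,
# plus the weights, all resident at once.
hypothetical_total = 2 * data_bytes + weight_bytes

print(f"antennas (cross-correlations only): {n_antennas}")
print(f"visibilities per timeslot:          {n_visibilities:,}")
print(f"DATA for one timeslot:              {to_gib(data_bytes):.1f} GiB")
print(f"weights for one timeslot:           {to_gib(weight_bytes):.1f} GiB")
print(f"data + model + weights (assumed):   {to_gib(hypothetical_total):.1f} GiB")
```

Under these assumptions the DATA column for one timeslot is about 500 GiB, which is consistent with the 508GB MS, so any second full-size copy would already be close to the 1TB of RAM.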

aroffringa commented 2 years ago

Hi Joshua.

  1. Yes, ddecal is well supported and part of a few production pipelines.
  2. Not sure -- as far as I can see, your command should be okay, so you've probably hit some kind of bug. DDECal is tested reasonably well, but apparently something in your MS is different from our test cases.
  3. If you can provide a minimal test case, we can try to fix the issue. Also, if you aren't already, make sure you use the latest version of DP3 from https://git.astron.nl/RD/DP3.

Joshuaalbert commented 2 years ago

Thanks for the fast reply. I've asked my colleague Yuping for the script used to create the MS, as that would be the easiest way to provide a minimal example.