ratt-ru / CubiCal

A fast radio interferometric calibration suite.
GNU General Public License v2.0

make FitsBeamSourceProvider a singleton, or at least close() it #398

Closed o-smirnov closed 4 years ago

o-smirnov commented 4 years ago

Looks like montblanc predict is not closing files after itself, perhaps?

INFO      14:32:59 - main               [1.6/166.1 9.3/703.8 1.1Gb] waiting for I/O on tile 122/302          
...

  File "/home/oms/.venv/cc/lib/python3.6/site-packages/astropy/io/fits/util.py", line 396, in fileobj_open                                             
    return open(filename, mode, buffering=0)                                                                                                           
OSError: [Errno 24] Too many open files: 'JVLA-beams/cassbeam/JVLA-S-centred-phased-lr_im.fits'     

The command-line was

gocubical cyg.parset --log-verbose solver=1 --out-name cc-a --data-ms MS/CygA-S-A-HI.MS/

o-smirnov commented 4 years ago

I'll rerun with increased ulimits and take a look at what files the I/O worker keeps open.
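
For reference, the open-file limit can also be inspected and raised from inside a Python process with the standard `resource` module (Unix only). This is a minimal sketch: the helper name `raise_nofile_limit` and the target value are illustrative and not part of CubiCal or montblanc. If descriptors are leaking, a higher limit only postpones the error.

```python
import resource


def raise_nofile_limit(target):
    """Raise the soft RLIMIT_NOFILE to `target`, capped at the hard limit.

    Returns (old_soft, new_soft). The limit is never lowered.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # Already unlimited, nothing to do.
        return soft, soft
    # An unprivileged process cannot exceed the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft <= soft:
        return soft, soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return soft, new_soft


if __name__ == "__main__":
    old, new = raise_nofile_limit(4096)  # 4096 is an arbitrary example value
    print("soft limit:", old, "->", new)
```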

o-smirnov commented 4 years ago

Yeah, looks like it's not closing the beam files (or CubiCal is not deleting the objects that hold onto the beam files). Here's what things look like after a few tiles:

oms@simon:~/projects/VLA-CygA$ ls -l /proc/66746/fd
total 0
lrwx------ 1 oms oms 64 Jul 14 14:40 0 -> /dev/pts/3
lrwx------ 1 oms oms 64 Jul 14 14:40 1 -> /dev/pts/3
lrwx------ 1 oms oms 64 Jul 14 14:40 10 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f14
lrwx------ 1 oms oms 64 Jul 14 14:40 11 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f15
lrwx------ 1 oms oms 64 Jul 14 14:40 12 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f10
lr-x------ 1 oms oms 64 Jul 14 14:40 13 -> 'pipe:[160571425]'
l-wx------ 1 oms oms 64 Jul 14 14:40 14 -> 'pipe:[160571425]'
lr-x------ 1 oms oms 64 Jul 14 14:40 15 -> 'pipe:[160571426]'
l-wx------ 1 oms oms 64 Jul 14 14:40 16 -> 'pipe:[160571426]'
lr-x------ 1 oms oms 64 Jul 14 14:40 17 -> 'pipe:[160571427]'
l-wx------ 1 oms oms 64 Jul 14 14:40 18 -> 'pipe:[160571427]'
lr-x------ 1 oms oms 64 Jul 14 14:40 19 -> 'pipe:[160571428]'
lrwx------ 1 oms oms 64 Jul 14 14:40 2 -> /dev/pts/3
l-wx------ 1 oms oms 64 Jul 14 14:40 20 -> 'pipe:[160571428]'
lr-x------ 1 oms oms 64 Jul 14 14:40 21 -> /dev/null
l-wx------ 1 oms oms 64 Jul 14 14:40 22 -> 'pipe:[160571429]'
lrwx------ 1 oms oms 64 Jul 14 14:40 23 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f17_TSM1
lrwx------ 1 oms oms 64 Jul 14 14:40 24 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f21_TSM0
lrwx------ 1 oms oms 64 Jul 14 14:40 25 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.lock
lrwx------ 1 oms oms 64 Jul 14 14:40 26 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f12
lrwx------ 1 oms oms 64 Jul 14 14:40 27 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f6i
lrwx------ 1 oms oms 64 Jul 14 14:40 28 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f19_TSM1
lrwx------ 1 oms oms 64 Jul 14 14:40 29 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f26
l-wx------ 1 oms oms 64 Jul 14 14:40 3 -> /home/oms/projects/VLA-CygA/montblanc.log
lrwx------ 1 oms oms 64 Jul 14 14:40 30 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f25_TSM1
lrwx------ 1 oms oms 64 Jul 14 14:40 31 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f6
lrwx------ 1 oms oms 64 Jul 14 14:40 32 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rr_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 33 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rr_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 34 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rl_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 35 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rl_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 36 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-lr_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 37 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-lr_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 38 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-ll_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 39 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-ll_im.fits
l-wx------ 1 oms oms 64 Jul 14 14:40 4 -> /home/oms/projects/VLA-CygA/cal0.cc-out/cc-a.log
lrwx------ 1 oms oms 64 Jul 14 14:40 40 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rr_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 41 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rr_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 42 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rl_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 43 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rl_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 44 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-lr_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 45 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-lr_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 46 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-ll_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 47 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-ll_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 48 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f27_TSM1
lrwx------ 1 oms oms 64 Jul 14 14:41 49 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f22_TSM1
l-wx------ 1 oms oms 64 Jul 14 14:40 5 -> /home/oms/projects/VLA-CygA/cal0.cc-out/cc-a-G-field_0-ddid_None.parmdb.tmp
lrwx------ 1 oms oms 64 Jul 14 14:41 50 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rr_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 51 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rr_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 52 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rl_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 53 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rl_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 54 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-lr_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 55 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-lr_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 56 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-ll_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 57 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-ll_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 58 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rr_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 59 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rr_im.fits
l-wx------ 1 oms oms 64 Jul 14 14:40 6 -> /home/oms/projects/VLA-CygA/cal0.cc-out/cc-a-dE-field_0-ddid_None.parmdb.tmp
lrwx------ 1 oms oms 64 Jul 14 14:41 60 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rl_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 61 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rl_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 62 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-lr_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 63 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-lr_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 64 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-ll_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:41 65 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-ll_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:42 66 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rr_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:42 67 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rr_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:42 68 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rl_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:42 69 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-rl_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:40 7 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f5
lrwx------ 1 oms oms 64 Jul 14 14:42 70 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-lr_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:42 71 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-lr_im.fits
lrwx------ 1 oms oms 64 Jul 14 14:42 72 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-ll_re.fits
lrwx------ 1 oms oms 64 Jul 14 14:42 73 -> /net/jake/vault-jake/oms/JVLA-beams/cassbeam/JVLA-S-centred-phased-ll_im.fits
l-wx------ 1 oms oms 64 Jul 14 14:40 8 -> /home/oms/projects/VLA-CygA/cal0.cc-out/cc-a-BBC-field_0-ddid_None.parmdb.tmp
lrwx------ 1 oms oms 64 Jul 14 14:40 9 -> /home/oms/projects/VLA-CygA/MS/CygA-S-A-HI.MS/table.f16
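
In the listing above, each of the eight beam FITS files appears five times, which is what a per-tile leak would look like. The same tally can be scripted instead of read off by eye. A rough sketch, assuming a Linux `/proc` filesystem; the function names and the `.fits` suffix filter are illustrative choices:

```python
import os
from collections import Counter


def count_open_targets(pid="self"):
    """Count how many descriptors of process `pid` point at each target."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            counts[os.readlink(os.path.join(fd_dir, name))] += 1
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return counts


def repeated_targets(counts, suffix=".fits", minimum=2):
    """Keep only targets with the given suffix that are open `minimum`+ times."""
    return {
        target: n
        for target, n in counts.items()
        if target.endswith(suffix) and n >= minimum
    }


if __name__ == "__main__":
    # Replace "self" with the PID of the I/O worker to inspect another process.
    for target, n in sorted(repeated_targets(count_open_targets()).items()):
        print(n, target)
```

A count that grows with the number of processed tiles points at objects that are never closed rather than at a limit that is simply too low.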

JSKenyon commented 4 years ago

Pinging @sjperkins, as these are opened inside the FitsBeamSourceProvider.

sjperkins commented 4 years ago

It seems that many FitsBeamSourceProvider instances are created. Is this strictly necessary? If not, FitsBeamSourceProvider has a close() method:

https://github.com/ska-sa/montblanc/blob/master/montblanc/impl/rime/tensorflow/sources/fits_beam_source_provider.py#L408-L414

which can be called explicitly to close the associated files.

sjperkins commented 4 years ago

It might be worth explicitly calling close() on other SourceProviders too.
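
One way to make sure close() is always called, even when the predict step raises, is to register each provider with `contextlib.ExitStack`. The sketch below uses a hypothetical `DummyProvider` stand-in rather than the real montblanc classes; the only thing it assumes about a provider is that it has a close() method.

```python
from contextlib import ExitStack, closing


class DummyProvider:
    """Hypothetical stand-in for a source provider that exposes close()."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


def run_with_providers(providers, work):
    """Call work(providers), then close every provider, even if work raises."""
    with ExitStack() as stack:
        for provider in providers:
            # closing() turns any object with close() into a context manager.
            stack.enter_context(closing(provider))
        return work(providers)


if __name__ == "__main__":
    providers = [DummyProvider("beam"), DummyProvider("sky_model")]
    run_with_providers(providers, lambda ps: [p.name for p in ps])
    print(all(p.closed for p in providers))
```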

o-smirnov commented 4 years ago

We create it here: https://github.com/ratt-ru/CubiCal/blob/master/cubical/data_handler/ms_tile.py#L277, but never destroy it. I think this means there's one kept per every tile (and we don't destroy the tile objects...)

So yeah, we clearly need to close() it in MSTile.save().

Do we need to create multiple ones though, or can we reuse the same one? I think we can...

sjperkins commented 4 years ago

> We create it here: https://github.com/ratt-ru/CubiCal/blob/master/cubical/data_handler/ms_tile.py#L277, but never destroy it. I think this means there's one kept per every tile (and we don't destroy the tile objects...)
>
> So yeah, we clearly need to close() it in MSTile.save().
>
> Do we need to create multiple ones though, or can we reuse the same one? I think we can...

Yes, it can be reused.

o-smirnov commented 4 years ago

Yep, it doesn't appear to depend on anything in the tile per se, so it can be a singleton. I have renamed the issue accordingly.

The other providers do rely on tile-specific data. But they should still be closed after the tile has been processed.

JSKenyon commented 4 years ago

Closed via #402.