lofar-astron / factor

Facet calibration for LOFAR
http://www.astron.nl/citt/facet-doc
GNU General Public License v2.0

Question: Best way to process a multi-beam observation? #210

Open ajstewart opened 7 years ago

ajstewart commented 7 years ago

I'm interested in attempting to use Factor on some old data where the observation consists of 6 beams observed simultaneously. The 6 beams overlap, with the aim of mosaicking them together so that I end up with a final image of a large patch of sky.

What is the best way to organise the data to do this? Do I process each beam individually and join the images at the end? Or is there a better way?

I've been reading the docs of both prefactor and factor and I was unclear on what the best method would be, or even if it really supports processing different directions - so I just wanted to clarify.

Of course, being a multi-beam dataset, the frequency coverage is not continuous but consists of 4 x 10 SBs in pre-determined bands. So the bands are the usual size, but instead of covering the full frequency range they are spread out - will this also cause problems with Factor? Prefactor manages this ok.

AHorneffer commented 7 years ago

Hmmm... About Factor: my first inclination was to suggest processing them together. But I'm afraid that this might not work too well:

So if the 6 beams were indeed observed at the same time, then I'm afraid that you'll have to process them separately.

What you can do is to use the same directions for the different beams so that you get identical facet boundaries. Then you can combine the measurement-sets from the different beams (and thus Factor runs) for the common facets and image those together. (Not sure if that is useful, though.)
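
In the Factor parset that would just mean pointing each beam's run at the same directions file, roughly like this (a sketch only - the option sits in the [directions] section, and the path is of course a placeholder):

[directions]
directions_file = /path/to/shared_directions.txt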

AHorneffer commented 7 years ago

About prefactor: If the different beams were taken at different times, each with their own calibrator observations, then you obviously need to process them separately. If they were taken at the same time, then you should still process the target beams separately: prefactor uses the same logic as Factor to group MSs into sets that can be concatenated in frequency, and this code just doesn't consider the possibility that there might be different datasets at the same time and the same frequency.

You could run initial-subtract on all data together, but then you would run into the "hole or not subtracted" issue that I mentioned above.

ajstewart commented 7 years ago

Ok, thanks for all that - I thought running separately was going to be the answer. And yes, I see what you mean about defining the same directions, but I guess it may still be better to process separately and combine everything at the end.

Playing around with one beam, I encounter the following, which suggests that the frequencies are going to make things difficult. There are only four bands of 10 SBs per beam (centred on 124, 149, 156 and 185 MHz - the 149 was intended to be 142, human error), but these are not evenly spaced, so the dummy bands that Factor is attempting to create in between (see below) don't work out. So is this likely to be a no-go? Or is there a way to get around the dummy bands? As I said, the data is years old, so it was never planned with Factor in mind. Perhaps it's possible to force only the four bands, but I would guess the lack of frequency coverage might cause problems further on.

I should say that in prefactor I hard-coded the banding structure into the frequency-groups script to force the 4 bands of 10 SBs each, with no dummies.
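
Roughly, the hack amounts to something like this (not the actual prefactor code, just a sketch of the idea; get_ref_freq() is a stand-in for reading REF_FREQUENCY from each MS's SPECTRAL_WINDOW table with python-casacore):

import casacore.tables as pt

# The four observed band centres in MHz (the 149 one should have been 142).
BAND_CENTRES_MHZ = [124.0, 149.0, 156.0, 185.0]

def get_ref_freq(ms):
    """Return the reference frequency (Hz) of a single-subband MS."""
    spw = pt.table(ms + '::SPECTRAL_WINDOW', ack=False)
    freq = spw.getcol('REF_FREQUENCY')[0]
    spw.close()
    return freq

def group_into_bands(ms_list):
    """Assign each MS to the nearest of the four hard-coded bands."""
    groups = {centre: [] for centre in BAND_CENTRES_MHZ}
    for ms in ms_list:
        freq_mhz = get_ref_freq(ms) / 1e6
        centre = min(BAND_CENTRES_MHZ, key=lambda c: abs(c - freq_mhz))
        groups[centre].append(ms)
    return groups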

From the first facet patch log:

2017-06-14 16:33:02 DEBUG   node.pi2.executable_args.L89717_SAP000_SB000_uv.MS_1221C7188t_184MHz.pre-cal_chunk0.average0]: /usr/bin/DPPP stdout: DEBUG - Assertion: freqs[0] > freq || near(freqs[0], freq, 1e-5); Subbands should be in increasing order of frequency; found 1.48924e+08, expected 1.49315e+08 (diff=-390625)

2017-06-14 16:33:02 WARNING node.pi2.executable_args.L89717_SAP000_SB000_uv.MS_1221C7188t_184MHz.pre-cal_chunk0.average0]: /usr/bin/DPPP stderr: MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
MeasurementSet dummy.ms not found; dummy data used
LOFAR Exception detected: [AssertError: Assertion: freqs[0] > freq || near(freqs[0], freq, 1e-5); Subbands should be in increasing order of frequency; found 1.48924e+08, expected 1.49315e+08 (diff=-390625)]
in function void LOFAR::DPPP::MultiMSReader::fillBands()
(/build/lofar-vKFVEC/lofar-2.20.2/CEP/DP3/DPPP/src/MultiMSReader.cc:184)
Backtrace follows:
#0  0x7f106dab7b7a in ?? at ??:0
#1  0x7f106dab8215 in ?? at ??:0
#2  0x7f106dab92cd in ?? at ??:0
#3  0x7f106da5086f in ?? at ??:0
#4  0x7f106da280b9 in ?? at ??:0
#5  0x7f106da2b305 in ?? at ??:0
#6  0x41c419 in ?? at ??:0
#7  0x7f106c7fc830 in ?? at ??:0
#8  0x41cb89 in ?? at ??:0

2017-06-14 16:33:02 DEBUG   node.pi2.executable_args.L89715_SAP000_SB000_uv.MS_1221C6E04t_184MHz.pre-cal_chunk0.average0]: /usr/bin/DPPP stdout: DEBUG - Assertion: freqs[0] > freq || near(freqs[0], freq, 1e-5); Subbands should be in increasing order of frequency; found 1.48924e+08, expected 1.49315e+08 (diff=-390625)
AHorneffer commented 7 years ago

You need to concatenate the MSs in prefactor into bands that Factor can then concatenate together. You can also have partially filled bands, e.g. bands of 12 SBs (or so) where the first or last SBs are filled with dummy data.
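
For one band the concatenation would look roughly like this in DPPP (a sketch only, not the exact prefactor parset; the MS names are placeholders, and DPPP replaces the dummy.ms entries with flagged dummy data - the same mechanism behind the "dummy data used" lines in your log):

msin = [SB030.MS, SB031.MS, SB032.MS, SB033.MS, SB034.MS, SB035.MS, SB036.MS, SB037.MS, SB038.MS, SB039.MS, dummy.ms, dummy.ms]
msin.missingdata = True
msin.orderms = False
msout = Band_124MHz.MS
steps = []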

What are the actual subbands that you observed? (Did you observe some of the subbands twice?)

ajstewart commented 7 years ago

The frequencies observed were (no repeats):

[  1.23046875e+08   1.23242188e+08   1.23437500e+08   1.23632812e+08
   1.23828125e+08   1.24023438e+08   1.24218750e+08   1.24414062e+08
   1.24609375e+08   1.24804688e+08   

   1.48046875e+08   1.48242188e+08   1.48437500e+08   1.48632812e+08
   1.48828125e+08   1.49023438e+08   1.49218750e+08   1.49414062e+08
   1.49609375e+08   1.49804688e+08

   1.55078125e+08   1.55273438e+08   1.55468750e+08   1.55664062e+08
   1.55859375e+08   1.56054688e+08   1.56250000e+08   1.56445312e+08
   1.56640625e+08   1.56835938e+08  

   1.83984375e+08   1.84179688e+08   1.84375000e+08   1.84570312e+08
   1.84765625e+08   1.84960938e+08   1.85156250e+08   1.85351562e+08
   1.85546875e+08   1.85742188e+08]

It's the two middle ones that are the problem as they are closer together because of the 149 mistake.

From playing around with it, to avoid having datasets where only 1 or 2 of the SBs are actually real, I could band into groups of 18. This keeps each observed band's subbands together in a single banded dataset, if you see what I mean.

If I combine 12, for example, it gets a bit messy, as it splits the two middle frequency ranges over a few banded datasets. The low and high bands are each kept in one band.

But perhaps not going higher than 12 is more advisable, and it's ok to have datasets with a lot of dummies? Do the dummies count towards the 'flagged data'? If so, I'd have to be careful with the percentage-flagged parameter.

ajstewart commented 7 years ago

Firstly, I have neglected to mention the snapshot nature of the data - it is 2 x 11-minute snapshots (which I imagine might also create issues).

I tried running factor with bands of 18 and got past the original error above.

Now I'm running into the issue below, where it seems that 'all pixels in the image are blanked'. I have a feeling that this time it might be something to do with the lack of data, either in time or in frequency.

When I attempt to open the MFS-image.fits in question in kvis I get the bad data notification and it fails to open.

Full log: https://www.dropbox.com/s/g0ef1z7tcgry5xx/facet_patch_167.out.log?dl=0

2017-06-16 17:01:23 DEBUG   facetselfcal_facet_patch_167.executable_args: compute.dispatch results job 0: job_duration: 1983.06047392, returncode: 1 
2017-06-16 17:01:25 DEBUG   facetselfcal_facet_patch_167.executable_args: Adding node_logging_information
2017-06-16 17:01:25 ERROR   facetselfcal_facet_patch_167.executable_args: A job has failed with returncode 1 and error_tolerance is not set. Bailing out!
2017-06-16 17:01:25 WARNING facetselfcal_facet_patch_167.executable_args: Note: recipe outputs are not complete
2017-06-16 17:01:25 WARNING facetselfcal_facet_patch_167.executable_args: recipe executable_args completed with errors
2017-06-16 17:01:25 WARNING facetselfcal_facet_patch_167: make_clean_mask reports failure (using executable_args recipe)
2017-06-16 17:01:25 ERROR   facetselfcal_facet_patch_167: *******************************************
2017-06-16 17:01:25 ERROR   facetselfcal_facet_patch_167: Failed pipeline run: facet_patch_167
2017-06-16 17:01:25 ERROR   facetselfcal_facet_patch_167: Detailed exception information:
2017-06-16 17:01:25 ERROR   facetselfcal_facet_patch_167: <class 'lofarpipe.support.lofarexceptions.PipelineRecipeFailed'>
2017-06-16 17:01:25 ERROR   facetselfcal_facet_patch_167: make_clean_mask failed
2017-06-16 17:01:25 ERROR   facetselfcal_facet_patch_167: *******************************************
2017-06-16 17:01:25 ERROR   facetselfcal_facet_patch_167: LOFAR Pipeline finished unsuccesfully.
2017-06-16 17:01:25 WARNING facetselfcal_facet_patch_167: recipe facetselfcal_facet_patch_167 completed with errors
2017-06-16 17:01:17 WARNING facetselfcal_facet_patch_167.executable_args: --> Opened '/pi2raid/stewart/RSM-Processing/factor-2013-02-10/POINTING01-attempt2/results/facetselfcal/facet_patch_167/L89717_SAP000_SB000_uv.MS_1221C7188t_184MHz.pre-cal_chunk0.wsclean_image_full-MFS-image.fits.blanked'
2017-06-16 17:01:22 WARNING facetselfcal_facet_patch_167.executable_args: Image size .............................. : (5760, 5760) pixels
2017-06-16 17:01:22 WARNING facetselfcal_facet_patch_167.executable_args: Number of channels ...................... : 1
2017-06-16 17:01:22 WARNING facetselfcal_facet_patch_167.executable_args: Number of Stokes parameters ............. : 1
2017-06-16 17:01:22 WARNING facetselfcal_facet_patch_167.executable_args: Beam shape (major, minor, pos angle) .... : (4.16667e-04, 4.16667e-04, -90.0) degrees
2017-06-16 17:01:22 WARNING facetselfcal_facet_patch_167.executable_args: Frequency of image ...................... : 154.588 MHz
2017-06-16 17:01:23 WARNING facetselfcal_facet_patch_167.executable_args: Number of blank pixels .................. : 33177600 (100.0%)
2017-06-16 17:01:23 WARNING facetselfcal_facet_patch_167.executable_args: ERROR: All pixels in the image are blanked.
2017-06-16 17:01:23 DEBUG   facetselfcal_facet_patch_167.executable_args: Results for job 0 submitted by ('163.1.137.42', 54750)
2017-06-16 17:01:23 ERROR   node.pi2.python_plugin: All pixels in the image are blanked.
2017-06-16 17:01:23 INFO    node.pi2.python_plugin: Total time 1982.1212s; user time: 29.4840s; system time: 1953.2720s
2017-06-16 17:01:23 DEBUG   node.pi2.python_plugin: Start time was 1497626901.0959s; end time was 1497628883.2172s
2017-06-16 17:01:23 DEBUG   facetselfcal_facet_patch_167.executable_args: 
2017-06-16 17:01:23 WARNING facetselfcal_facet_patch_167.executable_args: 
2017-06-16 17:01:23 INFO    facetselfcal_facet_patch_167.executable_args: Subprocess completed with exit status 1: /bin/sh -c python /usr/lib/python2.7/dist-packages/lofarpipe/recipes/nodes/python_plugin.py 0 163.1.137.42 35010
2017-06-16 17:01:23 ERROR   facetselfcal_facet_patch_167.executable_args: Remote process python
darafferty commented 7 years ago

Usually this problem (a fully blanked image) is due to the input data being completely flagged. Indeed, I see that DPPP is flagging a lot of data during the applycal step:

Flags set by ApplyCal correct.
=======================

Percentage of visibilities flagged per baseline (antenna pair):
 ant    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
   0        38%  90%  90%  81%  79%  84%  90%  88%  85%  79%  86%  87%  84%  88%
   1   38%       86%  91%  88%  86%  85%  88%  85%  83%  83%  80%  80%  83%  91%
   2   90%  86%       63%  84%  87%  82%  84%  82%  86%  90%  90%  88%  80%  78%
   3   90%  91%  63%       84%  89%  88%  86%  87%  92%  90%  86%  90%  77%  88%
   4   81%  88%  84%  84%       64%  85%  74%  89%  92%  86%  76%  92%  82%  73%
   5   79%  86%  87%  89%  64%       83%  87%  82%  91%  90%  95%  85%  79%  74%
   6   84%  85%  82%  88%  85%  83%       56%  80%  85%  91%  81%  93%  89%  94%
   7   90%  88%  84%  86%  74%  87%  56%       93%  86%  86%  76%  88%  83%  86%

When this happens, it's usually because the solutions are all NaNs. Can you check one of the *convert_merged_selfcal_parmdbs to see if they are indeed all NaNs? There were a number of recent fixes related to the parmdbs of interleaved observations (see issue #194), so if you're running a version older than a month or so, try updating.
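
Something along these lines should show it quickly (a rough sketch using the lofar.parmdb Python bindings; the parmdb name is just an example - point it at the *convert_merged_selfcal_parmdbs in your facetselfcal results directory):

import numpy as np
import lofar.parmdb as parmdb

# Example path only - use the merged selfcal parmdb from your own run.
pdb = parmdb.parmdb('facet_patch_167_convert_merged_selfcal_parmdbs')
for name in pdb.getNames():
    values = pdb.getValuesGrid(name)[name]['values']
    n_nan = np.count_nonzero(np.isnan(values))
    print('%s: %d of %d values are NaN' % (name, n_nan, values.size))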

ajstewart commented 7 years ago

I've had a look and I only have one *convert_merged_selfcal_parmdbs in the facet results directory, at the lowest band, and this shows all amplitudes at 1.00 and phases at 0 when I plot it.

This was run on 1.2 so it may not have the fixes mentioned. I upgraded to 1.3 on Friday - I will perform a clean run on this version and see what I get.

darafferty commented 7 years ago

OK -- I believe the plotting script does plot ones for the amplitudes and zeros for the phase when NaNs are found, so I think we're on the right track. Oh, and there should indeed be just one *convert_merged_selfcal_parmdbs, as it contains the solutions for all the times and frequencies.

ajstewart commented 7 years ago

Unfortunately v1.3 gives the same error. Here's the log file, but it's the same as before, with very high flagging in the applycal step.

I'm still finding my way around Factor, but this seems to be in the prepare_imaging_data step? Though from what you said before, this comes about because of a failed calibration earlier on?

darafferty commented 7 years ago

The problem seems to occur when the parmdbs are merged. I suspect it has something to do with the fact that you have only two short observations, but I'm not sure. Can you send me (or post somewhere) all the parmdbs (everything that matches *parmdb*) in the facetselfcal directory for this direction?

ajstewart commented 7 years ago

I've copied all *parmdb* from the facet results directory to my CEP3 home directory: /home/stewart/rsm-factor-parmdbs/. If it's easier to put them somewhere else then let me know. Thanks for your help!

darafferty commented 7 years ago

Well, it appears that the dTEC solve produced NaNs for all solutions. I see that it is solving in small frequency blocks (once per band), perhaps because the default TEC_block_MHz is too small given all the missing data. Can you try setting TEC_block_MHz (under [calibration]) to the full bandwidth (~60 MHz, I guess)? You'll need to delete the facetselfcal directory before restarting.
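
In your Factor parset that would look something like this (64 is just an example value covering the ~60 MHz spread of your bands):

[calibration]
TEC_block_MHz = 64.0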

ajstewart commented 7 years ago

Yep, thanks for that - changing TEC_block_MHz to 64 MHz did get past that error.

So now everything seems to be running ok but resulting in a failed self calibration. I also tried it with the multires_selfcal option.

I'm guessing that while it does seem to improve, it's not improving enough, or not as expected, due to the limited data. I think it is checking the residuals? (From what I saw in https://github.com/lofar-astron/factor/issues/181.) So perhaps the limited data means it can't reach some defined threshold?

[Attached image: selfcal images for facet_patch_167, noise scaling (mJy/beam)]

darafferty commented 7 years ago

Yes, it checks that the max peak residual after selfcal is below 0.75 Jy/beam. You can change this value in the last line of factor/pipeline/parsets/facetselfcal_pipeline.parset in your Factor installation and then rerun.
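
For reference, that last line has the form below; the final number is the peak-residual threshold in Jy/beam (0.75 by default), so you just need to edit that value:

verify_subtract.argument.flags = [image_pre,image_post,0.75]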

ajstewart commented 7 years ago

Thanks for pointing out where the limit is set.

I looked at the mapfiles of verify_subtract and they give:

verify_subtract.maxvalpost.mapfile: [{'host': 'localhost', 'file': '0.8502', 'skip': False}]

verify_subtract.maxvalpre.mapfile: [{'host': 'localhost', 'file': '0.115182', 'skip': False}]

And setting it to 0.95 (verify_subtract.argument.flags = [image_pre,image_post,0.95]) still results in a failed run. Is the formatting of these numbers significant?

darafferty commented 7 years ago

Ah -- it also checks that the residuals don't get a lot worse (which they did here). The question is why they are so much worse after selfcal, since from your images it looks like selfcal didn't go too badly. How do the solutions look?

AHorneffer commented 7 years ago

Did you have a look at the pre and post images that Factor made? (E.g. with key "v" in checkfactor.)
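
I.e. roughly like this (from memory - checkfactor takes your Factor parset as its argument):

checkfactor factor.parset
# click on facet_patch_167 in the overview plot, then press "v" to show the
# pre/post verify-subtract images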

ajstewart commented 7 years ago

Ok, yes, looking at the verify images something doesn't seem quite right. The negative spot on the post image is quite strong, along with some structure being introduced into the image. Odd, because as you say the selfcal doesn't look too bad. Is it possible that the imaging settings are not ideal for this amount of data?

Looking through the solution plots, nothing jumps out at me immediately, though the amp plots are very flat (and at 1). But given the short timescale of the observations, perhaps somewhat flat amplitudes would be expected.

Plots - https://www.dropbox.com/sh/rbil44a42rfon73/AAD97COHQXIA2Pgh2jbxzKt9a?dl=0

[Attached image: facet_patch_167 pre/post verify-subtract images]