Gaps between subbands and concatenation

nealjackson commented 4 years ago

Some datasets have gaps between recorded subbands that are wider than the frequency width of a concatenated ILTJ....msdpppconcat file. Normally data gaps are not a problem, because dummy subfrequencies are written into the msdpppconcat files. But if the gap is this big, an entire ILTJ...msdpppconcat file is missing, instead of being present and containing only dummy data. This becomes a problem later on, when the files are combined, the TEC solutions are applied and a conversion to FITS is attempted: the conversion fails because the resulting file doesn't have equal frequency widths across the band.
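The failure condition can be pictured with a minimal sketch that checks whether the centre frequencies of a set of concatenated bands are evenly spaced. This is not part of the pipeline: the function name, the tolerance and the example frequencies are assumptions chosen for illustration only.

```python
def is_evenly_spaced(centres_mhz, tol_mhz=1e-3):
    """Return True if the sorted centre frequencies have a constant step.

    Hypothetical helper for illustration; not taken from the pipeline.
    """
    freqs = sorted(centres_mhz)
    if len(freqs) < 3:
        # Fewer than two steps: nothing to compare.
        return True
    steps = [b - a for a, b in zip(freqs, freqs[1:])]
    return all(abs(s - steps[0]) <= tol_mhz for s in steps)


# A contiguous set of bands versus one with a whole band missing.
print(is_evenly_spaced([162.0, 164.0, 166.0, 168.0]))  # True
print(is_evenly_spaced([162.0, 166.0, 168.0]))         # False
```

A check of this kind on the band centres would flag a missing concatenated file before the FITS conversion is attempted.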

lmorabit commented 4 years ago

I think I have fixed this with commit ee66cea4ae57310fa5e9c09976520d29705e14ae. Can you please test? Essentially, this re-runs the sort script to create mapfiles that fill in the dummy measurement sets for the concatenation step.

lmorabit commented 4 years ago

Hi Neal, has this been tested successfully now?

nealjackson commented 4 years ago

Still having some problems. I had deleted the upper subbands in order to get the pipeline to work, so rather than reload everything from the archive I made a cut-down set of files with deliberately missing subbands. The frequency channels of both the raw data files and the combined ILTJ... files are listed in /home/njj/freq_test on lofar.herts.

The deliberately left gap is between 163 and 165 MHz. On re-running the pipeline with only the new LB-Delay-Calibration.parset, the msdpppconcat files produced are at 162, 166 and 168 MHz. The new steps in the parset ran all the way through to apply_tec (which crashed, I think because the new TEC solution is going to be ropy with only a small number of subbands).

I hope this is a fair test, but NB (i) I used only the new parset, since I think it contains all the changes in the commit, and (ii) there is a gap in the numbering of the subbands, rather than the subband numbering being contiguous but with frequency gaps between them. If either of these is a problem I can reload the full dataset, but this will take rather longer.

tikk3r commented 4 years ago

> (ii) there is a gap in the numbering of the subbands, rather than the subband numbering being contiguous but with frequency gaps between them.

This shouldn't be an issue, as prefactor's sorting script looks at the frequencies and channel width in the measurement sets to determine how to insert dummies.
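As a rough picture of frequency-based dummy insertion, here is a minimal sketch. It is not prefactor's actual sorting script: the function name, the dictionary input and the 0.2 MHz subband width are assumptions made for illustration. The idea is to place each measurement set on a regular frequency grid and fill the empty slots with a dummy entry, so the subband numbering never enters into it.

```python
def fill_with_dummies(subbands, sb_width_mhz, dummy="dummy.ms"):
    """Lay measurement sets out on a regular frequency grid.

    subbands: dict mapping centre frequency in MHz to a measurement set name.
    Returns a list of names from the lowest to the highest frequency, with
    `dummy` in every grid slot that has no data.
    Hypothetical helper; not the prefactor implementation.
    """
    freqs = sorted(subbands)
    lowest = freqs[0]
    # Grid slot of each real subband, counted from the lowest frequency.
    slots = {round((f - lowest) / sb_width_mhz): subbands[f] for f in freqs}
    n_slots = max(slots) + 1
    return [slots.get(i, dummy) for i in range(n_slots)]


# Three real subbands with a two-subband hole between the last two.
names = fill_with_dummies(
    {162.0: "a.ms", 162.2: "b.ms", 162.8: "c.ms"}, sb_width_mhz=0.2
)
print(names)  # ['a.ms', 'b.ms', 'dummy.ms', 'dummy.ms', 'c.ms']
```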

lmorabit commented 4 years ago

@nealjackson where is your log? A failure at the TEC step doesn't mean that the combination didn't work properly; it can simply fail if you don't have enough data. The log should show whether dummy.ms files are being inserted into the frequency gaps properly.

lmorabit commented 4 years ago

Hi @nealjackson I found the log on the cluster. The dummy measurement sets have been properly inserted, but can you re-run this test with numSB = -1 rather than 999 in the sort_phaseupmap step please?

lmorabit commented 4 years ago

@nealjackson using numSB = -1 with truncateLastSBs = False properly inserts dummy.ms measurement sets, but still leaves a trailing dummy.ms above the band at 168 MHz (which should be the top). If numSB = -1 with truncateLastSBs = True doesn't produce the final dummy.ms, I'll update the parset to that; otherwise I think the behaviour with truncateLastSBs = False is acceptable.

nealjackson commented 4 years ago

I ran with numSB = -1 and truncateLastSBs = True in sort_phaseupmap but still end up with 162, 166 and 168 MHz, although the ILTJ... files themselves have dummy.ms inserted OK. For now I don't think it is worth agonising over, since for these data the top part of the band is to be thrown away anyway, so losing 10-20 subbands is not a big problem. Maybe we should revisit this if anyone has a dataset with a big hole in the middle.

lmorabit commented 4 years ago

Hi @nealjackson, I'm a little confused now about what the problem is. Are you saying the phaseup_concat step gives you more than one file? The naming of the bands created by the dpppconcat step comes directly from the frequencies of the measurement sets being combined.

nealjackson commented 4 years ago

I might also be confused! But the pipeline trips up in the conversion to FITS in the selfcal. I think this is because, although the ILTJ....msdpppconcat files have the correct frequency structure, one ILTJ...msdpppconcat file, which could in principle consist entirely of dummy.ms, is missing (because no subbands have any of the frequencies that would be in that file).
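The situation described here can be pictured with a minimal sketch, assuming subbands are grouped into fixed-size blocks by index and that a block is only written when it holds at least one real subband. The function names, the group size of 10 and the index ranges are hypothetical and are not taken from the pipeline.

```python
def groups_written(sb_indices, group_size):
    """Indices of the groups that contain at least one real subband."""
    return sorted({i // group_size for i in sb_indices})


def groups_missing(sb_indices, group_size):
    """Groups inside the covered range that contain no real subband."""
    present = groups_written(sb_indices, group_size)
    return [g for g in range(present[0], present[-1] + 1) if g not in present]


# Real subbands 0-9 and 20-35; nothing was recorded in 10-19.
recorded = list(range(0, 10)) + list(range(20, 36))
print(groups_written(recorded, group_size=10))  # [0, 2, 3]
print(groups_missing(recorded, group_size=10))  # [1]
```

In this picture group 1 would have to be built entirely from dummies, and it is simply absent if only the groups containing real data are written.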

lmorabit commented 4 years ago

Are you feeding the ILTJ....msdpppconcat files into difmap, or the ILTJ....phaseup_concat?

nealjackson commented 4 years ago

According to the log, the selfcal part is fed the apply_tec file (together with the solutions.h5 from Pre-Facet-Target/results/cal_values).

lmorabit commented 4 years ago

@nealjackson yes, and that file is the phaseup_concat file with the TEC solutions applied; both should contain the whole bandwidth, including dummy data. Are you saying the selfcal is still failing? If so, can you try the phaseup_concat file? Perhaps the problem arises when NDPPP reads in the phaseup_concat file and writes out the apply_tec file.

nealjackson commented 4 years ago

Ah, OK, apologies. Yes, it now works. I'd assumed that the gap in the ILTJ...msdpppconcat files was a problem, but it is producing the apply_tec with the correct frequency structure, and I've just tried the selfcal_difmap again and it works. So problem solved I think.

lmorabit commented 4 years ago

Great! Do you have your log somewhere? I'll use it to decide between truncateLastSBs = True or False and then close the issue after implementing the changes.

nealjackson commented 4 years ago

Logs are in /home/njj on herts; 76608.uhhpc.log is the run with numSB = -1 and truncateLastSBs = True.

lmorabit commented 4 years ago

Fixed with commit 09c2eea311b449667862a15fc6ffb58b272ff93d.

lmorabit / lofar-vlbi

Gaps between subbands and concatenation #66