Open soumyajitmandal opened 7 years ago
I'd prefer it if you wouldn't copy & paste parts of (generic-)pipeline logfiles here (they are bleeping hard to read, and half of them is usually missing), but attach the entire logfile to the post.
Indeed the smoothing fails, which is then the reason why most of the data gets flagged. (That's what NDPPP does if it encounters a NaN as a calibration value it is asked to apply to data.)
@rvweeren (or @darafferty): does the smooth_amps_spline.py script implicitly assume that the data was taken during only one day? E.g. by assuming that the amplitudes can be modeled over the full time range by a low-order polynomial?
@soumyajitmandal: You could try setting spline_smooth2D to False in the Factor parset.
I used spline_smooth2D = False but the error still exists.
parmdbplot.py *4g.merge_amp_parmdbs1 looks fine.
But smooth_amps.py did not work out.
I ran this outside Factor:
factor/scripts/smooth_amps.py 4g.merge_amp_parmdbs1 smooth_amp1_test
and parmdbplot.py *smooth_amp1_test shows the amplitudes and phases are blank. The output message was:
RuntimeWarning: invalid value encountered in greater
high_ind = numpy.where(amp > 5.0)
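The warning comes from NaN values hitting the `>` comparison. A minimal sketch (with fabricated amplitudes, not the actual parmdb values) of why NaN entries slip past numpy.where and how masking makes the threshold NaN proof:

```python
import numpy as np

# Fabricated amplitudes standing in for the parmdb values
amp = np.array([1.0, np.nan, 6.0, np.nan])

# NaN compares False against everything, so NaN entries are never
# selected as outliers (and older NumPy emits the RuntimeWarning here)
high_ind = np.where(amp > 5.0)

# A NaN-proof variant: mask invalid values before thresholding
amp_ma = np.ma.masked_invalid(amp)
high_ind_safe = np.where(amp_ma > 5.0)

print(high_ind[0].tolist(), high_ind_safe[0].tolist())  # [2] [2]
```

Either way only the real outlier (index 2) is found; the difference is that the masked version does the comparison without touching the NaNs.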
I would like to point out that a few months ago I tried Factor on two different nights with 40 subbands, and we fixed the problem when we had two different antennas, one being flagged (so essentially two different time spans): #76
So I repeated the same task on the older run, and it did NOT fail, also giving norm_factor = 1.0011506. The *smooth_amp1_test produced in this case is fine.
The only differences between the two runs are: the full set of subbands, and 3 nights of data.
The previous run was done with a different version of NDPPP than the recent one, so the two parmdbs1 were created with different LOFAR software versions. I think previously, if the interpolation did not find a value, it used to put in zeros, but now it's putting in NaNs instead. Could this be the issue?
I used spline_smooth2D = False but the error still exists.
But the error message you quoted is from smooth_amps_spline.py, i.e. the spline smoothing script. So you should at least get a different error message.
I thought putting spline_smooth2D = False turns off the use of smooth_amps_phases_spline.py, not smooth_amps.py, right? So the error I got from the last run was from smooth_amps.py.
spline_smooth2D = False turns off spline smoothing over the frequency axis. It still does a spline smooth across the time axis. (I think it always uses smooth_amps_phases_spline.py, and smooth_amps.py is not used anymore, if I am correct.)
@rvweeren: Ah, O.K.
When smoothing along the time axis, does the smooth_amps_spline.py script implicitly assume that the data was taken during only one day? (E.g. by assuming that the amplitudes can be modeled over the full time range by a low-order polynomial?)
I checked the smooth_amps_spline.py script, and it creates a time axis:
times = numpy.copy(sorted(parms[key_names[0]]['times']))
Maybe it fails because of that, and it cannot handle a very large gap (although, looking at the code, the spline does not directly use that time axis in the spline fit). The easiest way to debug this is to take the parmdb, run smooth_amps_spline.py manually on it, and check where it fails in the script.
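For what it's worth, a gap like the one between observing nights is easy to spot on a sorted time axis. A hedged sketch (the array and the threshold are made up for illustration; this is not Factor's actual logic):

```python
import numpy as np

# Fabricated sorted time axis: two nights separated by a huge gap
times = np.array([0.0, 10.0, 20.0, 100000.0, 100010.0])

# A gap is a time step much larger than the typical sampling interval
delta = np.diff(times)
gaps_ind = np.where(delta > 5.0 * np.median(delta))[0]
print(gaps_ind.tolist())  # [2]: the gap sits between samples 2 and 3
```

Splitting the solutions at such indices and smoothing each chunk separately is one way to keep a time-axis fit from being dominated by the inter-night gap.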
Hmm, update/correction: apparently it does use smooth_amps.py if spline_smooth2d = False:
if self.parset['calibration_specific']['spline_smooth2d']:
    smooth_amps_task = 'smooth_amps_spline'
else:
    smooth_amps_task = 'smooth_amps'
(from facet_ops.py)
It might be that the script is failing because there are NaN input values. Otherwise I cannot see why high_ind = numpy.where(amp > 5.0) could give an error message. I guess you need to open the scripts, do some debugging, and figure out precisely where it goes wrong, in smooth_amps_spline.py and smooth_amps.py. Give it a try and see how far you get; the scripts are not very complicated (if you still remain stuck, provide the parmdb).
Yeah, indeed the error was with smooth_amps.py this time. So I ran smooth_amps.py on the merged parmdb:
smooth_amps.py 4g.merge_amp_parmdbs1 smooth_amp1_test
I did a print on 'ampl' after this line:
ampl_tot_copy = numpy.copy(ampl)
The values were NaNs, whereas in my successful run a few months earlier, doing the same thing gave me zeros.
You need to get back further; ampl_tot_copy = numpy.copy(ampl) is already too deep into the script. The questions for you to answer are: (1) does the input parmdb contain NaNs, and (2) is that the reason why it fails (because smooth_amps.py is not NaN proof)?
Check channel_parms_real and channel_parms_imag on lines 125/126.
Yes, channel_parms_real and channel_parms_imag also have NaN values. So the input parmdb has NaN values, whereas earlier it used to have zeros.
OK, so it looks like smooth_amps.py is simply not NaN proof.
Hmm, okay. Is it a good idea to put zeros in place of the NaNs, or is that not a good solution?
You should try to edit smooth_amps.py so that it is NaN proof (with minimal other changes). Without having looked at it in detail, I think that should not be very difficult to do.
(I probably do not have time to look at it myself over the next two weeks; after that I might have time to help with it and also check smooth_amps_spline.py, because in the end it is preferable to use spline_smooth2d, as it is more capable of detecting amplitude outliers.)
@soumyajitmandal: Can you put a parmDB with NaNs somewhere where I can find it, to test the code?
@all: What should we do with the flagged data? Replacing the amplitudes with the median value is straightforward, but what should we do with the phases? Setting them to zero would be the simplest. Finding a useful median for phases is not only more complicated; I also don't know whether it is a good idea.
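For reference, one common way to average phases despite the 2*pi wrap (not necessarily what Factor should do here) is a circular mean via unit vectors. A sketch with made-up phases clustered near the wrap:

```python
import numpy as np

phases = np.array([3.0, -3.0, 3.1])  # radians, clustered near +/- pi

# Naive mean is misleading across the wrap
naive = phases.mean()

# Circular mean: average the unit vectors, then take the angle
circular = np.angle(np.mean(np.exp(1j * phases)))

print(naive, circular)
```

Here the naive mean lands near 1.0 rad, far from all three inputs, while the circular mean stays near +/- pi where the phases actually cluster. Whether a "median-like" phase is meaningful at all is exactly the question raised above.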
I did a test by putting zeros instead of NaNs, but the normalisation gets messed up in that process. Might using a masked array be useful instead?
channel_parms_real = numpy.ma.masked_invalid(channel_parms_real)
But in this way, the median function might not work.
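Actually, NumPy has a masked-aware median, so the masked-array route should work. A quick sketch with fabricated values:

```python
import numpy as np

vals = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# Plain median is not NaN proof: any NaN poisons the result
plain = np.median(vals)

# The masked-array median ignores the invalid entries
masked = np.ma.masked_invalid(vals)
ma_med = np.ma.median(masked)

# Equivalent: drop the masked entries first with .compressed()
comp_med = np.median(masked.compressed())

print(plain, float(ma_med), comp_med)  # nan 3.0 3.0
```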
Attached is the parmdb. lockman_amp_parmdbs1.zip
Including the masked array in a different part seems to be working so far in my case. I put spline_smooth2D = False, which means it is using smooth_amps.py. I changed (line 146):
amp = numpy.ma.masked_invalid(numpy.copy(numpy.sqrt(real**2 + imag**2))).compressed()
Previously, while it was trying to create image31, it was failing, since everything was flagged for the NaN entries and no norm_factor was found. Now images up to image42 have been created, and the parmdbs look fine as well.
Well, having had a look at the parmDB you attached here, I think it would be important to find out why you have so many NaNs in the parmDB. Did you flag large parts of the data? (And why would NDPPP create parmDB entries with NaNs in that case, instead of not creating the entries at all?) Or are there parts of the data where NDPPP couldn't get a solution even though there was data?
Since there is a time gap between the different nights, I think that's what is producing the NaNs. In general, when I processed the different nights' data separately, I did not see the NaN issue.
Well, the smoothing is done on single time series (i.e. separately per antenna, polarization, and channel), and several of these time series are fully flagged.
Btw., here is a version of the script that will not only work with NaNs but also doesn't produce the RuntimeWarnings: smooth_amps.py.txt
I will try this version, thanks a lot. I was trying the temporary fix that I wrote in my previous comment (which I think you also put in the modified text file) and got an error. I reproduced the error outside the pipeline using convert_solutions_to_gain.py:
convert_solutions_to_gain.py .pre-cal_chunk12_126407AFCt_4g.merge_phase_parmdbs .pre-cal_chunk12_126407AFCt_4g.smooth_amp2 gain_test
Traceback (most recent call last):
File "/net/para14/data1/mandal/software/factor_normalize/factor/scripts/convert_solutions_to_gain.py", line 165, in
Has anyone seen this earlier?
I found and fixed a problem in convert_solutions_to_gain.py with parmdbs with large gaps. @soumyajitmandal, can you try your run again? (The fix is only available on the latest master, so if you want to use this with an earlier version of Factor, you'll need to copy the new version into your Factor installation by hand.)
Thanks David! I tried this code last week after we chatted at the busy week, so I do have the parmdbs created with the latest version. I will just do a git pull and rerun it.
Hi David,
it seems like convert_solutions_to_gain.py works now. Probably we need the same kind of fix in reset_amps.py? That's where it's failing now. I tried to run it outside Factor and got a similar error:
reset_amps.py L340794_SB000_uv.dppp.pre-cal_126400A74t_121MHz.pre-cal_chunk12_126407AFCt_4g.convert_merged_selfcal_parmdbs test_parm
Traceback (most recent call last):
File "./reset_amps.py", line 77, in
I think this problem might be fixed by commit 4dc0157. To test it, update Factor, reset the state for the pipeline so that it repeats the convert_merged_selfcal_parmdbs
step, then rerun.
hmm this time I think I encountered a different error:
log4cplus:ERROR No appenders could be found for logger (CEP.ParmDB.EXCEPTION).
log4cplus:ERROR Please initialize the log4cplus system properly.
Traceback (most recent call last):
File "./reset_amps.py", line 77, in
I plotted the convert_merged_selfcal_parmdbs and the test_parmdb (reset_amps.py convert_merged_selfcal_parmdbs test_parmdb). The amplitude part of the test_parmdb is zero everywhere. Also, the test_parmdb* contains only 14 stations (CS301HBA0 being one of them).
I committed a fix to reset_amps.py (similar to the one for convert_solutions_to_gain.py) that might fix the above problem. Can you try it again? As before, you will need to reset the state for the pipeline so that it repeats the reset_amps step, then rerun.
I just fixed a bug that affected both convert_solutions_to_gain.py and reset_amps.py, so unfortunately you'll need to update and reset to the convert_merged_selfcal_parmdbs step.
Ah okay. I will do it and let you know. Thank you.
I tried it. It fails at the same place. I plotted the parmdbs, and I think the error is a bit clearer now: for one of the observations, one of the stations (CS301) was flagged.
parmdbplot.py *.convert_merged_selfcal_parmdbs
CS101: CS101.pdf
CS301: CS301.pdf
The time stamps are different for the two different antennas. I have seen something similar in the smooth_amps stage quite a while ago (#76). Something similar here?
OK, I added a check for missing stations to reset_amps.py, so update and give it another try.
Hi David, after the last update of yesterday on CEP3 I have this error
2017-05-09 11:23:18 ERROR facetselfcal_facet_patch_574: Failed pipeline run: facet_patch_574
2017-05-09 11:23:18 ERROR facetselfcal_facet_patch_574: Detailed exception information:
2017-05-09 11:23:18 ERROR facetselfcal_facet_patch_574: <class 'lofarpipe.support.lofarexceptions.PipelineRecipeFailed'>
2017-05-09 11:23:18 ERROR facetselfcal_facet_patch_574: convert_solutions_to_gain failed
I think it is related to the new commit.
The error seems to be:
2017-05-09 11:23:16 ERROR node.lof004.python_plugin: local variable 'gaps_ind' referenced before assignment
Thanks -- the 'gaps_ind' problem should be fixed now (and I updated it on CEP3).
Okay, now I think it has passed the reset_amps.py stage. By the way, is there a plotting issue with parmdbplot.py? I was checking parmdbplot.py *4g.create_preapply_parmdb, and the 'polar' plot shows the amplitude to be zero everywhere. The real and imaginary parts look fine.
Anyway, now it's failing at the plot-solutions step. But it's due to the size issue. Here it is:
plot_selfcal_solutions.py -p L340794_SB000_uv.dppp.pre-cal_126400A74t_121MHz.pre-cal_chunk12_126407AFCt_4g.merge_selfcal_parmdbs hola
/software/rhel7/lib64/python2.7/site-packages/numpy/ma/core.py:852: RuntimeWarning: invalid value encountered in greater_equal
return umath.absolute(a) * self.tolerance >= umath.absolute(b)
./plot_selfcal_solutions.py:43: RuntimeWarning: invalid value encountered in less
out[out < -np.pi] += 2.0 * np.pi
./plot_selfcal_solutions.py:44: RuntimeWarning: invalid value encountered in greater
out[out > np.pi] -= 2.0 * np.pi
Traceback (most recent call last):
File "./plot_selfcal_solutions.py", line 662, in
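The two plot_selfcal_solutions.py warnings come from NaNs hitting the phase-wrap comparisons at lines 43/44. A minimal NaN-tolerant sketch of such a wrap (assuming the script wraps phases into [-pi, pi]; the function name is mine, not the script's):

```python
import numpy as np

def wrap_phase(out):
    """Wrap phases into [-pi, pi]; NaNs pass through unchanged."""
    out = np.copy(out)
    # Suppress the 'invalid value encountered in less/greater' warnings
    # that NaN entries trigger in the comparisons
    with np.errstate(invalid='ignore'):
        out[out < -np.pi] += 2.0 * np.pi
        out[out > np.pi] -= 2.0 * np.pi
    return out

wrapped = wrap_phase(np.array([4.0, -4.0, np.nan]))
```

Since NaN compares False with everything, the NaN entries are simply left in place; the errstate context only silences the warnings.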
I've modified the plot_selfcal_solutions.py script to work with multiple nights. It should now produce one plot per night (instead of all the nights in a single plot, which was unreadable).
Okay. It will probably work, but it says: global name 'fp' is not defined
Oops -- copy and paste error. Try it now.
The plotting step passed, and the plots look quite fine. Now it's preparing the imaging chunk dataset. I will keep you updated. :)
Good news: the code seems bug-free for multiple nights now! :)
Some issues (not related to the code, though, I think). The image looks bad after combining the three different nights, while the individual images look quite similar. I attach a calibrator image from one of the nights and the same calibrator from multiple nights:
1 night:
3 nights:
Looks like amplitude and TEC had problems (?). Phases are fine.
Amplitude (night1):
Amplitude(night2):
TECscalarphase (night1):
TECscalarphase (night2):
How are the amplitudes applied when we merge different observations? I mean, do they get averaged? Also, do you think it's worth trying to flag the noisy TEC solutions?
Factor doesn't do anything special with data from multiple nights -- they're handled just like data from a single night. So, it will smooth the amplitudes and normalize them in the same way (so a single normalization is done across all three nights). No averaging is done.
It's probably good to flag the periods during night 1 when the solutions are noisy (between hours 6-7 and after hour 8). I'm not sure whether they're the cause of the poor results, though.
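To illustrate the "single normalization across all nights" point above: one factor is derived from the pooled amplitudes and applied to everything, so no per-night averaging takes place. This is a guess at the idea with fabricated data; the NaN-safe median is my choice for the sketch, not necessarily Factor's actual formula:

```python
import numpy as np

# Fabricated per-night amplitude solutions (one flagged entry as NaN)
night1 = np.array([0.9, 1.0, 1.1])
night2 = np.array([1.0, 1.2, np.nan])
night3 = np.array([0.8, 1.0, 1.0])

# One normalization factor over the pooled data from all nights
pooled = np.concatenate([night1, night2, night3])
norm_factor = 1.0 / np.nanmedian(pooled)

# The same factor is applied to every night; nothing is averaged
normalized = [night * norm_factor for night in (night1, night2, night3)]
print(float(np.nanmedian(np.concatenate(normalized))))  # 1.0
```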
It's very odd that the background noise looks essentially at the same level even though you have 3 times the data. I'd perhaps expect the artefacts to stay similar, but the noise should really go down a fair bit.
I guess the "sources" in the 3-day image that pop up near your bright source are not real? Did you check the masks throughout the calibration of the 3-day one?
@darafferty I am also imaging them separately now (i.e. each facet has gone through Factor; I used a single model, added that to every night's dataset, and used gaincal/applycal. The imaging is running now; by tomorrow I hope to see the result).
@darafferty @twshimwell Let's wait till the facet is finished. I checked the full image in the facetselfcal directory (which only has 1/6th of the bandwidth; is there an option to include the whole band? Factor does not have that setting anymore, it's 6 by default). The facetimage run (which is going now) should have the whole BW. I am expecting the background noise might be a bit better than what we are seeing now.
@soumyajitmandal I was wondering: did you make sure that the same "InitSubtract" model was subtracted from all three nights? E.g. by running Initial-Subtract on all three nights together, or by subtracting the model of one night from the other two.
If different models were subtracted from the different nights, then Factor will screw up. (It will just assume that the same model was subtracted from all data and thus treat two of the nights wrong.)
Each night should be init-subtracted separately right?
Let me explain what I did: I processed the three different nights independently up to the init-subtract step. So after the init-subtract step, for each night I have (24, 24, 23; in total 71) low2-model.merge skymodels. In my Factor run, the msfiles directory contains all 71 *.ms files and these low2-model.merge files.
Each night should be init-subtracted separately right?
No! (Feel free to add a few more exclamation marks, blinking effects or so.)
Factor will use one model for all files in one frequency band. If you actually give it files from the same frequency band from which different models have been subtracted, then it will screw up. The way you started it, it will randomly(*) choose one of the three skymodels and use that for adding back the sources to the data from all three nights. So the two nights for which another model was subtracted will be treated wrong.
My suggestion to fix that is to choose the model of one of the three nights, and subtract that from the other two nights.
(*) Well, not actually randomly, but it will be undefined behavior.
Is this the behaviour even if the models are explicitly specified in the Factor parset?
Hi everyone,
I am combining 3 nights of data (full subbands) with Factor. It was going fine until the first amplitude calibration. Probably interpolating them across the different time ranges flagged all the data. Here is the message: