Open IanHeywood opened 8 years ago
Yeah I've seen this happen with degenerate LSMs (ones with no time/freq axis), the internal optimizer collapses these axes and gets into trouble.
Simplest workaround is to include an LSM containing a single source of 1e-999 flux, with a spectral index.
Thanks chief I'll give it a go.
Problem persists having added this to the LSM:
I thought it might be ignoring the sources since 1e-999 gets rounded down to zero, but even 1e-9 makes no difference.
Weird. Could you please publish the results of the DT and MT nodes, and open them up to look at dimensions of the vellsets within?
Trying to publish things seems to be killing the meqserver, so I can't see any results propagating into the cache section. The dimensions of MT and DT in the request all appear to be the same.
calico-wsrt-tens.py is as usual happy to chew this problem slowly.
Hmmm, anywhere I can look at it live?
The MODEL_DATA column, it's got 4 correlations as usual?
Last thing to try: give Q=1e-99 to the dummy source. In fact, I should have suggested that from the beginning...
MODEL_DATA is the same shape as DATA and CORRECTED. It's just the output written by wsclean.
Added Q value but problem persists. Problem goes away if I only solve for G and disable dE.
I'll try to recreate it on a Rhodes machine. I won't be able to use the browser to test it but you might be able to from there.
Cheers.
On Elwood:
$ cd /home/ianh/Data/13B-308/dE_tests
$ python one_pass_cal.py
should reproduce the error.
I scp'd the MS from my local machine, and weirdly enough I had to apply the fix in meqtrees-cattery issue 34 as the script threw the
### 000: node 'VisDataMux': execute() failed: TableMeasRefDesc error: old refcode Undefined does not exist anymore (return code 0x810021)
which it wasn't doing on my local run. I've used ms.copycol to overwrite the DATA with the CORRECTED_DATA column prior to trying the stefcal run, but I wouldn't have thought that would be causing the trouble, particularly since calico-wsrt-tens.py is fine with it all.
Cheers.
OK I see the issue. Workaround is to set "Apply diffgain to selected sources" and "Sources: =dE". Or in the conf file:
de_subset.subset_enabled = 1
de_subset.source_subset = =dE
You had the subset set to "all", which caused it to treat the DUMMY source as one with a dE on it too, thus making for an empty set of sources without a dE. Due to a bug in calico-stefcal.py, it does not fail gracefully in this situation.
It's running now, thanks!
Nice of you to fabricate a bug at the end there to make it look like it wasn't entirely user-error.
But it is a bug. User errors should result in at least mildly comprehensible error messages, which this one patently isn't... not sure I'll ever get around to fixing it since a workaround exists, but at least I'm keeping it filed as a bug.
OK, just to make life interesting... looping over SPWs using the same Tigger LSM + contents of MODEL_DATA to calibrate against:
SPW0: Fine
SPW1: Fine
SPW2: Fine
SPW3: Dimensions of tensor child do not match...
WTF...?
Deja vu... can has upload please?
On Elwood:
$ cd /home/ianh/Data/13B-308/dE_tests2/
$ python per_scan_per_band_calibration.py
This will try to calibrate the SPWs for the first block of scans sequentially, and will fail when it gets to SPW 3 (DATA_DESC_ID==3). If you edit the script so dryrun = True, then run it, the terminal output should show you the steps it takes without actually running anything.
For the SPWs that run successfully it grumbles about diverging chi^2 values, but I haven't optimised anything yet, I'm just trying to get it to swallow the problem. You can see what my approach is by looking at the last few lines of setup_dE_model.py: basically I'm trying to have wsclean take care of modelling everything except the problem sources, which are excluded from MODEL_DATA and replaced by a component model to which dEs are applied.
As an aside: when I'm looping over mqt.run invocations like this, is there a way to note a failure and move on, rather than having it just die and kill all subsequent runs? All I can think of is having one script spawning another. Doesn't seem very pretty, but as you know I'm not above that sort of shoddy behaviour.
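For what it's worth, the "one script spawning another" idea can be sketched as below. This is only a sketch under assumptions, not anything provided by MeqTrees or Pyxis: run_each is a hypothetical helper, and calibrate_one_spw.py in the commented usage is a hypothetical per-SPW script that would wrap a single mqt.run call. Each SPW runs in its own child process, so a crash in one is recorded as a non-zero return code and the loop carries on.

```python
import subprocess
import sys


def run_each(commands, dryrun=False):
    """Run each (label, argv) pair in its own child process.

    Returns a list of (label, returncode) for the commands that failed.
    A failure in one child does not stop the remaining commands.
    """
    failures = []
    for label, cmd in commands:
        if dryrun:
            # Only show what would be executed, as with dryrun = True.
            print("would run %s: %s" % (label, " ".join(cmd)))
            continue
        returncode = subprocess.call(cmd)
        if returncode != 0:
            print("%s failed (return code %d), moving on" % (label, returncode))
            failures.append((label, returncode))
    return failures


# Hypothetical usage: one child process per SPW (script name is an assumption).
# commands = [("SPW%d" % spw, [sys.executable, "calibrate_one_spw.py", str(spw)])
#             for spw in range(4)]
# failed = run_each(commands)
```

The list that comes back gives a record of which SPWs need another look once the whole loop has finished.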
Thanks again.
Having told it to skip SPW3 I note that it also fails on SPW14.
Any idea what might cause this error?
I'm using a VLA MS which has thus far behaved as expected. Trying to read in a pre-computed model from the MODEL_DATA column, and solve for dEs on a two-component LSM.