Open o-smirnov opened 4 years ago
@bennahugo wrong issue, wrong repo... wrong organization, even! ;)
Whoops, sorry about that -- this is why you should not look at GitHub issues via mail at that point in the morning!
Valuable advice though! (Including the misdirected comment...)
@JSKenyon asked me for a reasonable MODEL_DATA column. I can generate a BDAd model in a separate MS with a separate run of xova bda, but then he has to copy that across into the original BDAd data MS somehow.
Is there an easy fix to run the averaging on more than one column at a time?
@o-smirnov The short of it is the codex dask/numba layer currently averages a single visibility array. I have considered averaging multiple vis arrays -- it'll require some extra numba + dask magic (support a tuple of vis arrays as an arg, vs a single vis array).
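For illustration only (this is not the codex-africanus API, just a toy numpy sketch of the idea; names and shapes are assumptions): once the row-to-bin mapping exists, a tuple of columns can be pushed through the same averaging step.

```python
import numpy as np

def average_columns(vis_tuple, bin_of_row):
    """Average a tuple of (row, chan, corr) arrays with one shared row->bin map."""
    nbins = bin_of_row.max() + 1
    counts = np.zeros(nbins, dtype=np.int64)
    np.add.at(counts, bin_of_row, 1)
    out = []
    for vis in vis_tuple:
        acc = np.zeros((nbins,) + vis.shape[1:], dtype=vis.dtype)
        np.add.at(acc, bin_of_row, vis)          # accumulate rows into their bins
        out.append(acc / counts[:, None, None])  # plain unweighted mean per bin
    return tuple(out)
```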
At least that is one mystery solved - I was very confused when I read your reply @bennahugo!
I'd say it's not a quick fix -- mediumish, but it's something that has been on my radar (along with a lexicographical sort by the MSv2 ID columns).
It's less than critical functionality so let's look for easy ways. How about you add an option to insert a new output column into an existing MS instead, would that be easier?
Or is it easier to just write a bit of code that copies odd-shaped columns across?..
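A rough sketch of that copy, with hypothetical MS names, using python-casacore; it assumes both MSs came out of the same xova run so the rows line up, and that the target column already exists.

```python
from casacore.tables import table

src = table("model_bda.ms")                  # xova output whose DATA column holds the model
dst = table("data_bda.ms", readonly=False)   # BDA'd data MS that needs MODEL_DATA filled

assert src.nrows() == dst.nrows(), "row counts differ -- not from the same BDA run?"

# Row-wise copy copes with the variable per-row channel shape of a BDA MS,
# where a single getcol()/putcol() over the whole column may choke.
for row in range(src.nrows()):
    dst.putcell("MODEL_DATA", row, src.getcell("DATA", row))

src.close()
dst.close()
```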
Do you have a Tigger sky model for the data? I could maybe try a predict on the fly. No idea if it will work, but worth trying.
Also worth testing - at some vague point in the future - the repercussions of using an averaged model vs. repredicting at the BDA coordinates.
Yes, very interested in this, /cc @paoloserra
No, I only have WSCLEAN images at the moment, no LSMs. I'd like to give you the best possible model for now (obtained through selfcal without BDA), so that you really test the quality of calibration first, and not the model...
The issue is that WSCLEAN can't work with a BDAd MS for now, so I must predict this image at full res, then average. The averaging is now in progress. I'll give you a separate MS with the BDAd model, you'll have to copy the column across.
Then of course there is also the model that we get from DDFacet. We could predict that to a BDAd MS. Now that really starts testing the whole BDA+DDF+selfcal loop. Which is too many variables to start with, IMHO.
Yeah - that sounds like too many points of failure. I will be happy with just producing reasonable solutions. As I mentioned, I still have some changes to make, but I think the internals of the solver are "correct". Put it this way - if I set the model to be the DATA column, I get identity gains out. Not very meaningful, but it at least suggests that something is going right.
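(For context on why identity gains are the expected answer in that check: a direction-independent solve models each baseline as $V_{pq} = g_p\,M_{pq}\,g_q^{*}$, so setting $M_{pq} = V_{pq}$ makes $g_p = 1$ for every antenna an exact solution -- any departure from identity would point at a bookkeeping bug rather than the data.)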
@o-smirnov @JSKenyon Can you hold off till end of day? I might be able to do something with a day of "meetings" ;-)
Oh no worries - copying columns over etc. is fine for now anyway. Think it is just a feature that should exist eventually but it doesn't have to be immediate.
I also think there are too many points of failure in re-predicting. We still have to verify the astrometry of the whole thing here, i.e. is TIME_CENTROID correct enough to get good uvw coordinates via fixvis first. Once we have established that the widefield astrometry is correct, one can start thinking about getting the forward predict going for such a dataset. I'm busy trying to establish this.
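One quick way to quantify that (sketched with hypothetical MS paths): run CASA's fixvis on a copy and compare the recomputed UVW column against the original with python-casacore -- large differences would flag a TIME_CENTROID/UVW problem.

```python
import numpy as np
from casacore.tables import table

t_orig = table("data_bda.ms")          # original UVWs
t_fix = table("data_bda_fixvis.ms")    # same MS after fixvis recomputed UVW

duvw = np.linalg.norm(t_orig.getcol("UVW") - t_fix.getcol("UVW"), axis=1)
print("max |dUVW| = %.3f m, median = %.3f m" % (duvw.max(), np.median(duvw)))

t_orig.close()
t_fix.close()
```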
I am a little confused though - you already average multiple columns @sjperkins. At least I assume so, as you produce an MS with multiple columns. The MODEL_DATA column will be averaged identically to the DATA column (at least in the sane world where they have the same dimensions).
I do? Then I've managed to surprise myself. It should only average one column. Maybe the others are zeroed....
I guess I didn't check the contents. :-) But I guess I also just use conventional columns which you may handle differently.
Although I guess averaging may be less useful in the spectral line scenario.
Currently, xova averages the column specified in the --data-column option (default CORRECTED_DATA) into DATA in the new MS.
I've added functionality in codex to handle averaging multiple columns at once: https://github.com/ska-sa/codex-africanus/pull/173/commits/f2ddbcd9a8c0622b743852020cb131e3495c7797
I want to propose that --data-column changes to "DATA, MODEL_DATA, CORRECTED_DATA" and that each column listed is averaged directly into its counterpart in the output.
Are there any objections to the above change?
I have no objections. Are you using argparse? If so you can use nargs='+', which will make the input a list. That shouldn't break existing code, but will allow --data-column DATA MODEL_DATA CORRECTED_DATA to be specified on the command line (as a space-separated list in this case).
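Something like this (a minimal sketch; the option name follows xova's --data-column, the list default is an assumption):

```python
import argparse

parser = argparse.ArgumentParser("xova bda")
parser.add_argument("--data-column", nargs="+", default=["CORRECTED_DATA"],
                    help="Column(s) to average into the output MS")

args = parser.parse_args(["--data-column", "DATA", "MODEL_DATA", "CORRECTED_DATA"])
print(args.data_column)   # ['DATA', 'MODEL_DATA', 'CORRECTED_DATA']
```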
May I suggest an --output-column option too, to override the default mapping of a column to its counterpart.
Reason being, it is a common and convenient use case to take a 1GC MS, and average it down, with CORRECTED_DATA becoming DATA in the output MS -- which can then go straight to selfcal.
...or, perhaps more succinctly, allow for --data-column DATA:CORRECTED_DATA, the colon and the second name being optional.
I like this version. Is our command-line mini-language or set of conventions formalised anywhere? It would be nice to standardise it across applications. I'd imagine shadems and quartical are currently the main drivers, but it's now being requested in applications like xova and tricolour.
I don't think this is really formalized, but colon is the obvious and sensible choice -- it's not arithmetic, and it's not a list separator, and it's Pythonic in the sense that it has the same meaning as in a dict definition...
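A hypothetical parser for the IN[:OUT] spec, just to pin the semantics down -- the colon and the output name are optional, defaulting to the input name:

```python
def parse_column_specs(specs):
    """Map each "IN[:OUT]" spec to {input_column: output_column}."""
    mapping = {}
    for spec in specs:
        in_col, _, out_col = spec.partition(":")
        mapping[in_col.strip()] = (out_col or in_col).strip()
    return mapping

print(parse_column_specs(["CORRECTED_DATA:DATA", "MODEL_DATA"]))
# {'CORRECTED_DATA': 'DATA', 'MODEL_DATA': 'MODEL_DATA'}
```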