wgridder seems to assume a negated `w` coordinate

Joshuaalbert commented 2 months ago

The phase in the RIME is:

phase = (-2j * pi / lambda * (l * u + m * v + (n - 1) * w)

but I see wgridder's implemented with negated w

phase = (-2j * pi / lambda * (l * u + m * v - (n - 1) * w)

Is this intended, and if so why?

Joshuaalbert commented 2 months ago

To add more. I don't see a negated definition of w coordinate anywhere else. E.g. Thompson, Moran Swenson

Screenshot from 2024-05-30 22-12-36

Codex Africanus: https://github.com/ratt-ru/codex-africanus/blob/cbc64d77517503a4f1b1544242bd6623def1ad8f/africanus/rime/wsclean_predict.py#L31

mreineck commented 2 months ago

This discrepancy is unexpected, since, as fas as I know, Codex Africanus uses the wgridder as one of several backends. @landmanbester, do you have any insights on this?

landmanbester commented 2 months ago

I've not noticed that. It's tested against the explicit gridder here right?

mreineck commented 2 months ago

That's correct,

Interestingly, the sign of the w component seems fine to me in the explicit gridder ... it's just that the minus sign appears in a different (but for w equivalent) place than in the formula above. I'm more surprisd about the u and v parts, as tese seem to have the wrong sign ...

Joshuaalbert commented 2 months ago

It's a choice as to which Fourier convention is used, with 'casa' convention having with +2pi exponent. However, that's just equivalent to negating uvw and using the 'physical' convention of -2pi. But in all cases uvw should share the same sign, and follow the form of Thompson, Swenson, Moran above.

My unit predict unittests fail when compared to wgridder. It passes if I flip only the w coordinate. It's only noticable for large l,m coordinates. Perhaps a quick call soon to dig in together would be helpful.

Joshuaalbert commented 2 months ago

Also, it's quite simple to see. Wgridder is implemented with the equation in Phillip's paper, which you can compare to the above Thompson, Moran, Swenson (and codex Africanus code)

Screenshot_20240531-093010~2

o-smirnov commented 2 months ago

But in all cases uvw should share the same sign, and follow the form of Thompson, Swenson, Moran above.

I agree with that...

Joshuaalbert commented 2 months ago

Here's a unittest showing that w-gridder is "wrong". I hate to use the word wrong, of course. Initially, I thought that perhaps w-gridder just assumed some esoteric convention, but it looks like somehow the RIME mistakenly got mis-represented at some point during the design of w-gridder.

MVCE

This shows that negating the w-term is required to match the DFT degridder. The difference is only noticed for larger n(l,m), which the unittest below parametrises with center_offset. Note, the equation (1) in https://arxiv.org/pdf/2010.10122 does not match the actual RIME and is thus incorrect.

import itertools

import numpy as np
import pytest
from ducc0.wgridder import dirty2vis

@pytest.mark.parametrize("center_offset", [0.0, 0.1, 0.2])
@pytest.mark.parametrize("negate_w", [False, True])
def test_wrong_w(center_offset: float, negate_w: bool):
    np.random.seed(42)
    N = 512
    num_ants = 100
    num_freqs = 1

    pixsize = 0.5 * np.pi / 180 / 3600.  # 0.5 arcsec ~ 4 pixels / beam, so we'll avoid aliasing
    l0 = center_offset
    m0 = center_offset
    dl = pixsize
    dm = pixsize
    dirty = np.zeros((N, N))

    dirty[N // 2, N // 2] = 1.
    dirty[N // 3, N // 3] = 1.

    def pixel_to_lmn(xi, yi):
        l = l0 + (-N / 2 + xi) * dl
        m = m0 + (-N / 2 + yi) * dm
        n = np.sqrt(1. - l ** 2 - m ** 2)
        return np.asarray([l, m, n])

    lmn1 = pixel_to_lmn(N // 2, N // 2)
    lmn2 = pixel_to_lmn(N // 3, N // 3)

    antenna_1, antenna_2 = np.asarray(list(itertools.combinations(range(num_ants), 2))).T
    antennas = 10e3 * np.random.normal(size=(num_ants, 3))
    antennas[:, 2] *= 0.001
    uvw = antennas[antenna_2] - antennas[antenna_1]

    freqs = np.linspace(700e6, 2000e6, num_freqs)

    vis = dirty2vis(
        uvw=uvw,
        freq=freqs,
        dirty=dirty,
        wgt=None,
        pixsize_x=dl,
        pixsize_y=dm,
        center_x=l0,
        center_y=m0,
        epsilon=1e-4,
        do_wgridding=True,
        flip_v=False,
        divide_by_n=True,
        nthreads=1,
        verbosity=0,
    )

    lmn = [lmn1, lmn2]
    pixel_fluxes = [1., 1.]
    vis_explicit = explicit_degridder(uvw, freqs, lmn, pixel_fluxes, negate_w)

    np.testing.assert_allclose(vis.real, vis_explicit.real, atol=1e-4)
    np.testing.assert_allclose(vis.imag, vis_explicit.imag, atol=1e-4)

def explicit_degridder(uvw, freqs, lmn, pixel_fluxes, negate_w):
    vis = np.zeros((len(uvw), len(freqs)), dtype=np.complex128)
    c = 299792458.  # m/s

    for row, (u, v, w) in enumerate(uvw):
        if negate_w:
            w = -w
        for col, freq in enumerate(freqs):
            for flux, (l, m, n) in zip(pixel_fluxes, lmn):
                wavelength = c / freq
                phase = -2j * np.pi * (u * l + v * m + w * (n - 1)) / wavelength
                vis[row, col] += flux * np.exp(phase) / n
    return vis

mreineck commented 2 months ago

Sorry for the long delay, I was travelling ... I would definitely be happy to fix any deviations from common conventions; we just need to be really careful how we do this, because the code seems to work exactly as expected within wsclean, Codex Africanus etc. If we fix something within the wgridder code, we probably have to adjust some (probably inadvertent?) workarounds in the interface that these codes have to wgridder.

It would be great to have a quick call about this. Unfortunately I'm currently stuck in Hamburg due to flooding in southern Germany and will probably only be back in my office from Wednesday...

mreineck commented 2 months ago

I'm still unsure why the problem is not detected by ducc's test, since we test faceted gridding and therefore call the wgridder with sometimes significant center_x and center_y.

mreineck commented 2 months ago

@aroffringa do you perhaps have some idea what's going on here?

o-smirnov commented 2 months ago

I share @mreineck's worries. This clearly has worked in other contexts, so what gives -- is there a counter-bug negating the bug?

mreineck commented 2 months ago

My suspicion is that it is in fact a counter-bug. Typically, when interfacing with external code and things not giving the right results immediately, people play around with signs and simple things until there is agreement. I at least plead guilty to doing this occasionally ...

aroffringa commented 2 months ago

I'm pretty sure the output of wgridder is correct and matches other gridders. WSClean doesn't flip signs when passing data from or to the wgridder; they come straight from the measurement set. So if there's a sign bug, the counter-acting sign flip must also be behind the wgridder's interface, so should be unnoticable, also in unit tests (as long as they use the same interface)...

mreineck commented 2 months ago

OK, here's perhaps some small hint: in https://gitlab.com/aroffringa/wsclean/-/blob/master/wgridder/wgriddinggridder_simple.cpp?ref_type=heads#L117 we pass the shifts to wgridder with negtive signs.

But even if I adjust this in the test script above, it doesn't seem to fix things...

Joshuaalbert commented 2 months ago

For those able to make a call, I would propose a few resolutions to discuss:

Jointly walkthrough the above unittest and ensure it's 100% correct.
Supposing it's correct, decide on a course of action regarding correcting w-gridder and informing interfacing package owners.

Joshuaalbert commented 2 months ago

@mreineck (optionally also @landmanbester @o-smirnov @aroffringa) how would tomorrow work for a meeting at 16:00 CET/SAST?

mreineck commented 2 months ago

Yes, that would work for me!

Joshuaalbert commented 2 months ago

W-gridder call Jun 6 • 10:00–11:00 EST (16:00 - 17:00 CET) • View details & RSVP https://calendar.app.google/5g5NrDsGfxdJxSxw9

landmanbester commented 2 months ago

I would be more than happy to join but the above link does not work for me. Can you resend please?

aroffringa commented 2 months ago

I'm at the LOFAR family meeting in Leiden and won't be able to join. @mreineck good point about the inverted l (ell) and m shifts; I guess that does indicate that the sign of w is wrong. You would only see a change of flipping l/m shifts if there is a shift, i.e. if either -shift is applied to wsclean, or in faceting mode, so if you just flip those back in a 'normal' imaging run nothing happens (because they are zero).

One thing about the test above: it let's both ell and m increase with x and y in pixel_to_lmn. In WSClean, ell increases in the RA direction, meaning that x depends on negative ell. This is however not the case for m so might only create more inconsistency :). Also the direction of y is normally "up" in fits files, but often not like that in memory. If both l and m were the wrong way around, then the l and m shifts also have to be inverted. So maybe there's an explanation there?

BTW these conversions can be found here: https://gitlab.com/aroffringa/aocommon/-/blob/master/include/aocommon/imagecoordinates.h?ref_type=heads

Joshuaalbert commented 2 months ago

@landmanbester try this and I'll admit you.

W-gridder call Thursday, Jun 6 • 16:00–17:00 (GMT+2) Google Meet joining info Video call link: https://meet.google.com/spr-kofc-rnq

landmanbester commented 2 months ago

Thanks, will join in 10

Joshuaalbert commented 2 months ago

We met and came to the following resolutions:

We agreed that w-gridder is implemented with a sign-flipped w coordinate.
We decided that @mreineck will fix this in a staged non-disruptive approach: a. The functions dirty2vis and vis2dirty (and older dirty2ms and ms2dirty) will remain unchanged. b. A new pair of adjoint operators will be released with changes listed below. c. Users of w-gridder should update their code to these new adjoint operators at their earlier convenience.

Changes planned for new adjoint operators

In addition to discussing the flipped w coordinate we also reviewed other anecdotal observations about the implementation of w-gridder that could be improved. In particular, we noted: i. sometimes various negations and/or transpositions of inputs are necessary, potentially implying counter-bugs are introduced to integrate w-gridder. ii. flip_v=True is sometimes necessary, without understanding why. iii. issues predicting/imaging near edges of fields, e.g. MeerKAT, particularly noticeable in all-sky images, e.g. LWA.

To addresses points we planned:

Fix the negated w issue.
Change argument names to reflect l (ell) and m, e.g. center_x -> center_l.
Improve documentation, by making it more explicit: a. in C-order memory (numpy) rows of dirty correspond to increasing l, and columns to increasing m, noting that FITS files are F-order. b. explicitly state the equation for l/m of each pixel, e.g. in zero-indexing l[row] = (- 0.5 * npix_l + row) * pixsize_l, similarly for m.
Remove flip_v as it shall no longer be needed.

In addition, a good idea would be to provide a self-contained jupyter notebook showing how to read an image from a FITS file, and correctly predict visibilities, and to then compute the dirty image and correctly save to FITS format, handling all necessary transpositions, etc.

All input and comments can be made under this issue.

landmanbester commented 2 months ago

Digging in a little, I realised that it's just a flipped coordinate convention. Noting that a change of sign in (l,m) leaves the sign of n invariant, we can see that flipping the sign of (l,m) with a negated w-coordinate simply amounts to using a different sign convention for the phase term ('casa' vs 'phys' in the test below). The modified test below will pass with w unnegated. I had to reverse the coordinate order of the explicit degridder and negate the sign of (l0,m0) going into the wgridder instead (note I also changed the offset pixel to be an integer as N/3 != N//3).

import itertools

import numpy as np
import pytest
from ducc0.wgridder.experimental import dirty2vis

@pytest.mark.parametrize("center_offset", [0.0, 0.1, 0.2])
@pytest.mark.parametrize("negate_w", [False, True])
def test_wrong_w(center_offset: float, negate_w: bool):
    np.random.seed(42)
    N = 1024
    num_ants = 100
    num_freqs = 2

    pixsize = 0.5 * np.pi / 180 / 3600.  # 0.5 arcsec ~ 4 pixels / beam, so we'll avoid aliasing
    l0 = center_offset
    m0 = center_offset
    dl = pixsize
    dm = pixsize
    dirty = np.zeros((N, N))

    dirty[N // 2, N // 2] = 1.
    dirty[N // 4, N // 4] = 1.

    def pixel_to_lmn(xi, yi):
        l = l0 + (-N / 2 + xi) * (-dl)
        m = m0 + (-N / 2 + yi) * (-dm)
        n = np.sqrt(1. - l ** 2 - m ** 2)
        return np.asarray([l, m, n])

    lmn1 = pixel_to_lmn(N // 2, N // 2)
    lmn2 = pixel_to_lmn(N // 4, N // 4)

    antenna_1, antenna_2 = np.asarray(list(itertools.combinations(range(num_ants), 2))).T
    antennas = 10e3 * np.random.normal(size=(num_ants, 3))
    antennas[:, 2] *= 0.001
    uvw = antennas[antenna_1] - antennas[antenna_2]

    freqs = np.linspace(700e6, 2000e6, num_freqs)
    vis = dirty2vis(
        uvw=uvw,
        freq=freqs,
        dirty=dirty,
        wgt=None,
        pixsize_x=dl,
        pixsize_y=dm,
        center_x=-l0,
        center_y=-m0,
        epsilon=1e-6,
        do_wgridding=True,
        flip_v=False,
        divide_by_n=True,
        nthreads=1,
        verbosity=0,
    )

    lmn = [lmn1, lmn2]
    pixel_fluxes = [1., 1.]
    vis_explicit = explicit_degridder(uvw, freqs, lmn, pixel_fluxes, negate_w, convention='casa')

    np.testing.assert_allclose(vis.real, vis_explicit.real, atol=1e-4)
    np.testing.assert_allclose(vis.imag, vis_explicit.imag, atol=1e-4)

def explicit_degridder(uvw, freqs, lmn, pixel_fluxes, negate_w, convention='phys'):
    vis = np.zeros((len(uvw), len(freqs)), dtype=np.complex128)
    c = 299792458.  # m/s

    for row, (u, v, w) in enumerate(uvw):
        if negate_w:
            w = -w
        for col, freq in enumerate(freqs):
            for flux, (l, m, n) in zip(pixel_fluxes, lmn):
                wavelength = c / freq
                sign = -1 if convention=='phys' else 1
                phase = sign * 2j * np.pi * (u * l + v * m + w * (n - 1)) / wavelength
                vis[row, col] += flux * np.exp(phase) / n
    return vis

This can also be seen in the form of the explicit_gridder which sets up its (l,m) coordinates with C-ordered matrix based indexing (indexing='ij' in np.meshgrid) but then uses a negated w coordinate. This is a bit non-standard but I guess so is the fits convention. I battled to get the conventions aligned initially but I believe the load_fits and save_fits functions in pfb-imaging render the array in a consistent form for use with the wgridder (see here). I wouldn't say this is the correct way to read in fits files but at least arrays read in this way will display the same using plt.imshow as it would in a fits viewer and seems to result in the expected visibilities when passed into the wgridder.

I still think it would be useful to have an implementation where w has the same sign as u and v and without the need to negate center_x and center_y. My preference would be for the coordinates on the grid to be set up with C-ordered matrix based indexing but as long as this is documented somewhere that doesn't really matter. Hoping the above helps to get us closer to that

Joshuaalbert commented 2 months ago

I'm not sure still. It still looks to me that wgridder, without any negation of image centres is equivalent to physical convention + w-negation. I do agree that flipping l and m + negating w is the same as negating each of u,v,w, but if you leave the DFT unmodified (don't flip dl,dm) then it still looks like wgridder is not doing what is expected.

import itertools

import numpy as np
import pytest
from ducc0.wgridder import dirty2vis

@pytest.mark.parametrize("center_offset", [0.0, 0.1, 0.2])
@pytest.mark.parametrize("negate_w", ['neg_w', 'pos_w'])
@pytest.mark.parametrize("convention", ['casa', 'fourier'])
@pytest.mark.parametrize("negate_wgridder_center_xy", ['neg_center', 'normal_center'])
def test_wrong_w(center_offset: float, negate_w: str, convention: str, negate_wgridder_center_xy: str):
    np.random.seed(42)
    N = 512
    num_ants = 100
    num_freqs = 1

    pixsize = 0.5 * np.pi / 180 / 3600.  # 1 arcsec ~ 4 pixels / beam, so we'll avoid aliasing
    l0 = center_offset
    m0 = center_offset
    dl = pixsize
    dm = pixsize
    dirty = np.zeros((N, N))

    dirty[N // 2, N // 2] = 1.
    dirty[N // 4, N // 4] = 1.

    def pixel_to_lmn(xi, yi):
        l = l0 + (-N / 2 + xi) * dl
        m = m0 + (-N / 2 + yi) * dm
        n = np.sqrt(1. - l ** 2 - m ** 2)
        return np.asarray([l, m, n])

    lmn1 = pixel_to_lmn(N // 2, N // 2)
    lmn2 = pixel_to_lmn(N // 4, N // 4)

    antenna_1, antenna_2 = np.asarray(list(itertools.combinations(range(num_ants), 2))).T
    antennas = 10e3 * np.random.normal(size=(num_ants, 3))
    antennas[:, 2] *= 0.001
    uvw = antennas[antenna_2] - antennas[antenna_1]

    freqs = np.linspace(700e6, 2000e6, num_freqs)

    vis = dirty2vis(
        uvw=uvw,
        freq=freqs,
        dirty=dirty,
        wgt=None,
        pixsize_x=dl,
        pixsize_y=dm,
        center_x=-l0 if negate_wgridder_center_xy == 'neg_center' else l0,
        center_y=-m0 if negate_wgridder_center_xy == 'neg_center' else m0,
        epsilon=1e-4,
        do_wgridding=True,
        flip_v=False,
        divide_by_n=True,
        nthreads=1,
        verbosity=0,
    )

    lmn = [lmn1, lmn2]
    pixel_fluxes = [1., 1.]
    vis_explicit = explicit_degridder(uvw, freqs, lmn, pixel_fluxes, negate_w, convention)

    np.testing.assert_allclose(vis.real, vis_explicit.real, atol=1e-4)
    np.testing.assert_allclose(vis.imag, vis_explicit.imag, atol=1e-4)

def explicit_degridder(uvw, freqs, lmn, pixel_fluxes, negate_w, convention):
    vis = np.zeros((len(uvw), len(freqs)), dtype=np.complex128)
    c = 299792458.  # m/s

    if convention == 'casa':
        uvw = -uvw

    for row, (u, v, w) in enumerate(uvw):
        if negate_w == 'neg_w':
            w = -w
        for col, freq in enumerate(freqs):
            for flux, (l, m, n) in zip(pixel_fluxes, lmn):
                wavelength = c / freq
                phase = -2j * np.pi * (u * l + v * m + w * (n - 1)) / wavelength
                vis[row, col] += flux * np.exp(phase) / n
    return vis

Screenshot from 2024-06-11 10-23-42

mreineck commented 2 months ago

There is now a new branch called tweak_wgridder_conventions, which has the functions dirty2vis_tidy and vis2dirty_tidy in ducc0.wgridder.experimental. These flip the w coordinate with respect to the "untidy" versions, and using those, the original test script in this issue passes for negate_w=True.

The latest test script succeeds for the new function with negate_w="pos_w", convention="fourier", and negate_wgridder_center_xy="normal_center".

Please let me know what else should be done!

aroffringa commented 2 months ago

I'm not sure it is wise to change the convention of wgridder's interface (maybe apart from the l,m shifts, but not the l,m themselves). The current w sign matches with wsclean's convention. If wgridder interface is thus changed, they no longer match. As I mentioned, it seems just to be l,m direction conventions and that seems now also Joshua's conclusion (?) but I think the way WSClean treats l,m is standard and directly matches with how the image is written into fits files (so get uvws from MS, run wgridder, write image as fits file is possible without any change of signs). I think it would be better to accurately document the current convention (as well as indicate near the relevant formulas that the flip of l,m are to match with the normal directions of l,m in memory/fits files) than to actually change it so it no longer matches with fits files.

mreineck commented 2 months ago

I don't intend to change the semantics of what currently exists. But if there is enough agreement that the current implementation follows bad/confusing conventions, I can fairly easily add more functions which present a saner interface.

Documentation-wise I think it will ultimately boil down to providing a short Python DFT code which does exactly (up to accuracy limits) the same as the C++ code; this reduces the potential for misunderstandings and ambiguity to a minimum.

landmanbester commented 2 months ago

so get uvws from MS, run wgridder, write image as fits file is possible without any change of signs

Do you not still need to flip the v coordinate inside the wgridder?

providing a short Python DFT code which does exactly (up to accuracy limits) the same as the C++ code

That would be great, thanks. I think this essentially boils down to getting the tests passing with a version of the explicit_gridder that uses the same sign for all three of uvw and doesn't require negating center_x or center_y. @Joshuaalbert do you agree with this?

Joshuaalbert commented 2 months ago

@aroffringa I understand the concern however for posterity in the field of radio interferometry we should not knowingly bake in different conventions into key software. As the proposal of @mreineck stands you'd have the current interface to use until/if you had the time to update wsclean. Also as far as l,m calculations go they should be always consistent with l pointing along u and m along v. Anything else is confusing. Memory reordering for efficient access can be a simple copy command.

@landmanbester yes essentially the test against DFT with same inputs. Dirty image has l increasing over row and m increasing over column.

aroffringa commented 2 months ago

Well before we make claims about the one true convention, is the claim now that:

the "proper" convention has positive RA in the direction of negative l and negative pixel (/memory) position; or
all dimensions are increasing in the same direction, including RA? ...because I think one of those must be true to take the test above as the ground truth. The first assumption breaks with the normal formula l = cos decl sin dRA. Both cases would cause any save or display function to have to mirror the image. I'm not really sure that, as far as conventions can be compared, either is a better convention.

The WSClean convention is: positive in memory = positive in x = negative in l = negative in RA. This 'x' value is also consistent with what kvis and ds9 show, the one given in the test is not, because then the image gets mirrored before display to get RA go from right to left.

mreineck commented 2 months ago

OK, this is the unit test I'm suggesting ... a somewhat generalized version of the initial script. It also has a slightly adjusted tolerance check, which should be more in line with the accuracy the wgridder is expected to deliver.

import itertools
import numpy as np
import pytest
from ducc0.wgridder.experimental import dirty2vis_tidy
from ducc0.misc import l2error

@pytest.mark.parametrize("center_offset", [(0.0, 0.0), (0.1, -0.17), (0.2, 0.5)])
def test_wrong_w(center_offset: float):
    np.random.seed(42)
    Nl, Nm = 50, 70
    num_ants = 10
    num_freqs = 1

    pixsize = 0.5 * np.pi / 180 / 3600.  # 1 arcsec ~ 4 pixels / beam, so we'll avoid aliasing
    l0, m0 = center_offset
    dl = pixsize
    dm = pixsize*0.9

    dirty = np.random.normal(size=(Nl, Nm))

    antenna_1, antenna_2 = np.asarray(list(itertools.combinations(range(num_ants), 2))).T
    antennas = 10e3 * np.random.normal(size=(num_ants, 3))
    antennas[:, 2] *= 0.001
    uvw = antennas[antenna_2] - antennas[antenna_1]

    freqs = np.linspace(700e6, 2000e6, num_freqs)

    epsilon = 1e-8

    vis = dirty2vis_tidy(
        uvw=uvw,
        freq=freqs,
        dirty=dirty,
        wgt=None,
        pixsize_x=dl,
        pixsize_y=dm,
        center_x=l0,
        center_y=m0,
        epsilon=epsilon,
        do_wgridding=True,
        flip_v=False,
        divide_by_n=True,
        nthreads=1,
        verbosity=0,
    )

    vis_explicit = explicit_degridder(uvw, freqs, dirty, l0, dl, m0, dm)

    np.testing.assert_allclose(l2error(vis, vis_explicit), 0., atol=epsilon)

def explicit_degridder(uvw, freqs, dirty, l0, dl, m0, dm):
    vis = np.zeros((len(uvw), len(freqs)), dtype=np.complex128)
    c = 299792458.  # m/s

    Nl, Nm = dirty.shape
    l = (l0 + (-Nl / 2 + np.arange(Nl)) * dl).reshape((Nl,1))
    m = (m0 + (-Nm / 2 + np.arange(Nm)) * dm).reshape((1,Nm))
    n = np.sqrt(1. - l**2 - m**2)
    nm1 = n-1

    for row, (u, v, w) in enumerate(uvw):
        for col, freq in enumerate(freqs):
            wavelength = c / freq
            phase = -2j * np.pi * (u * l + v * m + w * nm1) / wavelength
            vis[row, col] = np.sum(dirty * np.exp(phase) / n)
    return vis

mreineck commented 2 months ago

It's of course possible to test over a larger parameter space, rather than fix most parameters inside test_wrong_w.

Joshuaalbert commented 2 months ago

Well before we make claims about the one true convention, is the claim now that:

the "proper" convention has positive RA in the direction of negative l and negative pixel (/memory) position; or

all dimensions are increasing in the same direction, including RA?

I think ell and RA are always aligned per definition of lmn as Fourier conjugate to uvw and u points in RA_hat. However, FITS defines the detector space with pixels pointing right/west and up/north. Thus as an observer facing south looking into the sky. The transform from detector space to world coordinates involves a negation of the horizontal coordinates which gives the negative RA per "x" pixel.

...because I think one of those must be true to take the test above as the ground truth. The first assumption breaks with the normal formula l = cos decl sin dRA. Both cases would cause any save or display function to have to mirror the image.

It is true that one would need to mirror the image to save/load the dirty image to/from FITS. However, I think the concepts behind the FITS conventions (detector space vs world coordinates) are different from those here (ell, m increasing with memory).

I'm not really sure that, as far as conventions can be compared, either is a better convention.

The WSClean convention is: positive in memory = positive in x = negative in l = negative in RA. This 'x' value is also consistent with what kvis and ds9 show, the one given in the test is not, because then the image gets mirrored before display to get RA go from right to left.

It seems like w-gridder was written with the intent of ell increasing with memory. This is evident in the source code and preexisting unittests. Perhaps it was a happy accident that it worked with FITS flipped x axis. Also doesn't wsclean also require flipping v as per @landmanbester 's question?

Joshuaalbert commented 2 months ago

LGTM @mreineck. One question though, does the tolerance have dependence on precision? Might be worth running with both complex64 and complex128 to verify the promise holds.

aroffringa commented 2 months ago

I think ell and RA are always aligned per definition of lmn as Fourier conjugate to uvw and u points in RA_hat. However, FITS defines the detector space with pixels pointing right/west and up/north. Thus as an observer facing south looking into the sky. The transform from detector space to world coordinates involves a negation of the horizontal coordinates which gives the negative RA per "x" pixel.

Agree -- but of course, it's not just a fits convention to have RA decreasing in an image, it's any astronomy image that does this.

It is true that one would need to mirror the image to save/load the dirty image to/from FITS. However, I think the concepts behind the FITS conventions (detector space vs world coordinates) are different from those here (ell, m increasing with memory).

Why? I think it is more convenient to not have to always flip the image. WSClean uses this convention internally also (for deconvolution and for the other gridders), so if the interface were to change, I would implement this in WSClean by flipping l, m before calling wgridder. While that's okay -- it's obviously easy to do and probably the computational cost is negligable, I still find it weird that you'd move in that direction.

It seems like w-gridder was written with the intent of ell increasing with memory. This is evident in the source code and preexisting unittests. Perhaps it was a happy accident that it worked with FITS flipped x axis.

This is true, but of course it wasn't noticed by me because wgridder uses by accident a very reasonable convention ;).

Also doesn't wsclean also require flipping v as per @landmanbester 's question?

No, I think for a similar reason: images in memory/disk are stored "increasing down", whereas m/dec increases upwards.

Joshuaalbert commented 2 months ago

@aroffringa it looks like negate_v is being used by WSClean (see below), which would be necessary because if you pass in negated l0 and m0 and FITS only has negated ell axis, then you still need to negate m/v, which in combination with negated w would end up with "casa" convention.

Let me clarify my stance, taking into account Andre's points on detector space convention.

wgridder was implemented with an incorrect RIME, regardless of how you order the dirty image. This must be rectified, and should use physical convention by default.
There is a decision that needs to be made about the ordering of the dirty image. I agree with Andre that keeping interfaces consistent between software is valuable, however I propose this can be up to the user since not all algorithms adopt detector space ordering. In any case the inputs should be in world coordinates, i.e. center_l not center_x, pixelsize_l not pixelsize_x.

@mreineck The call signature I propose taking into account these points is thus:

def new_dirty2vis(
    uvw: np.ndarray,
    freqs: np.ndarray,
    dirty: np.ndarray,
    wgt: np.ndarray,
    mask: np.ndarray,
    pixsize_m: float,
    pixsize_l: float,
    center_m: float,
    center_l: float,
    epsilon: float = 1e-4,
    do_wgridding: bool = True,
    divide_by_n: bool = True,
    sigma_min: float = 1.1,
    sigma_max: float = 2.6,
    nthreads: int = 1,
    verbosity: int = 0,
    convention: Literal['physical', 'casa'] = 'physical',
    dirty_ordering: Literal['detector', 'world'] = 'detector'
):
"""
Compute an approximation to the DFT RIME using the w-gridding algorithm.
When `convention='physical'`, `do_wgridding=True` and `divide_by_n=True` this is equivalent to,

V(uvw, freq) = sum_{row, col} (I(l, m) / n) * exp(-2pi i (freq/c) (u l + v m + w (n - 1)))

where n = sqrt(1 - l^2 - m^2) and c is the speed of light. If `convention='casa'`, then it is equivalent to
the same but with `+2pi` in the exponential (accomplished by first negating `uvw` and using the same algorithm).

The calculation of `l` and `m` of each element `dirty[row, col]` is, if `dirty_ordering='detector'`:

l[row] = center_l - (-0.5 * num_l + row) * pixsize_l
m[col] = center_m + (-0.5 * num_m + col) * pixsize_m

and if `dirty_ordering='world'`:

l[row] = center_l + (-0.5 * num_l + row) * pixsize_l
m[col] = center_m + (-0.5 * num_m + col) * pixsize_m

Args:
    uvw: [num_rows, 3] array of uvw coordinates in m.
    freqs: [num_freqs] array of frequencies in Hz.
    dirty: [num_l, num_m] array of flux, in units of JY/PIXEL, with ordering specified by `dirty_ordering`.
        num_l and num_m must be even.
    wgt: [num_rows, num_freqs] array of weights, multiplied with output visibilities.
    mask: [num_rows, num_freqs] array of mask, only predict where mask!=0.
    pixsize_m: scalar, change in `m` of each column, i.e. for *all* col |m[col+1] - m[col]|, must be > 0.
    pixsize_l: scalar, change in `l` of each row, i.e. for *all* row |l[row+1] - l[row]|, must be > 0.
    center_m: scalar, the value of l[row=0.5*num_l] (see definition of l[row] above).
    center_l: scalar, the value of m[col=0.5*num_m] (see definition of m[col] above).
    epsilon: scalar, gridding accuracy parameter, representing maximum absolute error of approximation.
    do_wgridding: scalar, whether to do w-gridding.
    divide_by_n: whether to divide by n.
    sigma_min: scalar, minimum sigma for gridding.
    sigma_max: scalar, maximum sigma for gridding.
    nthreads: number of threads to use.
    verbosity: verbosity level, 0, 1.
    convention: 'physical' or 'casa', the RIME convention.
    dirty_ordering: 'detector' or 'world', the ordering of the dirty image. 'detector' ordering has `l` decreasing
        with row and `m` increasing with column. 'world' ordering has `l` increasing with row and `m` increasing
        with column. Note, `detector` is the ordering used by FITS images and is thus default.

Returns:
    [num_rows, num_freqs] array of visibilities.
"""

Wsclean and `negate_v`

While it currently is necessary to do this flip in order to use detector ordering, I proposed something that won't require this.

Wsclean wgridder call

dirty2ms<NumT, NumT>(uvw2, freq2, tdirty, twgt, tmask, pixelSizeX_,
                         pixelSizeY_, epsilon_, true, nthreads_, ms, verbosity_,
                         true, false, sigma_min, sigma_max, -l_shift_,
                         -m_shift_);

dirty2ms signature

void dirty2ms(const cmav<double,2> &uvw,
  const cmav<double,1> &freq, const cmav<Timg,2> &dirty,
  const cmav<Tms,2> &wgt_, const cmav<uint8_t,2> &mask_, double pixsize_x, double pixsize_y,
  double epsilon, bool do_wgridding, size_t nthreads, const vmav<complex<Tms>,2> &ms,
  size_t verbosity, bool negate_v=false, bool divide_by_n=true,
  double sigma_min=1.1, double sigma_max=2.6, double center_x=0, double center_y=0, bool allow_nshift=true)

Joshuaalbert commented 2 months ago

Another issue that is tangential to this issue but worth mentioning because it relates to calculation of l,m as a function of row/col is that FITS images in SIN--* projection is only an approximation to the plane of sky coordinates, and is only valid near the reference pixel, the origin of the tangent to the celestial sphere. I.e. constant changes in pixel are not constant changes in l,m. Eric Griesen mentions this here, and ~~I show this here~~: https://gist.github.com/Joshuaalbert/b434a5d416b4fa8db4a8e68e975113c5.

Edit: I misinterpreted Eric and tried to prove it myself, and thought I succeeded, but had made a mistake in my math. I updated the math to be correct, in which case we indeed recover x=l*180/pi and y=m*180/pi.

aroffringa commented 2 months ago

I believe the comment by Eric refers to the old-school method of gridding without w-terms, as was usual. That's only valid in the small-FOV case. Since w-values are taken into account, this is not a concern, and delta l,m is constant. I don't follow the equations in your code, but in full-sky images, the normal l,m calculations (like e.g. the ones defined here: https://adsabs.harvard.edu/full/1999ASPC..180..383P ) still work fine in wsclean output images.

You're right about the negative_v - sorry I was incorrect, wasn't aware of that parameter. You could of course change the parameter to "bool negative_u" (instead of v) and fix the equations; in that case users can chose their u-convention and for those that use 'true' nothing changes :) (well, except the l,m shifts, for the better). That wouldn't require passing negated ls or ms.

That also avoids the terms 'casa', 'physical', 'detector' and 'world', which I find are confusing (also not sure why two parameters for the convention are needed, I hope no one thinks of combining casa + detector.).

(remove paragraph about pixel offset -- I misread)

PS: I noted a typo in those docs: center_m is center_l.

Joshuaalbert commented 2 months ago

I believe the comment by Eric refers to the old-school method of gridding without w-terms, as was usual. That's only valid in the small-FOV case. Since w-values are taken into account, this is not a concern, and delta l,m is constant. I don't follow the equations in your code, but in full-sky images, the normal l,m calculations (like e.g. the ones defined here: https://adsabs.harvard.edu/full/1999ASPC..180..383P ) still work fine in wsclean output images.

Ah, thanks for pointing that out about Eric's comment. After reading his comment in the paper I set out to derive it myself and thought I had proved there was some small dependence of pixel on delta l,m, but I had made a mistake in my trig math. delta l,m is constant per pixel in SIN. Never do trig on a whim.

You're right about the negative_v - sorry I was incorrect, wasn't aware of that parameter. You could of course change the parameter to "bool negative_u" (instead of v) and fix the equations; in that case users can chose their u-convention and for those that use 'true' nothing changes :) (well, except the l,m shifts, for the better). That wouldn't require passing negated ls or ms.

That also avoids the terms 'casa', 'physical', 'detector' and 'world', which I find are confusing (also not sure why two parameters for the convention are needed, I hope no one thinks of combining casa + detector.).

(remove paragraph about pixel offset -- I misread)

PS: I noted a typo in those docs: center_m is center_l.

Good, it seems there are no major objections then, and we shall move forward with the proposal above.

mreineck commented 2 months ago

OK, I'll start with implementing the proposed interface this week. From what I understand so far, casa is equivalent to "add a (possibly additional) sign flip to u,v,w", correct?

Concerning detector vs. world, just a piece of information: ducc can deal with arrays that have arbitrary memory layout, so you can call it from Python with, say, dirty[-1::-1, :] if you want axes reversed. Similarly when calling from C++, you can build the mav array descriptors to have negative strides. Given this, do you still consider the introduction of an additional parameter more convenient? I'm fine with doing it, but I wanted to point this out first.

Joshuaalbert commented 2 months ago

@mreineck yes casa is equivalent to negating uvw and using the physical convention. Negating would be performed by wgridder so that uvws fed in are always antenna2 - antenna1.

For memory layout, I know it's trivial to switch strides before feeding it in. The introduction of the extra parameter would leave it up to the user. But so would including clear documentation about the expected ordering of memory. I kind of prefer the extra parameter because it will lead to cleaner code on the user's side which I think has merit.

landmanbester commented 2 months ago

From what I understand so far, casa is equivalent to "add a (possibly additional) sign flip to u,v,w", correct?

Yup. I think a more transparent naming convention would be preferable to casa. Something like phase_convention which can be set to positive or negative (maybe even '+/-')?

Given this, do you still consider the introduction of an additional parameter more convenient?

Only if there are performance gains in choosing one over the other. If you do add this parameter then, as with casa above, I don't think detector or world means the same in every context and might cause confusion. I would rather consider a more generic parameter that specifies which corner of the image the coordinates start increasing from. This would have to be documented carefully though as it can cause some confusion when viewing the image as fits

Joshuaalbert commented 2 months ago

Oh another good reason for wgridder to do the striding internally would be because you could improve performance by ensuring the memory is ideally laid out for maximal performance rather than relying on the user.

@landmanbester is right that better names could be chosen. As long as documentation is clear on how l,m are assigned to each row/col of the dirty it should be good enough. One of us can make a self contained example showing how to load data from fits a d save to fits with the new interface so that users have a guide.

mreineck commented 2 months ago

Oh another good reason for wgridder to do the striding internally would be because you could improve performance by ensuring the memory is ideally laid out for maximal performance rather than relying on the user.

That would typically mean making a copy with optimized strides, and I won't do that, since I try to keep memory footprint as small as possible. Also, the algorithm doesn't have any performance-critical interactions with the dirty image array, so that's not a real worry.

So I can add the extra parameter to the new interface, but internally I'll handle this via inverted strides, since it requires fewer modifications to the core algorithm.

mreineck commented 2 months ago

The idea with phase_convention=+/- seems appealing at first glance, but it becomes a nightmare when looking at the adjoint direction. Does "+" there mean a minus in the exponent, or are the calls only adjoint if they are called with different phase conventions? I feel we need something different here.

landmanbester commented 2 months ago

The idea with phase_convention=+/- seems appealing at first glance, but it becomes a nightmare when looking at the adjoint direction. Does "+" there mean a minus in the exponent, or are the calls only adjoint if they are called with different phase conventions? I feel we need something different here.

I had a thought along similar lines as I was suggesting it. It is similar to the forward/backward parameters in your fft module but exposing it makes for another parameter users can trip over. In that sense, specifying the origin of the image might make more sense. I'm not sure what else to suggest, sorry. I love the simplicity of the current interface and, given that it doesn't do anything wrong, just documenting exactly what the assumptions/conventions are would be sufficient for my purposes

mreineck commented 2 months ago

OK, here is what I'll do in any case for the moment: I'll add flip_u and flip_w flags to the core code, since this is something I cannot easily emulate in the interface wrappers; all the rest can be easily custom-tailored on a higher level.

Joshuaalbert commented 2 months ago

@landmanbester @mreineck I suggest we go with my proposed interface. Two main things:

the convention should be simply: 'casa' or 'physical' because it's clear, and it's a common terminology. I don't support +/- terminology. @mreineck adding flip_u and flip_w is a nightmare and adds confusion unnecessarily. Indeed wgridder wasn't 'wrong' but it wasn't clear enough, and now is our opportunity to improve it once and for all.
the equation for l,m per row/col should be explicit. There are two choices: well-document the current expectation, or better (in my opinion) is let user choose from the two seen scenarios. This is simple to do, and follows the principle of design to simplify integration.

The proposed interface here captures these two things. @landmanbester regarding your comment on detector vs world, I think this is where documentation will resolve any possible confusion.

mreineck / ducc