sot / astromon

Astrometric accuracy (celestial location) monitor
0 stars 0 forks source link

Cross matches #17

Closed javierggt closed 2 years ago

javierggt commented 2 years ago

This PR is a refactor of the cross-match methods, partly to address requests in issue #15 and partly to fix a few things.

The things that changed upon @taldcroft request:

Other cleanup:

Testing

taldcroft commented 2 years ago

Not quite in this PR, but this code got my attention. It is making a view of a subset of a Table and then sorting within that view. Apparently that works, but it still strikes me as possibly sketchy. It is also a lot slower than it needs to be because of making lots of tables and sorting columns that don't need attention.

    matches = matches.group_by('obsid')
    for g in matches.groups:
        g.sort(['cat_order', 'dr'])
    result = table.vstack([g[0] for g in matches.groups])

This should be orders of magnitude faster and equivalent [EDIT: confirmed]:

    mg = matches.group_by('obsid')
    indices = mg.groups.indices
    idxs = []
    for i0, i1 in zip(indices[:-1], indices[1:]):
        idxs_sort = np.lexsort((mg['dr'][i0:i1], mg['cat_order'][i0:i1]))
        idxs.append(i0 + idxs_sort[0])
    result = mg[idxs]
javierggt commented 2 years ago

Not quite in this PR...

I could make an issue with this and follow-up after this PR. I will check.

taldcroft commented 2 years ago

Digging further on the point about the cross match code above, two things:

  1. It is actually broken and I confirmed that the sources returned don't adhere to the expected catalog priority ordering. For instance for obsid 1520 xid=6, there are matches for SDSS and 2MASS. Using catalogs=['ICRS', 'Tycho2', '2MASS', 'SDSS', 'USNO-B1.0'], should choose 2MASS. My original get_xcorr gives the 2MASS match, but the result from this branch is the SDSS one.

[EDIT] I think this was a different bug unrelated to the cross-match code in question.

  1. It is extremely slow in a apples-to-apples comparison with my original code. The line below is intended to do no filtering at all (to match my original get_xcorr(xray_src, cat_src, obs) function). The code below takes about 3 minutes to complete while the original get_xcorr finishes in 4 seconds.
    xcorr = cross_match.compute_cross_matches(
    catalogs=['ICRS', 'Tycho2', '2MASS', 'SDSS', 'USNO-B1.0'],
    dr=10000,
    snr=-10000,
    r_angle=10000,
    r_angle_grating=10000,
    near_neighbor_dist=-10000,
    )
taldcroft commented 2 years ago

I fixed the discrepancies between my original API code and this version with ff34529.

taldcroft commented 2 years ago

@javierggt - this PR is basically fine at this point except for the minor comments I made that are all easily fixed. If you just take care of those then we can merge this one.

javierggt commented 2 years ago

I made changes that I believe address @taldcroft's comments.

I also made a couple new commits:

taldcroft commented 2 years ago

I pushed a commit so the narrative docs include the information that future-me will need.

javierggt commented 2 years ago

I committed a fix for a failure I got, which I believe traces back to changing the return values of empty tables. I didn't spend much time checking why, but the bottomline is I add some extra columns to empty tables so the result's type is consistent with the case of non-empty tables, and downstream code can handle it.

I ran the tests above and check that the output is consistent with expectations.

javierggt commented 2 years ago

I fixed two typos and will squash in a moment.

taldcroft commented 2 years ago

:tada: