I've got a set of 400 for which the code works. "Fixed" was probably too strong a statement for the resulting catalog; what I'll really need to do is run all the final optimized catalogs through aca_preview and see what else the optimization might break.
@taldcroft mentioned that in the combination checking there might be value in trying the first "next" candidate as a possible replacement for more of the earlier stars sooner in the sort. My understanding is that right now we've got
a)
0 1 2 3 4
0 1 2 3 5
0 1 2 3 6
0 1 2 3 7
And there could be value to
b)
0 1 2 3 4
0 1 2 3 5
0 1 2 4 5
0 1 3 4 5
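A runnable sketch of how I'm picturing those two orderings (the helper names here are mine, and ordering_b only reproduces the front of sequence (b) above rather than a complete enumeration):

from itertools import combinations, islice

def ordering_a(n, m):
    # Plain lexicographic combinations: the top stars stay fixed
    # and the last slot varies first.
    yield from combinations(range(n), m)

def ordering_b(n, m):
    # When a new candidate c comes in, try it as a replacement for
    # progressively more of the earlier stars before moving on to
    # candidate c + 1. Partial sketch only, not a full enumeration.
    yield tuple(range(m))
    for c in range(m, n):
        for k in range(1, m):
            yield tuple(range(m - k)) + tuple(range(c - k + 1, c + 1))

print(list(islice(ordering_a(8, 5), 4)))  # the four catalogs in a)
print(list(islice(ordering_b(8, 5), 4)))  # the four catalogs in b)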
I think I'm leaning a bit more toward the plain combinations, because I don't think we want to hurry to replace the "best" star with a star that didn't even make the first cut. But it's a hard problem. I don't have a really strong metric that would prefer, say, catalog
1 2 3 4 5
over
0 1 2 3 16
If our stage "labels" were more meaningful, it would be easier to at least optimize a bit by stage (starting with the candidates that all share the "next candidate" stage only).
Also worth brainstorming in this thread: what debugging information is going to be most helpful to the ACA reviewer in the guide report for a tricky catalog that has already been "optimized" by this code? I was considering just adding an attribute to the catalog to track whether the "not first choice" catalog was selected from the available candidates due to this optimization, and I added some log messages.
Maybe something like adding the text "(not selected due to overall catalog issues (clusters))" as well.
And it looks like 2 observations (16147 and 19969) since 2006:001 get through the optimization/fixes here with a catalog that still doesn't satisfy all checks. 16147 has only 6 candidates to start, and the fixes don't help enough.
And it looks like 19969 is due to a bug, most likely in the starcheck database, but I will review it in a proseco context as well.
In [21]: cat = get_aca_catalog(19969)
Obsid 19969 didn't satisfy the cluster checks after optimization
In [22]: cat.guides.n_guide
Out[22]: 2
For this PR, I think this combination selection works, but I could use a hand making sure that the cache clearing is reasonable (and style bits like having reasonable variable names and useful docs/comments).
Then it needs tests.
For assessing possible improvements to the order in which we go through combinations, @taldcroft, were you thinking of something like reviewing the guide_count of the first match for each combination-selection algorithm (to vet the algorithm, not as part of the algorithm)?
I think we should prefer 1 2 3 4 5 over 0 1 2 3 15. The rationale is that for the issue of a bad dr95, a single bad star is typically the culprit. Since the "5th" best star is on average brighter and has fewer problems than the "16th" best, the 1 2 3 4 5 catalog is, statistically, lower risk.
Another way to think about it: suppose you have 16 candidate stars (all nicely distributed), so 0 1 2 3 4 is the starting catalog. But then you find that star 0 must be excluded for some reason, so what is your next choice for the catalog? It would be 1 2 3 4 5, not 1 2 3 4 15.
And here is code that is efficient and basically sorts the combinations by max(index):
from itertools import combinations

def index_combinations(n, m):
    # Yield m-combinations of range(n), ordered so that combinations
    # with a smaller max index are exhausted before any combination
    # that brings in a higher-index candidate.
    seen = set()
    for n_tmp in range(m + 1, n + 1):
        for comb in combinations(range(n_tmp), m):
            if comb not in seen:
                seen.add(comb)
                yield comb

for comb in index_combinations(6, 3):
    print(comb, ' ', max(comb))
(0, 1, 2) 2
(0, 1, 3) 3
(0, 2, 3) 3
(1, 2, 3) 3
(0, 1, 4) 4
(0, 2, 4) 4
(0, 3, 4) 4
(1, 2, 4) 4
(1, 3, 4) 4
(2, 3, 4) 4
(0, 1, 5) 5
(0, 2, 5) 5
(0, 3, 5) 5
(0, 4, 5) 5
(1, 2, 5) 5
(1, 3, 5) 5
(1, 4, 5) 5
(2, 3, 5) 5
(2, 4, 5) 5
(3, 4, 5) 5
# Plain lexicographic combinations, for comparison:
for comb in combinations(range(6), 3):
    print(comb, ' ', max(comb))
(0, 1, 2) 2
(0, 1, 3) 3
(0, 1, 4) 4
(0, 1, 5) 5
(0, 2, 3) 3
(0, 2, 4) 4
(0, 2, 5) 5
(0, 3, 4) 4
(0, 3, 5) 5
(0, 4, 5) 5
(1, 2, 3) 3
(1, 2, 4) 4
(1, 2, 5) 5
(1, 3, 4) 4
(1, 3, 5) 5
(1, 4, 5) 5
(2, 3, 4) 4
(2, 3, 5) 5
(2, 4, 5) 5
(3, 4, 5) 5
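Note the difference: the plain lexicographic enumeration reaches a combination containing the last star (max index 5) by its fourth combination, while index_combinations defers every combination containing star 5 until all combinations drawn from stars 0-4 have been exhausted.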
I was considering just adding an attribute to the catalog to track whether the "not first choice" catalog was selected from the available candidates due to this optimization, and I added some log messages.
I'd rather not add a specific attribute for this; instead just do it with log messages. I'm not sure what you could do with that True/False attribute in the report besides emit some text saying that the spatial optimization was required, so instead just add that as a log message.
Remember that log messages can have arbitrary keys of your choosing, so you can add a message with a key required_optimization = True that the guide reporting code can respond to.
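A self-contained sketch of that pattern (this illustrates the idea only and is not necessarily proseco's actual logging API; the key name follows the example above):

log_records = []

def log(message, **kwargs):
    # Store the message plus any extra keys as a plain dict record.
    record = {'message': message}
    record.update(kwargs)
    log_records.append(record)

# During optimization, tag the event with a custom key:
log('Selected a non-first-choice catalog to satisfy cluster checks',
    required_optimization=True)

# The guide report code can then respond to that key:
if any(rec.get('required_optimization') for rec in log_records):
    print('Spatial optimization was required for this catalog')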
"I think we should prefer 1 2 3 4 5 over 0 1 2 3 15. The rationale is that for the issue of a bad dr95, a single bad star is typically the culprit." Good point! Thanks! As much part of me probably wants the best and the brightest when I look at it later in the ground solution, the single bad star is a more credible issue onboard.
I suppose my order of combinations would be a little safer if it used the stage information better (limiting the allowed candidates in chunks by stage), but your sort seems fine as well. Will validate.
... if it used the stage information better (limiting the allowed candidates in chunks by stage), but your sort seems fine as well
The ordering I proposed does this implicitly since the stage candidates table is already sorted by stage.
Yes, I recognized that.
I was commenting on it more as a defect in my own ordering, and recognizing a possible middle-ground technique that could be applied to my ordering without moving quite so quickly toward swapping out the best star. But that's not really better in principle than what you suggest if, as you say, we just want to avoid the one bad star that perturbs dr95.
I reran the validation at https://nbviewer.jupyter.org/url/cxc.harvard.edu/mta/ASPECT/proseco/validation-4.4/small_box_PR_validation.ipynb and I think this is good and reasonable. I was also working on some more functional tests, but those could go in a separate PR if we just want to get this merged.
:tada:
Thanks for the assist with index_combinations on this!
This PR takes steps to try different candidate star combinations in an attempt to satisfy the 3 checks that the guide stars are not too "clustered".
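For context, here is a purely hypothetical sketch of what a check of this flavor could look like; the function name, threshold, and logic are invented for illustration and are not proseco's actual cluster checks:

import numpy as np

def guides_too_clustered(yags, zags, radius=500):
    # Hypothetical check: flag the catalog if every pair of guide
    # stars lies within `radius` arcsec of each other.
    ys = np.asarray(yags, dtype=float)
    zs = np.asarray(zags, dtype=float)
    # Pairwise separations between all stars
    dist = np.hypot(ys[:, None] - ys[None, :], zs[:, None] - zs[None, :])
    return bool(np.all(dist < radius))

# Five stars packed into a ~200 arcsec region count as clustered here
print(guides_too_clustered([0, 100, 50, 150, 200], [0, 50, 100, 150, 100]))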