sot / kadi

Chandra commands and events
https://sot.github.io/kadi
BSD 3-Clause "New" or "Revised" License
5 stars 3 forks source link

Support AGASC 1.8 in `get_starcats`, refactor get_agasc_cone_fast() #332

Open taldcroft opened 1 week ago

taldcroft commented 1 week ago

Description

This brings support for AGASC 1.8 to kadi get_starcats. Previously the code was hardwired to use proseco_agasc_1p7.h5.

This PR now uses either 1.7 or 1.8 to identify stars, depending on the observation date. The date cut-offs are encoded as configuration parameters that allow for uncertainty in the actual transition date:

if date < conf.date_start_agasc1p8_earliest:  # 2024-Jul-21
    Use 1.7
elif date < conf.date_start_agasc1p8_latest:  # 2024-Jul-21 + 30 days
    Try 1.8. If any stars are not ID'd then use 1.7.
else:
    Use 1.8

The method observations.get_agasc_cone_fast() was refactored to allow agasc.get_agasc_cone(..., cache=True) to do most of the work.

In addition some unrelated ruff updates were done.

Requires

Interface impacts

None.

Testing

Unit tests

kadi/commands/tests/test_commands.py ................................................................................. [ 44%] kadi/commands/tests/test_states.py .......................x......................... [ 71%] kadi/commands/tests/test_validate.py .................... [ 82%] kadi/tests/test_events.py .......... [ 87%] kadi/tests/test_occweb.py ...................... [100%]

====================================== 181 passed, 1 xfailed, 2 warnings in 139.49s (0:02:19) ======================================= (ska3) ➜ kadi git:(agasc-cone-fast-for-1p8) git rev-parse HEAD c89a189269666f0c6974a4c8bdbfcd843e3918bb

#### Without AGASC 1.8 available

(ska3) ➜ kadi git:(agasc-cone-fast-for-1p8) env PYTHONPATH=/Users/aldcroft/git/agasc pytest ======================================================== test session starts ======================================================== platform darwin -- Python 3.11.8, pytest-7.4.4, pluggy-1.4.0 rootdir: /Users/aldcroft/git configfile: pytest.ini plugins: timeout-2.2.0, anyio-4.3.0 collected 182 items

kadi/commands/tests/test_commands.py ..............................................s.................................. [ 44%] kadi/commands/tests/test_states.py .......................x......................... [ 71%] kadi/commands/tests/test_validate.py .................... [ 82%] kadi/tests/test_events.py .......... [ 87%] kadi/tests/test_occweb.py ...................... [100%]

================================= 180 passed, 1 skipped, 1 xfailed, 2 warnings in 139.03s (0:02:19) =================================


Independent check of unit tests by Jean
- [x] Linux

jeanconn-fido> export PYTHONPATH=/home/jeanconn/git/agasc jeanconn-fido> pytest ========================================================================== test session starts =========================================================================== platform linux -- Python 3.11.8, pytest-7.4.4, pluggy-1.4.0 rootdir: /proj/sot/ska/jeanproj/git configfile: pytest.ini plugins: anyio-4.3.0, timeout-2.2.0 collected 182 items

kadi/commands/tests/test_commands.py ..............................................s.................................. [ 44%] kadi/commands/tests/test_states.py .......................x......................... [ 71%] kadi/commands/tests/test_validate.py .................... [ 82%] kadi/tests/test_events.py .......... [ 87%] kadi/tests/test_occweb.py ...................... [100%]

========================================================= 180 passed, 1 skipped, 1 xfailed in 226.20s (0:03:46) ========================================================== jeanconn-fido> export AGASC_DIR=/proj/sot/ska/data/agasc/rc jeanconn-fido> pytest ========================================================================== test session starts =========================================================================== platform linux -- Python 3.11.8, pytest-7.4.4, pluggy-1.4.0 rootdir: /proj/sot/ska/jeanproj/git configfile: pytest.ini plugins: anyio-4.3.0, timeout-2.2.0 collected 182 items

kadi/commands/tests/test_commands.py ................................................................................. [ 44%] kadi/commands/tests/test_states.py .......................x......................... [ 71%] kadi/commands/tests/test_validate.py .................... [ 82%] kadi/tests/test_events.py .......... [ 87%] kadi/tests/test_occweb.py ...................... [100%]

=============================================================== 181 passed, 1 xfailed in 186.01s (0:03:06) =============================================================== jeanconn-fido> git rev-parse HEAD c89a189269666f0c6974a4c8bdbfcd843e3918bb jeanconn-fido> cd /home/jeanconn/git/agasc jeanconn-fido> git rev-parse HEAD aeeda59413c73c37d1030615cd7f451ad484a851



### Functional tests
<!-- Describe and document results of any functional tests, otherwise leave the text below -->
No functional testing.
jeanconn commented 17 hours ago

Regarding "Also, the time-range approach is not safe for past loads. If promotion happens after conf.date_start_agasc1p8_earliest, calls to kadi will be wrong if they are made after promotion (1p8 file is present), but refer to loads between the hardwired value and the actual value." - theoretically, the date for the kadi call is the date of the catalog, not the time the code is run, which is pretty safe. And overall this code is pretty safe in general if the behavior during the "between" window is that the code tries 1.8 and then tries 1.7 if something wasn't identified.

jeanconn commented 17 hours ago

Regarding "If promotion happens after conf.date_start_agasc1p8_earliest, calls to kadi will be wrong if they are made after promotion (1p8 file is present), but refer to loads between the hardwired value and the actual value." or maybe I'm just confused about what the issue is that you bring up - but theoretically, we'd just promote the AGASC 1.8 with this version of kadi and have the conf earliest date be a week later than that - and we'd just get a tiny hiccup if we had a replan.

javierggt commented 17 hours ago

the date for the kadi call is the date of the catalog, not the time the code is run, which is pretty safe.

Let's look at an example. Let's consider looking back at hypothetical loads:

By that time, the 1p8 file will be present, and the date is between date_start_agasc1p8_earliest and date_start_agasc1p8_latest, so versions = ["1p8", "1p7"] and then the 1p8 version will be returned here.

In this example, you would wrongly use AGASC 1p8.

The only way to get this right is to make sure date_start_agasc1p8_earliest matches the actual promotion date. In that case you might as well use a single date.

jeanconn commented 16 hours ago

Sure. So let's at least update the earliest date to be the 29th. And we could always update/refine conf again in the future as needed.

I don't have an opinion about if the code uses two dates or one.

javierggt commented 16 hours ago

My main point is that we do not need this time-range thing. A single date is enough. The time range introduces complexity and does not guarantee the right result.

If you do not mind having a short period of time with the wrong result, then you would not mind having a single date either (with the wrong result being reported until you have a kadi version with the correct date).

taldcroft commented 13 hours ago

The reason for the time range is so that we can promote this in advance of the AGASC 1.8 promotion. We cannot guarantee that there will be JUL2924 loads (anomaly/radiation shutdown). Even if there are JUL2924 loads, we don't know today exactly when they will start to an accuracy of better than about 12 hours.

==> With a single date transition, there will almost certainly be a few obsids that use the wrong AGASC very near the promotion time. It is credible that this will lead to a hard failure that we can't fix without another release.

The assumption behind the two-date solution is this: if catalogs stars are identified with either catalog, then that identification (the AGASC ID) is correct and acceptable. Remember, the only thing it is getting in the identification is the star ID and magnitude, but basically we already accept that we don't have a reproducible way to get that magnitude because it could be coming from the supplement which changes.

I think it is NOT credible (in less than a week of catalogs) that changes in proper motion will lead to misidentifying stars. In other words, an interloper star suddenly matches the catalog yag/zag better than the original commanded stars. If the worst case between the expected promotion and actual promotion is less than a week, then those are the only catalogs where the code will be trying 1.8 for 1.7 catalogs.

Finally, we can certainly strip out this code in favor of a single transition date AFTER we know the definitive time of the first obsid that was planned with 1.8 with #333.

taldcroft commented 13 hours ago

I addressed all of @jeanconn's comments.

Since we already have #333 to implement a single-date transition after we have a definitive value, we won't forget to do it. This PR is written, tested, and ready to go to support an uncertain transition date, so we should merge and release this.

taldcroft commented 13 hours ago

Standby, I just noticed a logic flaw that needs to be fixed.

javierggt commented 13 hours ago

The agasc module already provides a mechanism to handle promoting the catalog. The catalog promotion is timed so starcheck is run for a specific load. That is about a week before the loads are actually run. In that week, it is trivial to release ska3-flight with a single change: the date. That is as long as you don't want to start adding things to the release.

That is much simpler than this whole argument, this PR, plus PR #333, and would give zero errors.

I do not see which meaningful errors we would get with having a single date versus having two dates, especially after my example above that shows that the time range would give the wrong catalog.

I honestly do not see what is it we gain. I have explained it in three different ways, and I have given an example of how the code does not fix the issue you want to solve.

taldcroft commented 12 hours ago

@javierggt - It seems that you are not accepting that you cannot know the transition time until AFTER it actually happens. This is a fundamental problem. We may have a release ready with some guess for the time (based on our guess from a week before which is the FSDS time), but it can stop being valid. In addition doing it this way requires users to immediately upgrade or else potentially see a problem.

How do you propose to deal with that possibility?

javierggt commented 12 hours ago

I see what you say, and the closest I can get is to make a fast release with a single change.

I would like to turn it around:

Let's assume that, because of what you say, conf.date_start_agasc1p8_earliest date is wrong.

It seems you are not accepting that the changes proposed do not actually fix the issue you want to fix, because calling kadi after promotion will give you the wrong catalog version for all stars observed after conf.date_start_agasc1p8_earliest. Just like having a single date.

What do you propose to deal with this?

javierggt commented 12 hours ago

In other words, both solutions have the problem that there might be an interval of time where kadi guesses the wrong AGASC version. They will fail in the same way, on the same stars. One solution is more complex. Which one do you choose?

taldcroft commented 12 hours ago

In other words, both solutions have the problem that there might be an interval of time where kadi guesses the wrong AGASC version. They will fail in the same way, on the same stars.

This PR will try both 1.8 and 1.7 in that transition range. So --

If the code uses 1.8 for a 1.7 catalog and gets position matches and AGASC ID's for all stars, then that is just fine. It is not credible that the ID's are actually wrong, meaning using the wrong AGASC version can be OK. If you don't get matches, then it falls through to 1.7 and succeeds. This option always succeeds.

Instead with #333, there is only 1.8 to try and you don't get matches, there is no fall back and the code returns an incorrect answer. This option can fail.

javierggt commented 12 hours ago

ok, I'm ok after Jean is satisfied with her comments' resolutions.

taldcroft commented 12 hours ago

There was an important flaw that got missed in review, but now fixed in c89a189. It's worth looking at that. This issue would apply to both #332 and #333.

Previously for observations after the earliest transition date, it was requiring that AGASC 1.8 be available. With that fix it will fall through to 1.7. I'm updating the testing log shortly.

jeanconn commented 12 hours ago

I did my review assuming that going forward that AGASC 1.8 would be required and figured we'd defer the skare3 release containing this code if AGASC 1.8 was delayed.

jeanconn commented 12 hours ago

But I suppose I didn't call that out as a new interface impact.

taldcroft commented 12 hours ago

I did my review assuming that going forward that AGASC 1.8 would be required and figured we'd defer the skare3 release containing this code if AGASC 1.8 was delayed.

My intent was that this code could be deployed any time in advance of promotion. I think it now meets that goal.

taldcroft commented 12 hours ago

Tests in the description have been updated.

jeanconn commented 10 hours ago

I figure, technically, the interface change is now that there's the possibility of StarIdentificationFailed error.