Unit test failure on fido ska3-prime on archive regress test

jeanconn commented 1 year ago

I'm seeing a test fail on ska3-prime with fido on test_commands_create_archive_regress.

This passes on ska3 on fido and on ska3-prime (2023.1rc9) on kady.


jeanconn-fido> pytest
============================================================================ test session starts =============================================================================
platform linux -- Python 3.10.8, pytest-7.2.1, pluggy-1.0.0
rootdir: /proj/sot/ska/jeanproj/git/kadi, configfile: pytest.ini
plugins: anyio-3.6.2
collected 193 items                                                                                                                                                          

kadi/commands/tests/test_commands.py ...............F...........................................................                                                       [ 38%]
kadi/commands/tests/test_states.py ......................x..........................................x....................                                              [ 83%]
kadi/tests/test_events.py ..........                                                                                                                                   [ 88%]
kadi/tests/test_occweb.py ......................                                                                                                                       [100%]

================================================================================== FAILURES ==================================================================================
__________________________________________________________________ test_commands_create_archive_regress[2] ___________________________________________________________________

tmpdir = local('/tmp/pytest-of-jeanconn/pytest-1591/test_commands_create_archive_r1'), version_env = '2'

    @pytest.mark.skipif("not HAS_MPDIR")
    def test_commands_create_archive_regress(tmpdir, version_env):
        """Create cmds archive from scratch and test that it matches flight

        This tests over an eventful month that includes IU reset/NSM, SCS-107
        (radiation), fast replan, loads approved but not uplinked, etc.
        """
        update_cmds = update_cmds_v2 if version_env == "2" else update_cmds_v1
        commands = commands_v2 if version_env == "2" else commands_v1

        kadi_orig = os.environ.get("KADI")
        start = CxoTime("2021:290")
        stop = start + 30 * u.day
        cmds_flight = commands.get_cmds(start + 3 * u.day, stop - 3 * u.day)
        cmds_flight.fetch_params()

        with conf.set_temp("commands_dir", str(tmpdir)):
            try:
                os.environ["KADI"] = str(tmpdir)
                update_cmds.main(
                    (
                        "--lookback=30" if version_env == "2" else f"--start={start.date}",
                        f"--stop={stop.date}",
                        f"--data-root={tmpdir}",
                    )
                )
                # Force reload of LazyVal
                del commands.IDX_CMDS._val
                del commands.PARS_DICT._val
                del commands.REV_PARS_DICT._val

                # Make sure we are seeing the temporary cmds archive
                cmds_empty = commands.get_cmds(start - 60 * u.day, start - 50 * u.day)
                assert len(cmds_empty) == 0

                cmds_local = commands.get_cmds(start + 3 * u.day, stop - 3 * u.day)

                cmds_local.fetch_params()
                if len(cmds_flight) != len(cmds_local):
                    # Code to debug problems, leave commented for production
                    # out = "\n".join(cmds_flight.pformat_like_backstop())
                    # Path("cmds_flight.txt").write_text(out)
                    # out = "\n".join(cmds_local.pformat_like_backstop())
                    # Path("cmds_local.txt").write_text(out)
                    assert len(cmds_flight) == len(cmds_local)

                # 'starcat_idx' param in OBS cmd does not match since the pickle files
                # are different, so remove it.
                for cmds in (cmds_local, cmds_flight):
                    for cmd in cmds:
                        if cmd["tlmsid"] == "OBS" and "starcat_idx" in cmd["params"]:
                            del cmd["params"]["starcat_idx"]

                for attr in ("tlmsid", "date", "params"):
>                   assert np.all(cmds_flight[attr] == cmds_local[attr])
E                   AssertionError: assert False
E                    +  where False = <function all at 0x7f4cb93c44c0>(<Column name=...': 'AOFUNCEN'} == <Column name=...': 'AOFUNCEN'}
E                    +    where <function all at 0x7f4cb93c44c0> = np.all
E                       Use -v to get more diff)

kadi/commands/tests/test_commands.py:325: AssertionError
---------------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------------
******************************************

jeanconn commented 1 year ago

This looks to me to be tiny differences in the archived and calculated NSM quaternion on day 2021:296. While this doesn't look to be a problem for every platform (and passes on ska3), perhaps the test should use an isclose for for that attitude anyway?

taldcroft commented 1 year ago

Can you run on kady? I am would rather not make this another blocker to the release.

jeanconn commented 1 year ago

OK. I set up a ska3-prime on kadi and this test did pass there. I'll update the top and set this low priority.

As an RFE it seems to me that NSM quaternions are a different thing from the rest of the quaternions and I don't know how I would pull those derived attitudes out easily. If you are also considering adding "informational" quaternions that would represent the estimated attitude during NSM recovery, perhaps different kinds of attitude "labels" would make sense anyway?

taldcroft commented 1 year ago

As an RFE it seems to me that NSM quaternions are a different thing from the rest of the quaternions

They are different only in that they may be inaccurate, mostly in rotation about the sun line (RASL). Given the way that these get used operationally, I'm not quite seeing the use case of your suggestions.

jeanconn commented 1 year ago

Fair. I suppose I was thinking of it mostly for this debug use case; if a mismatch comes up again in a non validation context it took a minute to figure out this could be a numeric issue because this quaternion was not just read from a file or table. And to make the regression go away it seems like an exception for the not accurate quaternions would be the thing?

jeanconn commented 11 months ago

Closed by #288

sot / kadi

Unit test failure on fido ska3-prime on archive regress test #278