opensafely-core / job-runner

A client for running jobs in an OpenSAFELY secure environment, requested via job-server (q.v.)
Other
4 stars 5 forks source link

get_obsolete_files() not functioning #750

Open madwort opened 1 month ago

madwort commented 1 month ago

@evansd flagged up an intermittent error on a CI run in this slack thread.

=================================== FAILURES ===================================
________________ test_handle_job_finalized_success_with_delete _________________

db = None

    def test_handle_job_finalized_success_with_delete(db):
        api = StubExecutorAPI()

        # insert previous outputs
        job_factory(
            state=State.SUCCEEDED,
            status_code=StatusCode.SUCCEEDED,
            outputs={"output/old.csv": "highly_sensitive"},
        )

        job = api.add_test_job(ExecutorState.FINALIZED, State.RUNNING, StatusCode.FINALIZED)
        api.set_job_result(job, outputs={"output/file.csv": "highly_sensitive"})

        run.handle_job(job, api)

        # executor state
        assert job.id in api.tracker["cleanup"]
        # its been cleaned up and is now unknown
        assert api.get_status(job).state == ExecutorState.UNKNOWN

        # our state
        assert job.state == State.SUCCEEDED
        assert job.status_message == "Completed successfully"
        assert job.outputs == {"output/file.csv": "highly_sensitive"}
>       assert api.deleted["workspace"][Privacy.MEDIUM] == ["output/old.csv"]
E       AssertionError: assert [] == ['output/old.csv']
E         Right contains one more item: 'output/old.csv'
E         Full diff:
E         - ['output/old.csv']
E         + []

tests/test_run.py:563: AssertionError

@inglesp could reproduce ~2% of the time, and spotted that freezing time would make the test reliably pass.

madwort commented 1 month ago

This PR makes the test reliably fail: https://github.com/opensafely-core/job-runner/pull/749

following @inglesp pointing at this line, I think the issue is that get_obsolete_files() does for existing in list_outputs_from_action() which seems like it should look up the outputs from the previous job, but actually gets the outputs from the current job - and therefore I think will always return [] . The test passes because we sort on created_at & in 98% of test runs that is equal so the array is “unsorted” and we unintentionally get the previous job.

I haven't yet looked into whether this means that any outputs that should have been removed from a workspace are still in place.

madwort commented 1 month ago

Some discussion on slack. For now we've merged the amended tests with xfail in #755 , @evansd to take a look for a quick fix if/when he has time.

This fixes the code in question (whilst breaking everything else)