Climatology Missing Values Test: Use masked array data and mask to create z_idx

iwensu0313 commented 3 months ago

This is a follow-up PR related to #107.

While working on #107 I noticed that we were overwriting the MISSING data flag with a GOOD data flag. The #107 PR implemented a fix that is more of a bandaid that created the desired output, but doesn't address the underlying undesired behavior.

The undesired behavior was occurring because we weren't using the masked array inp to create z_idx. See thread: https://github.com/ioos/ioos_qc/pull/107#discussion_r1546310270. @kthyng helped find the fix for this change!

I also put flag_arr[inp.mask] = QartodFlags.MISSING back where it was because I think we should be made aware if the underlying code is not working as expected (e.g. if newer numpy package versions changed the way masked arrays behave).

iwensu0313 commented 3 months ago

@ocefpaf

ocefpaf commented 3 months ago

Awesome! @iwensu0313 I don't want to tax you too much but do you believe you can come up with a test case to avoid future regressions? Something that fails in the previous code but passes now would do.

iwensu0313 commented 3 months ago

Awesome! @iwensu0313 I don't want to tax you too much but do you believe you can come up with a test case to avoid future regressions? Something that fails in the previous code but passes now would do.

@ocefpaf, unless I'm missing something, this test QartodClimatologyMissingTest added in #107 does exactly that? It basically is testing against the scenario provided by lgarzio in https://github.com/ioos/ioos_qc/issues/65#issue-996257078

It fails previously but passes w/ the fixes added in both https://github.com/ioos/ioos_qc/pull/107 and https://github.com/ioos/ioos_qc/pull/108. The change introduced in this PR is meant to be part of #107, but #107 is already merged.

ocefpaf commented 3 months ago

I guess I was confused b/c that test passed in 107 but it shouldn't without this change, no?

iwensu0313 commented 3 months ago

@ocefpaf yea it's a bit confusing w/ the changes spread across two PRs.

Essentially I am replacing the fix in #107 with the fix here to address the same issue. The test I added in #107 tests both fixes but this is the better fix. The test in #107 passed because I moved flag_arr[inp.mask] = QartodFlags.MISSING to the bottom of the check() function. That was the bandaid fix. In this Pr #108 , I move flag_arr[inp.mask] = QartodFlags.MISSING back to where it was before, which would cause the QartodClimatologyMissingTest test to fail again. Fixing how we define z_idx when zspan is not set to z_idx = np.ma.array(data=~np.isnan(inp.data), mask=inp.mask, fill_value=999999), addressing the underlying issue, allows the test to pass again.

Let me know if it's worth walking through what's happening in the check() function to better explain this approach!

The summary is that

In PR #107, what would happen in the check() function is that we would still allow missing data to be given a flag of 1 (incorrect flag). Then overwrite it with a flag of 9 (the correct flag)
In this PR #108, what would happen is in the check() function is that we would apply flag of 9 to missing data (correct flag)

iwensu0313 commented 3 months ago

If I'm missing something though.. let me know if there is an additional test case I should be testing! I am not sure what that would be.

But I'm pretty sure QartodClimatologyMissingTest is the right test for this PR. Because it will run through the new line of code I added z_idx = np.ma.array(data=~np.isnan(inp.data), mask=inp.mask, fill_value=999999) and ensure that the flag given to the missing values are indeed 9. PR #107 ran through the same checks w/ that test but was less efficient because it allowed the incorrect flag to be applied first before applying the correct one

ocefpaf commented 3 months ago

Essentially I am replacing the fix in #107 with the fix here to address the same issue.

My bad. I created that confusion!

ioos / ioos_qc

Climatology Missing Values Test: Use masked array data and mask to create z_idx #108