xCDAT / xcdat

An extension of xarray for climate data analysis on structured grids.
https://xcdat.readthedocs.io/en/latest/
Apache License 2.0
101 stars 11 forks source link

[Feature]: Add default value for `axes` argument in `add_missing_bounds()` #563

Closed lee1043 closed 7 months ago

lee1043 commented 8 months ago

Is your feature request related to a problem?

I think from v0.6.0 add_missing_bounds started requiring parameters.

I liked its previous usage, e.g., ds.bounds.add_missing_bounds(), which automatically loops through the Dataset’s axes and attempts to adds bounds to its coordinates if they don’t exist. (By the way, the documentation is still written in this way).

Describe the solution you'd like

I wonder if add_missing_bounds can be operated in two optional ways: (1) ds.bounds.add_missing_bounds(): loop through axes and fill missed bounds (2) ds.bounds.add_missing_bounds("X"): fill missed bounds for "X" axis

Describe alternatives you've considered

No response

Additional context

(added to the post)

Some PMP and obs4MIP codes have implemented add_missing_bounds() routines but with xcdat update those codes needs changes, and if ds.bounds.add_missing_bounds() could be still used, it is going to be very efficient to update xcdat version in PMP dependency.

@acordonez, @gleckler1, @bosup any thoughts?

tomvothecoder commented 8 months ago

Hey @lee1043, sorry for this issue. In v0.6.0 add_missing_bounds() was updated to include the axes argument to align with the changes to open_dataset() and open_mfdataset(). Both of these APIs accept a list of axes (e.g., add_bounds = ["X", "Y"]), which get passed to add_missing_bounds() under the hood. We changed add_bounds to a list of axes because we don't want to assume how time bounds are created (frequency vs. midpoint). If the user decides to pass "T", time bounds will be generated using frequency.

To avoid this breaking change, I should have set a default value similar to open_dataset() and open_mfdataset(). I do think it is a good idea for the user to explicitly define which axes they want missing bounds added if they call add_missing_bounds() directly, because not all users might want to add missing bounds for all axes.

In your case, if you want to add bounds for all axes, you can do this:

# Constant variable to add missing bounds to.
AXES_BNDS = ["X", "Y", "T", "Z"]

# Update all of these lines to this:
add_missing_bounds(AXES_BNDS)
pochedls commented 7 months ago

Is it possible to add a default positional argument if the user doesn't pass in an argument? I think it would make sense to default this to ["T", "X", "Y"] (X and Y should have been added on open, so if a user is calling this, they likely want time bounds; we shouldn't add vertical bounds because this is a rarer case).

tomvothecoder commented 7 months ago

Is it possible to add a default positional argument if the user doesn't pass in an argument? I think it would make sense to default this to ["T", "X", "Y"] (X and Y should have been added on open, so if a user is calling this, they likely want time bounds; we shouldn't add vertical bounds because this is a rarer case).

I actually thought about this over the weekend too. This sounds reasonable to me.

It just involves adding a default value and fixing a few tests. If somebody has time to open up a PR before me that would be great! Otherwise I'll get to it soon.

pochedls commented 7 months ago

This might involve updating this line to:

def add_missing_bounds(self, axes: Optional[List[CFAxisKey]] = ['X', 'Y', 'T']) -> xr.Dataset:  # noqa: C901

[I didn't test this, but I think it is on the right track. Tom notes we'd need to fix tests. Maybe that noqa statement would need an update, too].

@lee1043 – would this be sufficient top address this issue? Does Z need to be included (I think you'd get bounds, but it would be unclear if they are the bounds you want)?

If this works – would you be willing to give this fix a shot in a PR (@acordonez, @gleckler1, @bosup)?

acordonez commented 7 months ago

@pochedls I started working on the update you suggested, but now my pre-commit checks has thrown an error in another part of the add_missing_bounds() function. Is there someone who'd have time to help me with this, or should I wait for you/Tom/someone else to have time to implement the fix themselves?

acordonez commented 7 months ago

I got rid of my other errors using:

    def add_missing_bounds(  # noqa: C901
        self, axes: List[CFAxisKey] = ["X", "Y", "T"]  # noqa: C901
    ) -> xr.Dataset:  # noqa: C901

I'm getting some failures when I run pytest but how do I tell what the failures actually are?

Screenshot 2023-11-27 at 1 51 12 PM

pochedls commented 7 months ago

@acordonez – thanks for working on this – I don't see failures (usually red X marks, I think) in that screenshot, but if there are failures I think the last couple lines should point to a htmlcov folder (or something like that), which you can open in your web browser and it highlights the failed lines in red. Reach out on if it would be helpful to chat via WebEx.

acordonez commented 7 months ago

@pochedls Ok thanks it sounds like I was misreading the test outputs then.

tomvothecoder commented 7 months ago

I'm getting some failures when I run pytest but how do I tell what the failures actually are?

The test results marked with a lower-case "x" denotes "xfail" (expected to fail), so no worries here. The console should also output the coverage report, in addition to the htmlcov/index.html file.