posit-dev / great-tables

Make awesome display tables using Python.
https://posit-dev.github.io/great-tables/
MIT License
1.43k stars 48 forks source link

Refactor `fmt_time`, `fmt_date` and `fmt_datetime` #290

Closed jrycw closed 2 months ago

jrycw commented 2 months ago

Related to #222 , I'm thinking we could possibly delegate the validation of lifts to the built-in datetime.datetime.fromisoformat function.

Fixes: #299

codecov-commenter commented 2 months ago

Codecov Report

Attention: Patch coverage is 90.00000% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 81.68%. Comparing base (479eb73) to head (3b609b4). Report is 22 commits behind head on main.

Files Patch % Lines
great_tables/_formats.py 90.00% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #290 +/- ## ========================================== - Coverage 81.71% 81.68% -0.03% ========================================== Files 41 41 Lines 4325 4287 -38 ========================================== - Hits 3534 3502 -32 + Misses 791 785 -6 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

machow commented 2 months ago

@rich-iannone can you verify that datetime.fromisoformat() handles the cases you were thinking of? Can you be sure to enumerate in a comment the cases you try (for posterity)?

rich-iannone commented 2 months ago

I mostly wanted to make sure that datetime.fromisoformat() handled most of the common cases and I found it's quite capable. The following input variations results in correct (the same) output:

import datetime
from great_tables import vals

iso_str = [
    "2020-05-20",
    "2020-05-20 22:30",
    "2020-05-20T22:30",
    "2020-05-20 22:30:05",
    "2020-05-20 22:30:05.824" # milliseconds *must* have 3 decimal places
    ]

fmtd_dates = []

for val in iso_str:
    fmtd_dates.append((vals.fmt_date(datetime.datetime.fromisoformat(val), date_style="wd_m_day_year"))[0])

fmtd_dates
['Wed, May 20, 2020',
 'Wed, May 20, 2020',
 'Wed, May 20, 2020',
 'Wed, May 20, 2020',
 'Wed, May 20, 2020']

So I think the code change is great here. Thanks, @jrycw !

jrycw commented 2 months ago

I mostly wanted to make sure that datetime.fromisoformat() handled most of the common cases and I found it's quite capable. The following input variations results in correct (the same) output:

import datetime
from great_tables import vals

iso_str = [
    "2020-05-20",
    "2020-05-20 22:30",
    "2020-05-20T22:30",
    "2020-05-20 22:30:05",
    "2020-05-20 22:30:05.824" # milliseconds *must* have 3 decimal places
    ]

fmtd_dates = []

for val in iso_str:
    fmtd_dates.append((vals.fmt_date(datetime.datetime.fromisoformat(val), date_style="wd_m_day_year"))[0])

fmtd_dates
['Wed, May 20, 2020',
 'Wed, May 20, 2020',
 'Wed, May 20, 2020',
 'Wed, May 20, 2020',
 'Wed, May 20, 2020']

So I think the code change is great here. Thanks, @jrycw !

Please correct me if I'm mistaken, but it seems you're testing fmt_date instead of fmt_datetime. Since vals.fmt_datetime isn't available, it needs to be tested from GT.

rich-iannone commented 2 months ago

@jrycw you are right. I mistakenly thought that .fmt_date() used the same internal function. I checked out this branch and tried out some variations. I found that this could be further improved if there is a single validation and transformation from datetime.fromisoformat(x) (we could drop the use of _validate_iso_datetime_str()).

With that change, the following code works:

import datetime
from great_tables import GT
import polars as pl

df = pl.DataFrame({"x": [
    "2020-05-20",
    "2020-05-20 22:30",
    "2020-05-20T22:30",
    "2020-05-20 22:30:05",
    "2020-05-20T22:30:05.232" # milliseconds *must* have 3 decimal places
    ]})

GT(df).fmt_datetime(columns="x", date_style="wd_m_day_year", time_style="h_m_p")

It produces this:

fmt_datetime_refactor_tbl

This seems like a good improvement because the first and last cases failed before, but they provide formatted output now (and they work as can be expected).

To do this, I renamed _iso_to_datetime() to _iso_str_to_datetime() with this changed it to:

def _iso_str_to_datetime(x: str) -> datetime:
    """
    Converts a string in ISO format to a datetime object.

    Args:
        x (str): The string to be converted.

    Returns:
        datetime: The converted datetime object.
    """
    return datetime.fromisoformat(x)

The code in fmt_datetime() changes to:


        # If `x` is a string, assume it is an ISO datetime string and convert it to a datetime object
        if isinstance(x, str):

            # Convert the ISO datetime string to a datetime object
            x = _iso_str_to_datetime(x)

@jrycw @machow does this change make sense? I could push the changes to the branch (or you could modify). Also, some new test cases are needed here.

jrycw commented 2 months ago

LGTM! If @machow is also on board with this, could @rich-iannone push your changes to the branch? I'll then proceed to add the relevant tests based on your demonstration.

machow commented 2 months ago

This sounds good to me!

jrycw commented 2 months ago

Related to #299, it turns out we could make similar modifications to fmt_date. Here's a summary of the changes:

I believe the modifications are correct, but please feel free to correct any mistakes I may have made.

jrycw commented 2 months ago

This might sound a bit wild, but I just discovered a similar function called datetime.time.fromisoformat() in Python. This means we could make similar modifications as follows:

jrycw commented 2 months ago

I attempted to remove _iso_to_date and replace it with _iso_str_to_date in _generate_data_vals, and it seems to have worked.