posit-dev / great-tables

Make awesome display tables using Python.
https://posit-dev.github.io/great-tables/
MIT License
1.9k stars 71 forks source link

Consolidate ordered list code using new `_create_ordered_list()` function #407

Closed jrycw closed 2 months ago

jrycw commented 3 months ago

Hello team,

I would like to propose consolidating the ordered list-related code using a new function, _create_ordered_list().

Our current logic works well, such as list({k: True for k in x}) in the _unique_set() function. However, even after working on the project for a while, I find it confusing to understand why the True is needed. It makes me question whether we need this Boolean value for some if-else logic. Eventually, I realize it's just a placeholder—a nice trick but not very intuitive.

A few hours ago, I came across a post from Michael Driscoll on LinkedIn and found that list(dict.fromkeys(x).keys()) is more readable and neat, with a functional programming style. What do you think about this modification?

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 97.05882% with 1 line in your changes missing coverage. Please review.

Project coverage is 86.71%. Comparing base (bacda7e) to head (6f55a84). Report is 32 commits behind head on main.

Files with missing lines Patch % Lines
great_tables/_scss.py 80.00% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #407 +/- ## ========================================== + Coverage 86.37% 86.71% +0.33% ========================================== Files 42 42 Lines 4683 4703 +20 ========================================== + Hits 4045 4078 +33 + Misses 638 625 -13 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

jrycw commented 3 months ago

It seems we're encountering a CI issue with the Test for saving table image with browser (pull_request).

machow commented 2 months ago

Hey, thanks for looking into this. I think some of the potential value of an ordered set (e.g. using a dictionary) is:

However, looking at the code -- checking membership might not be too time intensive (unless there are tons of column names or groupings). One thing we could do, if the use of True seems distracting is create a custom class for OrderedSet. WDYT?

It could be something like...

from collections.abc import Set

class OrderedSet(Set):
    # d could also just be an iterable
    def __init__(self, d: list):
        self._d = {k: True for k in d}

    def __contains__(self, k):
        return k in self._d

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)

    def __repr__(self):
        repr_list = repr(list(self._d.keys()))
        return f"{type(self).__name__}({repr_list})"

ord = OrderedSet(["a", "b", "a", "c"])
len(ord) # 3
"b" in ord # True
list(ord) # ["a", "b", "c"]
repr(ord)
jrycw commented 2 months ago

This looks great! Both ideas aim to hide the implementation details, allowing users and developers to easily create an ordered set. Personally, I might write the code like this:

from collections.abc import Set

class OrderedSet(Set):
    def __init__(self, d: list):
        self._d = self._create(d)

    def _create(self, d: list):
        return {k: True for k in d}

    def as_set(self):
        return set(self._d)

    def as_list(self):
        return list(self._d)

    def as_dict(self):
        return dict(self._d)

    def __contains__(self, k):
        return k in self._d

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)

    def __repr__(self):
        cls_name = type(self).__name__
        lst = self.as_list()
        return f"{cls_name}({lst!r})"

ord = OrderedSet(["a", "b", "a", "c"])
print(len(ord))  # 3
print("b" in ord)  # True
print(ord.as_set())  # {'b', 'a', 'c'}
print(ord.as_list())  # ['a', 'b', 'c']
print(ord.as_dict())  # {'a': True, 'b': True, 'c': True}
print(repr(ord))  # OrderedSet(['a', 'b', 'c'])
machow commented 2 months ago

That version seems great!

jrycw commented 2 months ago

My suggestion is to treat OrderedSet (the implementation might change in the future?) as an internal class and use _create_ordered_list for development purposes. However, feel free to update the commit as you see fit.

machow commented 2 months ago

I'm okay with leaving out _create_ordered_set, since it's in the _utils.py submodule, so there's some internal-ness being signaled. Do you want to make the change? Otherwise, I'm happy to tweak and merge!

jrycw commented 2 months ago

@machow , I think you may have missed the changes I made in the last commit. Please let me know if any further modifications are needed for this PR.

machow commented 2 months ago

Ah sorry -- I meant to say, I'd be in favor of only using OrderedSet and removing _create_ordered_list(), since OrderedSet will be in the _utils.py submodule, which starts with underscore.

(but I accidentally said _create_ordered_set() instead 😞). The changes look great--I'm happy to go in and remove _create_ordered_list() and merge if useful :)

jrycw commented 2 months ago

Got it! Please go ahead and adjust the code as you suggested.

machow commented 2 months ago

Alright, tweaked to use OrderedSet directly -- @rich-iannone let's go over this while pairing!