static-frame / static-frame

Immutable and statically-typeable DataFrames with runtime type and data validation
https://staticframe.dev

Type hint for result of filtering rows of Frame indicates Series when it should be Union[Series, Frame] #922

Open aaronkurz opened 4 months ago

aaronkurz commented 4 months ago

Description

When filtering values of a Frame, the resulting type is indicated to be Series, even when the result has more than one row and thus is a Frame. When executing the code and testing, the type is indeed Frame, but the PyCharm type checker complains about wrong type usage. This might be an issue with PyCharm, with my code or with static-frame. But since PyCharm is a pretty common IDE, I still wanted to bring this up and ask for clarification.

Example


import static_frame as sf

example_frame = sf.Frame.from_dict({
    "A": [1, 2, 3, 4, 5],
    "B": [4, 5, 6, 7, 8],
})

filtered_frame = example_frame.loc[example_frame["A"] > 2]
print(filtered_frame)
print(type(filtered_frame))

Output:

<Frame>
<Index> A       B       <<U1>
<Index>
2       3       6
3       4       7
4       5       8
<int64> <int64> <int64>
<class 'static_frame.core.frame.Frame'>

Process finished with exit code 0

Type hint/inference: image
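
As a stopgap while the annotation reports Series, the result can be narrowed with a small runtime-checked helper. This is only a sketch of one possible workaround: `expect_type` is a hypothetical name, not part of static-frame, and it relies solely on the standard `typing` module.

```python
from typing import Type, TypeVar

T = TypeVar("T")


def expect_type(value: object, expected: Type[T]) -> T:
    """Return `value` unchanged, typed as `expected` for static checkers.

    Hypothetical helper (not part of static-frame): raises TypeError if the
    runtime type does not match, so a wrong assumption fails loudly.
    """
    if not isinstance(value, expected):
        raise TypeError(
            f"expected {expected.__name__}, got {type(value).__name__}"
        )
    return value
```

With the example above, this could be used as `filtered_frame = expect_type(example_frame.loc[example_frame["A"] > 2], sf.Frame)`, after which a type checker should treat `filtered_frame` as a Frame.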

Platform

Run the following function (static-frame >= 0.8.1) and provide the results to define your platform and environment:

>>> import static_frame as sf
>>> sf.Platform.display()

Output:

/usr/lib/python3/dist-packages/pytz/__init__.py:31: SyntaxWarning: invalid escape sequence '\s'
  match = re.match("^#\s*version\s*([0-9a-z]*)\s*$", line)
<Series: platform>
<Index>
platform           Linux-6.5.0-26-generic-x86_64-with-glibc2.35
sys.version        3.12.2 (main, Feb 25 2024, 16:35:05) [GCC 11.4.0]
static-frame       2.5.1
numpy              1.26.4
pandas             2.2.1
xlsxwriter         <ModuleNotFoundError>
openpyxl           <ModuleNotFoundError>
xarray             <ModuleNotFoundError>
tables             <ModuleNotFoundError>
pyarrow            <ModuleNotFoundError>
msgpack            <ModuleNotFoundError>
msgpack_numpy      <ModuleNotFoundError>
<<U13>             <object>

flexatone commented 3 months ago

Many thanks for isolating this issue. We try to use overloads to get the right type out of loc selections, but still have more work to do. In this case, it might be possible to improve the static analysis by identifying that the selection is made with a Boolean array. I will investigate and see if I can improve this.
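
To illustrate the general idea of overloads keyed on the selection type, here is a minimal sketch using toy classes. `ToyFrame`, `ToySeries`, and `select` are hypothetical stand-ins, not static-frame's actual implementation or signatures; the point is only that a checker can pick the return type from the type of the key.

```python
from __future__ import annotations

from typing import Dict, List, Union, overload


class ToySeries:
    """Hypothetical one-dimensional result: one value per column label."""

    def __init__(self, values: Dict[str, int]) -> None:
        self.values = values


class ToyFrame:
    """Hypothetical two-dimensional container: column label -> column values."""

    def __init__(self, columns: Dict[str, List[int]]) -> None:
        self.columns = columns

    # A single integer position selects one row, so the result is a series.
    @overload
    def select(self, key: int) -> ToySeries: ...

    # A Boolean mask may keep any number of rows, so the result stays a frame.
    @overload
    def select(self, key: List[bool]) -> ToyFrame: ...

    def select(self, key: Union[int, List[bool]]) -> Union[ToySeries, ToyFrame]:
        if isinstance(key, list):
            # Keep only the rows whose mask entry is True.
            return ToyFrame({
                name: [v for v, keep in zip(col, key) if keep]
                for name, col in self.columns.items()
            })
        # Integer position: extract that row across all columns.
        return ToySeries({name: col[key] for name, col in self.columns.items()})


frame = ToyFrame({"A": [1, 2, 3, 4, 5], "B": [4, 5, 6, 7, 8]})
mask = [a > 2 for a in frame.columns["A"]]
filtered = frame.select(mask)  # checkers resolve this call to ToyFrame
first_row = frame.select(0)    # checkers resolve this call to ToySeries
```

In this sketch the Boolean-mask overload always returns a frame, even when the mask keeps a single row, which mirrors the runtime behavior reported in this issue.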