tidypyverse / tidypandas

A grammar of data manipulation for pandas inspired by tidyverse
https://tidypyverse.github.io/tidypandas/
MIT License

Error in slice_head when n exceeds the size of any group. Behavior differs from dplyr. #50

Open grahitr opened 1 year ago

grahitr commented 1 year ago
In [1]: import tidypandas.tidy_accessor as tp
In [2]: import pandas as pd
In [3]: df = pd.DataFrame({"a":[1,1,1,2], "b": [1,2,3,4]})
In [4]: df.tp.slice_head(n=2, by="a")
Minimum group size is  1
/Users/a0r0qfj/py_envs/python3.10.7/lib/python3.10/site-packages/astroid/node_classes.py:94: DeprecationWarning: The 'astroid.node_classes' module is deprecated and will be replaced by 'astroid.nodes' in astroid 3.0.0
  warnings.warn(
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In [4], line 1
----> 1 df.tp.slice_head(n=2, by="a")

File ~/py_envs/python3.10.7/lib/python3.10/site-packages/tidypandas/tidy_accessor.py:382, in tp.slice_head(self, n, prop, rounding_type, by)
    375 def slice_head(self
    376                , n = None
    377                , prop = None
    378                , rounding_type = "round"
    379                , by = None
    380                ):
    381     tf = tidyframe(self._obj, copy = False, check = False)
--> 382     return tf.slice_head(n = n
    383                          , prop = prop
    384                          , rounding_type = rounding_type
    385                          , by = by
    386                          ).to_pandas(copy = False)

File ~/py_envs/python3.10.7/lib/python3.10/site-packages/tidypandas/tidyframe_class.py:4124, in tidyframe.slice_head(self, n, prop, rounding_type, by)
   4122 if n > min_group_size:
   4123     print("Minimum group size is ", min_group_size)
-> 4124 assert n <= min_group_size,\
   4125     "arg 'n' should not exceed the size of any chunk after grouping"
   4127 ro_name = _generate_new_string(cn) 
   4128 res = (self.group_modify(lambda x: x.slice(np.arange(n))
   4129                          , by = by
   4130                          , preserve_row_order = True
   4131                          , row_order_column_name = ro_name
   4132                          )
   4133            )

AssertionError: arg 'n' should not exceed the size of any chunk after grouping

The same operation in R doesn't throw an error. Instead, each group returns min(group size, n) rows.

> library(tidyverse)
> df = tibble(a=c(1,1,1,2), b=c(1,2,3,4))
> df %>% group_by(a) %>% slice_head(n=2) %>% ungroup()
# A tibble: 3 × 2
      a     b
  <dbl> <dbl>
1     1     1
2     1     2
3     2     4
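
For comparison, plain pandas already has the dplyr-like truncating behavior via `GroupBy.head`. A minimal sketch using the same data as above (this is stock pandas, not the tidypandas implementation):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 2], "b": [1, 2, 3, 4]})

# GroupBy.head(n) returns up to n rows per group and keeps the original
# row order; groups smaller than n are returned whole, without an error.
res = df.groupby("a").head(2)
print(res)  # rows with index 0, 1, 3, i.e. b == [1, 2, 4]
```

Group a == 2 has only one row, so it contributes that single row, matching the tibble shown above.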

talegari commented 1 year ago

This was done intentionally.

Design question: If a user asks for, say, 5 rows per group and we can't provide them, should we raise an error saying so or silently return what we can?
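
One way to frame the two options is an opt-in flag. The sketch below is only an illustration of that trade-off: `slice_head_by` and its `strict` parameter are hypothetical names, not part of tidypandas, and the body uses plain pandas rather than the library's internals.

```python
import pandas as pd


def slice_head_by(df, n, by, strict=False):
    """Hypothetical helper (not tidypandas API): first n rows per group.

    strict=False: truncate small groups silently, as dplyr does.
    strict=True:  raise if any group has fewer than n rows.
    """
    sizes = df.groupby(by).size()
    if strict and (sizes < n).any():
        raise ValueError(
            f"n={n} exceeds the smallest group size ({sizes.min()})"
        )
    return df.groupby(by).head(n).reset_index(drop=True)


df = pd.DataFrame({"a": [1, 1, 1, 2], "b": [1, 2, 3, 4]})

lenient = slice_head_by(df, n=2, by="a")  # 3 rows, b == [1, 2, 4]

try:
    slice_head_by(df, n=2, by="a", strict=True)
    strict_raised = False
except ValueError:
    strict_raised = True  # group a == 2 has only 1 row
```

With a flag like this, whichever behavior is the default, the other stays one argument away.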