should '$' and '[' behave differently for pandas DataFrames?

rstudio / reticulate

R Interface to Python

https://rstudio.github.io/reticulate

Apache License 2.0


should '$' and '[' behave differently for pandas DataFrames? #251

Open kevinushey opened 6 years ago

kevinushey commented 6 years ago

E.g. in Python:

```{python}
import pandas as pd
pdf = pd.DataFrame({"pop": [1]})
print(pdf["pop"]) # accesses pop column
print(pdf.pop)    # accesses pop DataFrame method
```

However, for Python objects exposed to R, `[[` attempts to first access attributes rather than columns:

  library(reticulate)
  df <- data.frame(pop = 1)
  pdf <- r_to_py(df)
  pdf$pop
#> <bound method DataFrame.pop of    pop
#> 0  1.0>
  pdf[["pop"]]
#> <bound method DataFrame.pop of    pop
#> 0  1.0>

Should `[[` prefer accessing items rather than attributes for DataFrames?

I believe a similar question exists for Python dictionaries, and for other objects implementing `__getitem__` in general.
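
The same ambiguity can be reproduced in plain Python with a dictionary, without pandas. A minimal sketch (the dictionary `d` and its key are illustrative only and do not come from reticulate):

```python
# A dict whose key collides with the name of a built-in dict method.
d = {"pop": 1}

item = d["pop"]        # item access via __getitem__: the stored value
attr = d.pop           # attribute access: the bound dict.pop method (not called here)

print(item)            # the value stored under the key "pop"
print(callable(attr))  # attr is a method rather than data
```

Python keeps the two lookups apart syntactically, so the question is only which R operator should map to which lookup.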

kevinushey commented 6 years ago

After a chat with @jjallaire, we agree that we should try to migrate the semantics such that:

- `$` is analogous to Python's `.` operator; that is, it attempts to retrieve attributes (typically methods) on the object;
- `[[` and `[` are analogous to Python's `[]` operator; that is, they are used for accessing items (`__getitem__`).

We plan to issue a warning whenever the `$` operator ends up resolving an item rather than an attribute, so that existing user code has a path for migration.
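
The proposed semantics can be sketched in Python roughly as follows. This is not reticulate's implementation: `dollar` and `double_bracket` are hypothetical helpers standing in for the R operators, and the choice of `DeprecationWarning` is an assumption made for illustration.

```python
import warnings


def dollar(obj, name):
    """Hypothetical stand-in for R's `$`: attribute first, item as a fallback."""
    try:
        return getattr(obj, name)
    except AttributeError:
        # Fallback kept only for migration; warn so callers can switch to `[[`.
        value = obj[name]
        warnings.warn(
            f"'{name}' resolved to an item, not an attribute; use item access instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return value


def double_bracket(obj, key):
    """Hypothetical stand-in for R's `[[`: item access only."""
    return obj[key]


d = {"pop": 1, "abc": 2}

method = dollar(d, "pop")        # attribute wins: the bound dict.pop method
item = double_bracket(d, "pop")  # item access: the stored value

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fallback = dollar(d, "abc")  # no such attribute, so the item is returned with a warning
```

Under this sketch, code that relies on `$` to reach an item keeps working but is told about it, while `[[` never returns a method by accident.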

flying-sheep commented 5 years ago

I think another possibility would have been to make `$` and `[[` equivalent to `getattr` and `[` equivalent to `__getitem__`. That would have the advantage that it’s easy to get attributes with calculated names, e.g.

for (attr_name in attrs) print(py_obj[[attr_name]])

However, I think it would be quite confusing to let `[[` and `[` do completely different things, and it’s still possible to get the aforementioned functionality via the more obscure

for (attr_name in attrs) print(`$`(py_obj, attr_name))
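
For comparison, this is what attribute lookup by calculated name looks like on the Python side, using the built-in `getattr`. The `Point` class and the `attr_names` list are made up for illustration:

```python
class Point:
    """Illustrative object; not part of reticulate or pandas."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
attr_names = ["x", "y"]

# getattr takes the attribute name as a string, so it can be computed at run time.
values = [getattr(p, name) for name in attr_names]
print(values)
```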

dfalbel commented 1 year ago

With #1431 the ambiguity is solved with:

py_run_string('
import pandas as pd
pdf = pd.DataFrame({"pop": [1]})
print(pdf["pop"]) # accesses pop column
print(pdf.pop)    # accesses pop DataFrame method
')

py$pdf$pop
py$pdf@pop

t-kalinowski commented 1 year ago

After taking a closer look, I see that $ already prefers accessing attributes.

The benefit of adding a `@` method would be that it would remove the potential for silent errors: a call like `x@foo` would raise an attribute error, while `x$foo` would fall back silently to `__getitem__()`.

  library(reticulate)
  py_run_string("import pandas as pd")
  pdf <- py_eval('pd.DataFrame({"pop": [1], "abc": [2]})', convert = FALSE)
  pdf$pop
#> <bound method DataFrame.pop of    pop  abc
#> 0    1    2>
  pdf@pop
#> <bound method DataFrame.pop of    pop  abc
#> 0    1    2>
  pdf["pop"]
#> 0    1
#> Name: pop, dtype: int64
  pdf[["pop"]]
#> 0    1
#> Name: pop, dtype: int64

  pdf$abc
#> 0    2
#> Name: abc, dtype: int64
  pdf@abc
#> 0    2
#> Name: abc, dtype: int64
  pdf["abc"]
#> 0    2
#> Name: abc, dtype: int64
  pdf[["abc"]]
#> 0    2
#> Name: abc, dtype: int64

<sup>Created on 2023-08-15 with reprex v2.0.2</sup>