pola-rs / r-polars

Polars R binding
https://pola-rs.github.io/r-polars/
Other
460 stars 36 forks source link

More consistency in methods to convert Series/DataFrame to R #921

Open etiennebacher opened 5 months ago

etiennebacher commented 5 months ago

Currently this is what we have for DataFrame and Series:

library(polars)

options(polars.do_not_repeat_call = TRUE)

####### Series

series_vec = pl$Series(letters[1:3])

# output depends on datatype
series_vec$to_r()
#> [1] "a" "b" "c"

# identical
series_vec$to_r_vector() 
#> [1] "a" "b" "c"
series_vec$to_vector() 
#> [1] "a" "b" "c"

# not identical
series_vec$to_r_list() 
#> [[1]]
#> [1] "a"
#> 
#> [[2]]
#> [1] "b"
#> 
#> [[3]]
#> [1] "c"
series_vec$to_list() 
#> Error: Execution halted with the following contexts
#>    1: $ - syntax error: to_list is not a method/attribute of the class RPolarsSeries

####### DataFrame

df = pl$DataFrame(x = letters[1:3])

# output depends on datatype
df$to_r()
#> Error: Execution halted with the following contexts
#>    1: $ - syntax error: to_r is not a method/attribute of the class RPolarsDataFrame

# not identical
df$to_r_data_frame() 
#> Error: Execution halted with the following contexts
#>    1: $ - syntax error: to_r_data_frame is not a method/attribute of the class RPolarsDataFrame
df$to_data_frame() 
#>   x
#> 1 a
#> 2 b
#> 3 c

# not identical
df$to_r_list() 
#> Error: Execution halted with the following contexts
#>    1: $ - syntax error: to_r_list is not a method/attribute of the class RPolarsDataFrame
df$to_list()
#> $x
#> [1] "a" "b" "c"

I think we should have more consistency in the names of the methods:

@eitsupi what do you think? Also, @grantmcdermott in case you want to participate (mostly about the second point)

eitsupi commented 5 months ago

Agree. I don't see the benefit of having more than one, so I think we need to remove some of them and encourage users to use the S3 method e.g. as.vector(), as.data.frame() instead of $to_vector(), $to_data_frame(). (i.e. should not use them on the document)

I really dislike not knowing at first glance whether the $to_data_frame() will be an RPolarsDataFrame or a data.frame, so I actually don't even have a problem with all these methods being private.

eitsupi commented 5 months ago

Another thing I feel is does it make sense for <DataFrame>$to_list() to return "list of vectors" instead of "list of Serieses"?

eitsupi commented 5 months ago

Another thing I feel is does it make sense for <DataFrame>$to_list() to return "list of vectors" instead of "list of Serieses"?

I noticed that there is already a method in DetaFrame called get_columns() that returns a list of Series. https://docs.rs/polars/0.38.3/polars/frame/struct.DataFrame.html#method.get_columns

The name to_list() may not be consistent...