Open yjml opened 6 years ago
It turns out my server's RStudio packages were out of date; I was on 1.6.
1.9 appears to be fine in getting these over to R, but things are still problematic: the data in the affected columns shows up as e.g. <environment: 0xc1957c0> instead of the anticipated contents, e.g. 30
Is there any input on this? It's quite old, yet happening to me. I'm essentially brand new to reticulate, so I can't tell if this is expected or an actual issue... any input/clarification from someone who knows?
I tried two ways and ran into issues on both.
The first way used boto3, storing the result as a list (result) and then converting it to a pd.DataFrame (df), attempting to use that in R via the py$df syntax. Here is an obfuscated result for py$df:
> head(py$df)
    A                                 B   C   D                                 E
1 abc <environment: 0x000001de915241c8> 123 123 <environment: 0x000001de8693ede0>
2 def <environment: 0x000001de9150ccd8> 123 123 <environment: 0x000001de86920278>
3  gh <environment: 0x000001de8e081ec8> 123 123                               NaN
4 ijk <environment: 0x000001de9220b050> 123 123                               NaN
5 lmn <environment: 0x000001de921dbe98> 123 123                               NaN
The second way used lapply to convert each entry into a data frame and then do.call(rbind, df_list) to build the full data frame. I got the same error as @yjml:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class ‘c("decimal.Decimal", "python.builtin.object")’ to a data.frame
In looking at some of the list entries, I did note the use of Decimal().
[[1]]$Order
Decimal('32')
Thanks for any advice on how to figure this out and/or to confirm if this is intended. I have no issues running the python code that builds this dataframe directly, so it seems it's something about the handoff from python to R.
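For anyone trying to narrow this down on the Python side before the handoff, here is a minimal sketch that lists which fields in a list of records hold Decimal values. The helper name decimal_fields and the sample records are hypothetical, made up for illustration; only the Order field and Decimal('32') come from the output above.

```python
from decimal import Decimal


def decimal_fields(items):
    """Return the sorted names of top-level fields that hold a Decimal in any item."""
    found = set()
    for item in items:
        for key, value in item.items():
            if isinstance(value, Decimal):
                found.add(key)
    return sorted(found)


# Hypothetical records, loosely shaped like the obfuscated data above.
result = [
    {"A": "abc", "C": 123, "Order": Decimal("32")},
    {"A": "def", "C": 123, "Order": Decimal("7")},
]

print(decimal_fields(result))
```

This only inspects top-level fields; nested maps or lists would need a recursive walk.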
Thanks for bringing this up again; this is on the backlog.
This makes me wonder what decimal.Decimal objects should convert to in R. R doubles don't seem quite right, but I'm not sure what a better alternative would be.
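To illustrate why doubles are not an obvious match, here is a small Python sketch comparing Decimal arithmetic with binary floating point, which is what an R double is. It is only meant to show the kind of difference at stake, not to suggest what reticulate should do.

```python
from decimal import Decimal

# Decimal arithmetic is exact for decimal fractions.
exact = Decimal("0.1") + Decimal("0.2")

# Binary floating point cannot represent 0.1 or 0.2 exactly.
approx = 0.1 + 0.2

print(exact == Decimal("0.3"))  # True
print(approx == 0.3)            # False
```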
@t-kalinowski thanks for taking a look! Honestly, Decimal was completely new to me. I don't feel I have the chops to say much about this decision... but it did dawn on me that for my use-case, this isn't exactly "programmed" into dynamodb as a Decimal? It seems like an artifact of python, no?
I.e. in looking at AWS directly, my Order variable above is stored as a Number. So, again, I'm probably the wrong person to answer this as AWS, boto3, and reticulate are all pretty new to me... but if the worry is that R can't handle some of the fancy specifications of Decimal, it's not clear in what cases those fancy specifications could make it down given the original data.
Put another way, in what case does as.numeric(Decimal('32')) (or whatever the call would be) go wrong when the values are starting from an AWS container (again, in my use case)?
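One way to probe that question is to check whether a given Decimal survives a round trip through a double. The sketch below assumes the conversion would behave like Python's float(); the helper name survives_float is hypothetical. As far as I know, DynamoDB Number values can carry up to 38 significant digits, which is more than a double holds, so the loss is possible in principle even if values like 32 are fine.

```python
from decimal import Decimal


def survives_float(d):
    """True if converting d to a float and back yields the same decimal value."""
    return Decimal(repr(float(d))) == d


print(survives_float(Decimal("32")))                # small integer
print(survives_float(Decimal("9007199254740993")))  # 2**53 + 1
print(survives_float(Decimal("0.12345678901234567890123456789012345678")))  # 38 digits
```

Small integers and short decimals round-trip cleanly; integers above 2**53 and values with more than roughly 15 to 17 significant digits do not.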
I found this thread while running into the same issue on AWS. I don't think I've seen a resolution or workaround mentioned. Has this been resolved or discussed elsewhere?
If you have an R data.frame where one of the columns is an unconverted list of Decimal objects, you can convert it to an R numeric (double) vector like this:
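library(reticulate)

# Convert a list of unconverted Python Decimal objects to an R double vector.
Decimals_to_numeric <- function(x) {
  py_float <- import_builtins()$float  # Python's built-in float()
  purrr::map_dbl(x, py_float)
}

df$col_of_Decimals <- Decimals_to_numeric(df$col_of_Decimals)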
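An alternative, if you control the Python code, is to replace the Decimal values before building the DataFrame, so that nothing unconverted reaches R. The sketch below is one possible approach and not something taken from reticulate or boto3; the helper name decimals_to_numbers and the sample items are hypothetical. It assumes that losing precision beyond what a double holds is acceptable for your data.

```python
from decimal import Decimal


def decimals_to_numbers(obj):
    """Recursively replace Decimal values with int (if integral) or float."""
    if isinstance(obj, Decimal):
        # Keep whole numbers as int; everything else becomes a float.
        if obj.is_finite() and obj == obj.to_integral_value():
            return int(obj)
        return float(obj)
    if isinstance(obj, dict):
        return {key: decimals_to_numbers(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [decimals_to_numbers(value) for value in obj]
    return obj


# Hypothetical items, shaped like a list of records returned from a query.
items = [{"A": "abc", "Order": Decimal("32"), "Price": Decimal("1.5")}]
clean = decimals_to_numbers(items)
print(clean)
```

The cleaned list can then be passed to pd.DataFrame as before.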
I'm having some issues with some pandas dataframes failing to make it over to the R side:
The origin of these is Spark -> parquet stored on S3 -> pyarrow.parquet with s3fs to read, along the lines of the following (the import() version fails as well)
Attached are three pickled subsets of the pandas dataframes - 2 with problems, 1 from the same pipeline that is successful. reticulate_pandadf.zip
In particular, the problem appears to be the variable status in 1 and daysupp in 2 - deselecting these allows things to make it back over to Results