rstudio / reticulate

R Interface to Python
https://rstudio.github.io/reticulate
Apache License 2.0

fastai model$summary() #1348

Closed turgut090 closed 1 year ago

turgut090 commented 1 year ago

Hello. model$summary() from fastai throws:

model$summary()
Error: TypeError: expected bytes, PrettyString found 

But in Python the summary method works fine. Is this an issue with reticulate?

Here is minimal python code:

from fastai.tabular.all import *

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
df_main,df_test = df.iloc[:10000].copy(),df.iloc[10000:].copy()

cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
splits = RandomSplitter()(range_of(df_main))

to = TabularPandas(df_main, procs, cat_names, cont_names, y_names="salary", splits=splits)

dls = to.dataloaders()

learn = tabular_learner(dls, layers=[200,100], metrics=accuracy)
learn.summary()

However, the summary method fails when called from R.

t-kalinowski commented 1 year ago

Thanks for reporting. I just gave it a try but couldn't reproduce the error. Can you give a little more info about which OS, Python version, and package versions you're using?

turgut090 commented 1 year ago

Thanks for your reply. If you run the same code from R with reticulate, it fails and throws:

model$summary()
Error: TypeError: expected bytes, PrettyString found 

R code:


ft = reticulate::import('fastai.tabular.all')

# download
# URLs_ADULT_SAMPLE()

# read data
df = data.table::fread('/Users/turgutabd/.fastai/data/adult_sample/adult.csv')

dep_var = 'salary'
cat_names = c('workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race')
cont_names = c('age', 'fnlwgt', 'education-num')

procs = reticulate::r_to_py(list(ft$FillMissing, ft$Categorify, ft$Normalize))

to = ft$TabularPandas(df, procs, cat_names = cat_names, cont_names = cont_names,
                      y_names = dep_var,
                      splits = reticulate::r_to_py(list(c(1L:32000L), c(32000L:32560L))))

dls = to$dataloaders()

model = ft$tabular_learner(dls, layers = c(200L, 100L), metrics = ft$accuracy)

model$summary()
# Error: TypeError: expected bytes, PrettyString found

sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] fastai_2.2.1   magrittr_2.0.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10       here_1.0.1        lattice_0.20-45   png_0.1-8         withr_2.5.0       rprojroot_2.0.3   grid_4.2.2       
 [8] sys_3.4.1         lifecycle_1.0.3   jsonlite_1.8.4    credentials_1.3.2 rlang_1.0.6       cli_3.6.0         data.table_1.14.6
[15] rstudioapi_0.14   fs_1.6.1          Matrix_1.5-3      generics_0.1.3    vctrs_0.5.2       reticulate_1.28   tools_4.2.2      
[22] glue_1.6.2        purrr_1.0.1       compiler_4.2.2    askpass_1.1       openssl_2.0.5     usethis_2.1.6  
reticulate::py_config()
python:         /Users/turgutabd/Library/r-miniconda-arm64/envs/r-reticulate/bin/python
libpython:      /Users/turgutabd/Library/r-miniconda-arm64/envs/r-reticulate/lib/libpython3.8.dylib
pythonhome:     /Users/turgutabd/Library/r-miniconda-arm64/envs/r-reticulate:/Users/turgutabd/Library/r-miniconda-arm64/envs/r-reticulate
version:        3.8.16 | packaged by conda-forge | (default, Feb  1 2023, 16:01:13)  [Clang 14.0.6 ]
numpy:          /Users/turgutabd/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/numpy
numpy_version:  1.24.2

NOTE: Python version was forced by use_python function
turgut090 commented 1 year ago

Interesting. This workaround got it to print:

result = reticulate::py_config_error_message(model$summary())
cat(trimws(gsub('Detected Python configuration:','',result)))
TabularModel (Input shape: 64 x 7)
============================================================================
Layer (type)         Output Shape         Param #    Trainable 
============================================================================
                     64 x 6              
Embedding                                 60         True      
____________________________________________________________________________
                     64 x 8              
Embedding                                 136        True      
____________________________________________________________________________
                     64 x 5              
Embedding                                 40         True      
____________________________________________________________________________
                     64 x 8              
Embedding                                 136        True      
____________________________________________________________________________
                     64 x 5              
Embedding                                 35         True      
____________________________________________________________________________
                     64 x 4              
Embedding                                 24         True      
____________________________________________________________________________
                     64 x 3              
Embedding                                 9          True      
Dropout                                                        
BatchNorm1d                               6          True      
____________________________________________________________________________
                     64 x 200            
Linear                                    8400       True      
ReLU                                                           
BatchNorm1d                               400        True      
____________________________________________________________________________
                     64 x 100            
Linear                                    20000      True      
ReLU                                                           
BatchNorm1d                               200        True      
____________________________________________________________________________
                     64 x 2              
Linear                                    202        True      
____________________________________________________________________________

Total params: 29,648
Total trainable params: 29,648
Total non-trainable params: 0

Optimizer used: <function Adam at 0x2c618b8b0>
Loss function: FlattenedLoss of CrossEntropyLoss()

Callbacks:
  - TrainEvalCallback
  - CastToTensor
  - Recorder
  - ProgressCallback
t-kalinowski commented 1 year ago

Thanks, I can reproduce, fix incoming.

turgut090 commented 1 year ago

Any news?

t-kalinowski commented 1 year ago

Fix should be out tomorrow. I tracked it down to https://github.com/rstudio/reticulate/blob/3f8305dde11569125aa7e5cfe0dde43f88ed79d0/src/python.cpp#L220, but I found another, slightly more serious bug in the process, which is prompting me to take a closer look at strings.
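For context, the error message suggests fastai's summary() returns a str subclass named PrettyString rather than a plain str. Here is a rough sketch of the same failure mode; the class name and the per-version behaviour in the comments are assumptions based on the error message, not tested output:

library(reticulate)

main <- py_run_string("
class PrettyString(str):
    pass

val = PrettyString('model summary text')
", convert = FALSE)

main$val  # printing the unconverted str subclass goes through reticulate's repr path
# reticulate 1.28:      Error: TypeError: expected bytes, PrettyString found
# development version:  prints the repr of the string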

The R entry point that's raising the error is py_repr(). To get by until the fix is out, you could patch the py_repr symbol locally with something like this (not tested, sorry):

py_repr <- function(x) {
  # delegate repr() and printing to Python itself, sidestepping the broken conversion path
  fn <- py_run_string("print_repr = lambda x: print(repr(x))")$print_repr
  fn(x)
  invisible(x)
}
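To actually pick up the patch, one possible approach (again a sketch, untested, and only needed until the fixed release ships) is to overwrite the binding inside the reticulate namespace for the current session:

library(reticulate)  # the patch above calls py_run_string() from the attached package

# replace reticulate's internal py_repr with the workaround defined above,
# for this session only
assignInNamespace("py_repr", py_repr, ns = "reticulate")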
t-kalinowski commented 1 year ago

@turgut090 This should be fixed on main now. Can you please install the development version and see if it works for you:

remotes::install_github("rstudio/reticulate")
turgut090 commented 1 year ago

@t-kalinowski thanks. It works now; however, I need to wrap the result in cat(), otherwise the output is not formatted as it was previously. Please see below:

summary(learn)
[1] "TabularModel (Input shape: 64 x 7)\n============================================================================\nLayer (type)         Output Shape         Param #    Trainable \n============================================================================\n                     64 x 6              \nEmbedding                                 60         True      \n____________________________________________________________________________\n                     64 x 8              \nEmbedding                                 136        True      \n____________________________________________________________________________\n                     64 x 5              \nEmbedding                                 40         True      \n____________________________________________________________________________\n                     64 x 8              \nEmbedding                                 136        True      \n____________________________________________________________________________\n                     64 x 5              \nEmbedding                                 35         True      \n____________________________________________________________________________\n                     64 x 4              \nEmbedding                                 24         True      \n____________________________________________________________________________\n                     64 x 3              \nEmbedding                                 9          True      \nDropout                                                        \nBatchNorm1d                               6          True      \n____________________________________________________________________________\n                     64 x 200            \nLinear                                    8400       True      \nReLU                                                           \nBatchNorm1d                               400        True      \n____________________________________________________________________________\n                     64 x 100            \nLinear                                    20000      True      \nReLU                                                           \nBatchNorm1d                               200        True      \n____________________________________________________________________________\n                     64 x 2              \nLinear                                    202        True      \n____________________________________________________________________________\n\nTotal params: 29,648\nTotal trainable params: 29,648\nTotal non-trainable params: 0\n\nOptimizer used: <function Adam at 0x2bbe21a20>\nLoss function: FlattenedLoss of CrossEntropyLoss()\n\nCallbacks:\n  - TrainEvalCallback\n  - CastToTensor\n  - Recorder\n  - ProgressCallback"

With cat:

> cat(summary(learn))
TabularModel (Input shape: 64 x 7)                                        
============================================================================
Layer (type)         Output Shape         Param #    Trainable 
============================================================================
                     64 x 6              
Embedding                                 60         True      
____________________________________________________________________________
                     64 x 8              
Embedding                                 136        True      
____________________________________________________________________________
                     64 x 5              
Embedding                                 40         True      
____________________________________________________________________________
                     64 x 8              
Embedding                                 136        True      
____________________________________________________________________________
                     64 x 5              
Embedding                                 35         True      
____________________________________________________________________________
                     64 x 4              
Embedding                                 24         True      
____________________________________________________________________________
                     64 x 3              
Embedding                                 9          True      
Dropout                                                        
BatchNorm1d                               6          True      
____________________________________________________________________________
                     64 x 200            
Linear                                    8400       True      
ReLU                                                           
BatchNorm1d                               400        True      
____________________________________________________________________________
                     64 x 100            
Linear                                    20000      True      
ReLU                                                           
BatchNorm1d                               200        True      
____________________________________________________________________________
                     64 x 2              
Linear                                    202        True      
____________________________________________________________________________

Total params: 29,648
Total trainable params: 29,648
Total non-trainable params: 0

Optimizer used: <function Adam at 0x2bbe21a20>
Loss function: FlattenedLoss of CrossEntropyLoss()

Callbacks:
  - TrainEvalCallback
  - CastToTensor
  - Recorder
  - ProgressCallback
t-kalinowski commented 1 year ago

I think this would best be done by writing a summary() S3 method for the learn object:

summary.fastai.learner.Learner <- function(object, ...) {
  s <- object$summary()
  writeLines(s)
  invisible(s)
}

Or perhaps make it the default print() method. You can take a look at keras:::summary.keras.engine.training.Model() too for an example implementation.
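A sketch of that print() variant, assuming the fastai R package would register it for its wrapped Learner objects (fastai.learner.Learner is the same class the summary() method above dispatches on):

print.fastai.learner.Learner <- function(x, ...) {
  # show the formatted model summary instead of the default Python object repr
  writeLines(x$summary())
  invisible(x)
}

One trade-off to weigh: summary() pushes a batch through the model to compute the output shapes, so a print() method built on it does noticeably more work than a plain repr.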