praktiskt / featuretoolsR

An R interface to the Python module Featuretools

Out of bounds error when executing tidy_feature_matrix #1

Closed: favstats closed this issue 5 years ago

favstats commented 6 years ago

Hi there!

I just stumbled upon your package and I am incredibly happy someone made the effort to implement this. Thanks a lot for this!

I started out with your example and, unfortunately, I encountered an error when creating a tidy_feature_matrix (an idea I absolutely love!):

# pacman::p_install_gh("magnusfurugard/featuretoolsR")

pacman::p_load(tidyverse, featuretoolsR)

# Create some mock data
set_1 <- data.frame(key = 1:100, value = sample(letters, 100, replace = TRUE))
set_2 <- data.frame(key = 1:100, value = sample(LETTERS, 100, replace = TRUE))

# Create entityset
es <- as_entityset(set_1, index = "key", entity_id = "set_1", id = "demo")

es <- es %>%
  add_entity(
    df = set_2, 
    entity_id = "set_2", 
    index = "key"
  )

es <- es %>%
  add_relationship(
    set1 = "set_1", 
    set2 = "set_2", 
    idx = "key"
  )

ft_matrix <- es %>%
  dfs(
    target_entity = "set_1", 
    trans_primitives = c("and", "divide")
  )

tidy <- tidy_feature_matrix(ft_matrix)

tidy

Error:

# Error in py_call_impl(callable, dots$args, dots$keywords) :
#   IndexError: index 100 is out of bounds for axis 0 with size 100

Using a traceback, I was able to narrow the problem down to the py_to_r function, which seems to have a problem with Python's 0-based indexing.

See here:

reticulate::py_to_r(ft_matrix[[1]])

Error:

# Error in py_call_impl(callable, dots$args, dots$keywords) :
#   IndexError: index 100 is out of bounds for axis 0 with size 100

Again, thank you for making this available. I totally understand that a lot of this is probably a work in progress. I am just glad someone did this :)

Best,

Fabio

Update

I tried different data and it works fine, so maybe the problem has something to do with the example data?

ft <- reticulate::import("featuretools")

es <- ft$demo$load_mock_customer(return_entityset = TRUE)

ft_matrix <- es %>%
  dfs(
    target_entity = "customers", 
    trans_primitives = c("and", "divide")
  )

tidy <- tidy_feature_matrix(ft_matrix)

tidy

Works just fine!

praktiskt commented 6 years ago

Hi, and thanks for your very kind words. This is indeed a work in progress, so more bugs to come, I guess.

I am unable to reproduce the error (R 3.4.2 / macOS Sierra 10.12.6). Can you send your sessionInfo() output as well as details of your Python environment?

favstats commented 6 years ago

Thank you for your quick reply. Don't consider this high priority because it's really just the example that doesn't work for me, everything else works just fine :) Still would be interesting to know what's wrong here.

Here's my sessionInfo():

R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2.2         reticulate_1.10.0.9002 featuretoolsR_0.1.0    dplyr_0.7.6           

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18       lubridate_1.7.4    lattice_0.20-35    tidyr_0.8.1        class_7.3-14      
 [6] assertthat_0.2.0   ipred_0.9-7        psych_1.8.3.3      foreach_1.4.4      R6_2.2.2          
[11] plyr_1.8.4         magic_1.5-9        stats4_3.5.0       ggplot2_3.0.0      pillar_1.3.0      
[16] rlang_0.2.2        lazyeval_0.2.1     caret_6.0-80       rstudioapi_0.7     data.table_1.11.4 
[21] kernlab_0.9-27     rpart_4.1-13       Matrix_1.2-14      splines_3.5.0      CVST_0.2-2        
[26] ddalpha_1.3.4      gower_0.1.2        stringr_1.3.1      foreign_0.8-70     munsell_0.5.0     
[31] broom_0.4.4        compiler_3.5.0     pkgconfig_2.0.1    mnormt_1.5-5       dimRed_0.1.0      
[36] nnet_7.3-12        tidyselect_0.2.4   prodlim_2018.04.18 tibble_1.4.2       DRR_0.0.3         
[41] codetools_0.2-15   RcppRoll_0.3.0     withr_2.1.2        crayon_1.3.4       MASS_7.3-49       
[46] recipes_0.1.3      ModelMetrics_1.2.0 grid_3.5.0         nlme_3.1-137       jsonlite_1.5      
[51] gtable_0.2.0       pacman_0.4.6       magrittr_1.5       scales_1.0.0       stringi_1.1.7     
[56] reshape2_1.4.3     timeDate_3043.102  robustbase_0.93-3  geometry_0.3-6     pls_2.7-0         
[61] lava_1.6.3         iterators_1.0.9    tools_3.5.0        glue_1.3.0         DEoptimR_1.0-8    
[66] purrr_0.2.5        sfsmisc_1.1-2      survival_2.41-3    abind_1.4-5        parallel_3.5.0    
[71] colorspace_1.4-0   knitr_1.20         bindr_0.1.1  

And here is my Python setup, as reported by reticulate::py_config():

python:         C:\Users\Fabio\ANACON~1\python.exe
libpython:      C:/Users/Fabio/ANACON~1/python36.dll
pythonhome:     C:\Users\Fabio\ANACON~1
version:        3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
Architecture:   64bit
numpy:          C:\Users\Fabio\ANACON~1\lib\site-packages\numpy
numpy_version:  1.14.3
featuretools:   C:\Users\Fabio\ANACON~1\lib\site-packages\featuretools\__init__.p

praktiskt commented 6 years ago

I really am unable to reproduce this. I've tried it on Windows 10 (R 3.5.0), Ubuntu 18.04 (R 3.5.1) and my original setup, macOS Sierra 10.12.6 (R 3.4.2).

Are you able to retrieve the results of just ft_matrix[[1]], without running reticulate::py_to_r?

favstats commented 6 years ago

That's odd. I just reran the entire thing and I'm still encountering the same problem. And yes, I am able to run ft_matrix[[1]] just fine.

There must be some issue with my setup. It's really weird, though, that this only happens with the simulated dataset; different data works fine.

praktiskt commented 6 years ago

(Traveling, so limited ability to test on different OSes.)

Trying to run featuretoolsR on an Anaconda installation of Python now crashes the R environment for me. Hey, more issues!

Could you try using a non-Anaconda version of Python? I'm only now realizing that I haven't used an Anaconda-installed Python on any of my machines. I've successfully run it on both 2.7 and 3.5 on my Mac.

You can swap between different Python binaries with reticulate::use_python("path/to/binary") and afterwards verify that it changed with reticulate::py_config().
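A minimal sketch of that swap, assuming a non-Anaconda Python is installed and that Python has not yet been initialized in the current R session (reticulate binds to a single interpreter per session, so run this in a fresh session before loading featuretoolsR or calling any Python code). Looking the binary up on the PATH via Sys.which("python3") is an assumption for illustration; on Windows you would more likely pass a full path to python.exe instead.

```r
library(reticulate)

# Assumption: a non-Anaconda python3 is on the PATH.
# Replace with a full path to the binary you want if it is not.
python_bin <- unname(Sys.which("python3"))

# Must run before any Python code executes in this R session.
# required = TRUE makes reticulate stop with an error instead of
# silently falling back to a different interpreter (e.g. Anaconda).
use_python(python_bin, required = TRUE)

# Confirm which interpreter reticulate actually bound to.
cfg <- py_config()
print(cfg$python)
print(cfg$version)

# featuretools has to be installed for this interpreter as well.
print(py_module_available("featuretools"))
```

If py_config() still reports the Anaconda path, Python was most likely initialized earlier in the session; restart R and run use_python() first.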

praktiskt commented 5 years ago

Closing due to age. I cannot reproduce.