kapelner / bartMachine

An R-Java Bayesian Additive Regression Trees implementation
MIT License

Problem with missing data #7

Closed theodds closed 8 years ago

theodds commented 8 years ago

Running the code directly from the vignette, I get the following error when attempting to fit the model with missing covariates.

Error in .jcall(java_bart_machine, "V", "addTrainingDataRow", as.character(model_matrix_training_data[i,  : 
java.lang.NullPointerException

Specifically, I ran

library("bartMachine")
options(java.parameters="-Xmx1000m")
set_bart_machine_num_cores(4)
y <- automobile$log_price
X <- automobile; X$log_price <- NULL
bart_machine <- bartMachine(X=X, y=y, use_missing_data = TRUE,
                             use_missing_data_dummies_as_covars = TRUE)

This is particularly confusing because I ran this code on old versions of the package as well (and got the same error), so I'm unsure whether this is a problem with the package or a problem with my install. For reference, this issue also appears here: https://github.com/mlr-org/mlr/issues/422.

kapelner commented 8 years ago

It works for me. You left a couple lines out. Otherwise the problem is in your setup.

#this line goes first
options(java.parameters="-Xmx1000m")
library("bartMachine")
set_bart_machine_num_cores(4)
#and you need to load the data
data(automobile)
y <- automobile$log_price
X <- automobile; X$log_price <- NULL
bart_machine <- bartMachine(X=X, y=y, use_missing_data = TRUE,
                             use_missing_data_dummies_as_covars = TRUE)

bartMachine initializing with 50 trees...
Now building bartMachine for regression ...Covariate importance prior ON. Missing data feature ON. Missingness used as covariates.
building BART with mem-cache speedup...
Iteration 100/500 thread: 4
....
Iteration 500/500 thread: 1
done building BART in 2.065 sec
burning and aggregating chains from all threads... done
evaluating in sample data...done

Adam Kapelner, Ph.D. Assistant Professor of Mathematics Queens College, City University of New York 65-30 Kissena Blvd., Kiely Hall Room 604 Flushing, NY, 11367 M: 516-435-6795 kapelner.com (scholar https://scholar.google.com/citations?user=TzgMmnoAAAAJ|research gate http://www.researchgate.net/profile/Adam_Kapelner2|publons https://publons.com/author/431881/adam-kapelner#profile)

theodds commented 8 years ago

I loaded the data, just forgot to include it in this post. Anyways, I suppose this is related to either my R install or JDK/rJava setup, but it seems strange. I only get the problem with missing data, the rest of the vignette material runs fine.

kapelner commented 8 years ago

Did you try setting options first to specify the RAM and then loading the package?

theodds commented 8 years ago

Did you try setting options first to specify the RAM and then loading the package?

From a fresh session I ran your code

#this line goes first
options(java.parameters="-Xmx1000m")
library("bartMachine")
set_bart_machine_num_cores(4)
#and you need to load the data
data(automobile)
y <- automobile$log_price
X <- automobile; X$log_price <- NULL
bart_machine <- bartMachine(X=X, y=y, use_missing_data = TRUE,
                             use_missing_data_dummies_as_covars = TRUE)

and got the following output

Loading required package: rJava
Loading required package: car
Loading required package: randomForest
randomForest 4.6-10
Type rfNews() to see new features/changes/bug fixes.
Loading required package: missForest
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: itertools
Loading required package: iterators
Welcome to bartMachine v1.2.0! You have 0.93GB memory available.

bartMachine now using 4 cores.

bartMachine initializing with 50 trees...
Error in .jcall(java_bart_machine, "V", "addTrainingDataRow", as.character(model_matrix_training_data[i,  : 
  java.lang.NullPointerException

kapelner commented 8 years ago

Go into your R library folder and delete the folder bartMachine. Then reinstall and try again. Also, move your RAM up to something absurd like 4g if you can.

theodds commented 8 years ago

I reinstalled the latest release on CRAN (after deleting the package both with remove.packages() and by deleting the folder directly) and get the same error after running

options(java.parameters="-Xmx4000m")
library("bartMachine")
set_bart_machine_num_cores(4)
#and you need to load the data
data(automobile)
y <- automobile$log_price
X <- automobile; X$log_price <- NULL
bart_machine <- bartMachine(X=X, y=y, use_missing_data = TRUE,
                             use_missing_data_dummies_as_covars = TRUE)

(The only difference here is raising the memory limit to 4 GB of RAM.)

theodds commented 8 years ago

Debugging at the point of the error, it looks like (for whatever reason) the error occurs when adding the second entry:

>     bart_machine <- bartMachine(X=X, y=y, use_missing_data = TRUE,
+                                  use_missing_data_dummies_as_covars = TRUE)
bartMachine initializing with 50 trees...
Called from: build_bart_machine(X, y, Xy, num_trees, num_burn_in, num_iterations_after_burn_in, 
    alpha, beta, k, q, nu, prob_rule_class, mh_prob_steps, debug_log, 
    run_in_sample, s_sq_y, cov_prior_vec, use_missing_data, covariates_to_permute, 
    num_rand_samps_in_library, use_missing_data_dummies_as_covars, 
    replace_missing_data_with_x_j_bar, impute_missingness_with_rf_impute, 
    impute_missingness_with_x_j_bar_for_lm, mem_cache_for_speed, 
    serialize, seed, verbose)
Browse[1]> debug: for (i in 1:nrow(model_matrix_training_data)) {
    .jcall(java_bart_machine, "V", "addTrainingDataRow", as.character(model_matrix_training_data[i, 
        ]))
}
Browse[2]> 
debug: i
Browse[2]> i
NULL
Browse[2]> 
debug: .jcall(java_bart_machine, "V", "addTrainingDataRow", as.character(model_matrix_training_data[i, 
    ]))
Browse[2]> i
[1] 1
Browse[2]> 
debug: i
Browse[2]> 
debug: .jcall(java_bart_machine, "V", "addTrainingDataRow", as.character(model_matrix_training_data[i, 
    ]))
Browse[2]> i
[1] 2
Browse[2]> 
Error in .jcall(java_bart_machine, "V", "addTrainingDataRow", as.character(model_matrix_training_data[i,  : 
  java.lang.NullPointerException

kapelner commented 8 years ago

Print out model_matrix_training_data and model_matrix_training_data[2, ] to see what's going on there.

theodds commented 8 years ago

The input to addTrainingDataRow for the second observation is similar to that for the first observation (the same, except for the response value). At the point of error, I get:

Browse[2]> as.character(model_matrix_training_data[1, ])
 [1] "3"               NA                "2"               "88.6"           
 [5] "168.8"           "64.1"            "48.8"            "2548"           
 [9] "4"               "130"             "3.47"            "2.68"           
[13] "9"               "111"             "5000"            "21"             
[17] "27"              "0"               "1"               "1"              
[21] "0"               "1"               "0"               "0"              
[25] "0"               "0"               "0"               "0"              
[29] "1"               "1"               "0"               "1"              
[33] "0"               "0"               "0"               "0"              
[37] "0"               "0"               "0"               "0"              
[41] "0"               "0"               "1"               "0"              
[45] "0"               "1"               "0"               "0"              
[49] "0"               "0"               "9.5100745254521"
Browse[2]> as.character(model_matrix_training_data[2, ])
 [1] "3"                NA                 "2"                "88.6"            
 [5] "168.8"            "64.1"             "48.8"             "2548"            
 [9] "4"                "130"              "3.47"             "2.68"            
[13] "9"                "111"              "5000"             "21"              
[17] "27"               "0"                "1"                "1"               
[21] "0"                "1"                "0"                "0"               
[25] "0"                "0"                "0"                "0"               
[29] "1"                "1"                "0"                "1"               
[33] "0"                "0"                "0"                "0"               
[37] "0"                "0"                "0"                "0"               
[41] "0"                "0"                "1"                "0"               
[45] "0"                "1"                "0"                "0"               
[49] "0"                "0"                "9.71111565988867"

The only thing weird I guess is that the NAs aren't converted to strings, but everything else is.
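
For anyone reading along: the bare NA above is ordinary R behaviour, not something specific to this install. as.character() keeps a missing value as NA_character_ rather than producing the string "NA". A minimal sketch, using only the first three columns of the row shown above (the variable names `row`, `as_strings` and `literal` are made up for illustration):

```r
# First three columns of the row printed above; purely illustrative.
row <- c(symboling = 3, normalized_losses = NA, num_doors = 2)

# as.character() converts the numbers but leaves the missing value missing.
as_strings <- as.character(row)
print(as_strings)
print(is.na(as_strings))

# A literal "NA" string would have to be filled in explicitly.
literal <- ifelse(is.na(row), "NA", as.character(row))
print(unname(literal))
```

This only shows what R hands over; how the Java side treats a missing string is a separate question that the sketch does not answer.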

theodds commented 8 years ago

For completeness, getting rid of as.character I get

Browse[2]> model_matrix_training_data[1, ]
             symboling      normalized_losses              num_doors 
              3.000000                     NA               2.000000 
            wheel_base                 length                  width 
             88.600000             168.800000              64.100000 
                height            curb_weight          num_cylinders 
             48.800000            2548.000000               4.000000 
           engine_size                   bore                 stroke 
            130.000000               3.470000               2.680000 
     compression_ratio             horsepower               peak_rpm 
              9.000000             111.000000            5000.000000 
              city_mpg            highway_mpg       fuel_type_diesel 
             21.000000              27.000000               0.000000 
         fuel_type_gas         aspiration_std       aspiration_turbo 
              1.000000               1.000000               0.000000 
body_style_convertible     body_style_hardtop   body_style_hatchback 
              1.000000               0.000000               0.000000 
      body_style_sedan       body_style_wagon        wheel_drive_4wd 
              0.000000               0.000000               0.000000 
       wheel_drive_fwd        wheel_drive_rwd  engine_location_front 
              0.000000               1.000000               1.000000 
  engine_location_rear       engine_type_dohc          engine_type_l 
              0.000000               1.000000               0.000000 
       engine_type_ohc       engine_type_ohcf       engine_type_ohcv 
              0.000000               0.000000               0.000000 
     engine_type_rotor       fuel_system_1bbl       fuel_system_2bbl 
              0.000000               0.000000               0.000000 
      fuel_system_4bbl        fuel_system_idi        fuel_system_mfi 
              0.000000               0.000000               0.000000 
      fuel_system_mpfi       fuel_system_spdi       fuel_system_spfi 
              1.000000               0.000000               0.000000 
   M_normalized_losses                 M_bore               M_stroke 
              1.000000               0.000000               0.000000 
          M_horsepower             M_peak_rpm            y_remaining 
              0.000000               0.000000               9.510075 
Browse[2]> model_matrix_training_data[2, ]
             symboling      normalized_losses              num_doors 
              3.000000                     NA               2.000000 
            wheel_base                 length                  width 
             88.600000             168.800000              64.100000 
                height            curb_weight          num_cylinders 
             48.800000            2548.000000               4.000000 
           engine_size                   bore                 stroke 
            130.000000               3.470000               2.680000 
     compression_ratio             horsepower               peak_rpm 
              9.000000             111.000000            5000.000000 
              city_mpg            highway_mpg       fuel_type_diesel 
             21.000000              27.000000               0.000000 
         fuel_type_gas         aspiration_std       aspiration_turbo 
              1.000000               1.000000               0.000000 
body_style_convertible     body_style_hardtop   body_style_hatchback 
              1.000000               0.000000               0.000000 
      body_style_sedan       body_style_wagon        wheel_drive_4wd 
              0.000000               0.000000               0.000000 
       wheel_drive_fwd        wheel_drive_rwd  engine_location_front 
              0.000000               1.000000               1.000000 
  engine_location_rear       engine_type_dohc          engine_type_l 
              0.000000               1.000000               0.000000 
       engine_type_ohc       engine_type_ohcf       engine_type_ohcv 
              0.000000               0.000000               0.000000 
     engine_type_rotor       fuel_system_1bbl       fuel_system_2bbl 
              0.000000               0.000000               0.000000 
      fuel_system_4bbl        fuel_system_idi        fuel_system_mfi 
              0.000000               0.000000               0.000000 
      fuel_system_mpfi       fuel_system_spdi       fuel_system_spfi 
              1.000000               0.000000               0.000000 
   M_normalized_losses                 M_bore               M_stroke 
              1.000000               0.000000               0.000000 
          M_horsepower             M_peak_rpm            y_remaining 
              0.000000               0.000000               9.711116 

kapelner commented 8 years ago

This makes no sense since I'm seeing the same thing and it works for me. Set debug_log = TRUE and find the java log file - it will be in your workspace or in the bartMachine folder. That should print the exact Java error.

theodds commented 8 years ago

Running with debug_log=TRUE creates two files in my workspace: unnamed.log and unnamed.log.lck, which are both empty files. Nothing new was created in the bartMachine folder.

kapelner commented 8 years ago

Even stranger...

Can you look at https://cran.r-project.org/web/packages/rJava/rJava.pdf page 18 and figure out how to use .jcheck to return the actual exception to you?
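
One possible way to surface the Java-side details from R, sketched under assumptions: recent rJava versions turn a Java exception into an R condition and, as far as I know, attach the exception object as `jobj`. The helper name `java_error_details` is hypothetical, and whether `jobj` is present depends on the installed rJava version, so treat this as a starting point and not as the package's recommended method.

```r
# Run an expression; if it fails, return a description of the failure.
# For a Java exception this is the exception's toString() plus its stack
# frames; for any other error it is the plain R message.
java_error_details <- function(expr) {
  tryCatch(
    {
      expr
      character(0)  # no error occurred
    },
    error = function(e) {
      ex <- e$jobj  # assumption: rJava stores the Java exception object here
      if (is.null(ex)) {
        return(conditionMessage(e))
      }
      frames <- rJava::.jcall(ex, "[Ljava/lang/StackTraceElement;", "getStackTrace")
      c(
        rJava::.jcall(ex, "S", "toString"),
        vapply(frames, function(f) rJava::.jcall(f, "S", "toString"), character(1))
      )
    }
  )
}

# Hypothetical usage in the session from this thread:
# java_error_details(bartMachine(X = X, y = y, use_missing_data = TRUE,
#                                use_missing_data_dummies_as_covars = TRUE))
```

Returning the frames as strings avoids relying on printStackTrace(), whose output goes to Java's stderr and may not show up in every R front end.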

My guess is the error is on line 80 of https://github.com/kapelner/bartMachine/blob/master/src/bartMachine/Classifier.java and I wonder why your setup has this.

It is possible your version of Java has something to do with it. Can you print out java -version for me? You may have to downgrade. I believe our jar is created with version 6. Can you check that too by inspecting the class files inside your bart_java.jar (....\R-3.0.2\library\bartMachine\java)? See http://stackoverflow.com/questions/3313532/what-version-of-javac-built-my-jar for how to do it.
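
The jar check can also be done from R without extra tools. This is a sketch in base R: a class file starts with the magic bytes CA FE BA BE, followed by a two-byte minor and a two-byte major version, and major version 50 corresponds to Java 6 while 51 corresponds to Java 7. The function names are made up for illustration, and the `system.file()` lookup in the usage comment assumes the jar sits in the package's java folder as described above.

```r
# Extract the class-file major version from the first 8 bytes of a .class file.
class_major_version <- function(header) {
  magic <- as.raw(c(0xCA, 0xFE, 0xBA, 0xBE))
  stopifnot(length(header) >= 8, identical(header[1:4], magic))
  # Bytes 7-8 hold the major version as a big-endian unsigned 16-bit integer.
  as.integer(header[7]) * 256L + as.integer(header[8])
}

# Read the header of the first .class entry found in a jar.
jar_major_version <- function(jar_path) {
  entries <- utils::unzip(jar_path, list = TRUE)$Name
  class_entry <- grep("\\.class$", entries, value = TRUE)[1]
  stopifnot(!is.na(class_entry))
  con <- unz(jar_path, class_entry, open = "rb")
  on.exit(close(con))
  class_major_version(readBin(con, what = "raw", n = 8))
}

# Hypothetical usage:
# jar_major_version(system.file("java", "bart_java.jar", package = "bartMachine"))
```

Only the first class entry is inspected, which assumes every class in the jar was compiled with the same target.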

theodds commented 8 years ago

Running java -version gives

java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.6) (7u79-2.5.6-0ubuntu1.14.04.1)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

Using the method of jikes.thunderbolt in the stackoverflow answer, I got that my jars correspond to "major version: 50", which apparently corresponds to Java 6.

theodds commented 8 years ago

Tried .jcheck(), it didn't seem to do anything. I will probably try to see if I can get this working on another machine. I'll also try compiling the .jar files from source from github again.

kapelner commented 8 years ago

How about this... delete the bartMachine library folder. Then do a "git clone" on the source repository and then "ant" and then do an "R CMD INSTALL bartMachine"

theodds commented 8 years ago

It is working now. What fixed it was uninstalling rJava, installing it instead with sudo apt-get install r-cran-rjava, and then installing bartMachine from source. I guess installing rJava using install.packages("rJava") was the issue?

Thanks for your help!

kapelner commented 8 years ago

It is possible you were using an old version of rJava... beats me. Glad it's fixed. And glad this thread is available online for all to see who have a similar problem. Have fun using bartMachine...
