spgarbet / tangram

Table Grammar package for R

Unable to create big tables #43

Closed: kylerove closed this issue 5 years ago

kylerove commented 5 years ago

I was trying to create a large table with ~34 variables (group ~ 1 + 2 + 3 + ... + 34) and consistently got an error on the 24th variable. I thought the problem might be that variable, but even after I removed it, whichever variable ended up 24th caused the error:

"Error in self$nexttoken() : Unparseable input starting at postop"

A limit of 23 variables (most are binary, some categorical) seems like a strange number. Not sure what the issue is.

[edit] This is using the latest code here on github.

spgarbet commented 5 years ago

This is an elusive bug I've been chasing for a good while. I don't think it's the number of variables but a variable name itself containing an underscore. Can you send the formula string you are using?

spgarbet commented 5 years ago

Also, what version are you using? The current CRAN release or from github?

kylerove commented 5 years ago

My formula: `table3 <- tangram(group ~ eras_score[0]

I'm using the latest from Github.

spgarbet commented 5 years ago

Okay, looking at it now.

spgarbet commented 5 years ago

As a workaround rbind() works with tangram objects.
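
A minimal sketch of how that workaround could be set up, assuming the long formula is split into several shorter ones that are tabulated separately and then stacked. `split_formula` and `chunk_size` are hypothetical names introduced here for illustration; the `tangram()` and `rbind()` calls are left as comments because they need the package and the `allSingle` data frame from this thread.

```r
# Hypothetical helper: split a long list of terms into shorter formula strings.
split_formula <- function(response, terms, chunk_size = 10) {
  # Assign each term to a group of at most `chunk_size` terms.
  groups <- split(terms, ceiling(seq_along(terms) / chunk_size))
  lapply(groups, function(g) paste(response, "~", paste(g, collapse = " + ")))
}

terms <- c("eras_score[0]", "eras_teaching_check[0]", "preop_carb_check[0]",
           "preop_diet_check[0]", "preop_bowel_prep_check[0]")
pieces <- split_formula("group", terms, chunk_size = 2)
print(pieces)

# Assumed usage, not run here (requires tangram, allSingle and my_percent):
# tables <- lapply(pieces, function(f)
#   tangram(as.formula(f), data = allSingle, transform = my_percent))
# table3 <- Reduce(rbind, tables)
```

Keeping each piece short should keep every formula well under the size at which the error appeared, at the cost of building the table in several steps.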

spgarbet commented 5 years ago

I can't reproduce it. Because the error output you provided has a break in it, I suspect there is a hidden whitespace character in the formula. I expanded the definition of whitespace to include newlines and returns and pushed that up.

Install the latest and let me know what happens. If I'm right, it will probably now fail because it can't access a variable, due to the hidden character.

```r
> library(tangram)
> x <- "group ~ eras_score[0] + eras_teaching_check[0] + preop_carb_check[0] + preop_diet_check[0] + preop_bowel_prep_check[0] + antibiotic_check[0] + dvt_ppx_check[0] + normothermia_check[0] + regional_anesthesia_check[0] + intraop_opioid_check[0] + intraop_fluids_check[0] + intraop_min_invasive_check[0] + intraop_ng_check[0] + intraop_drains_check[0] + postop_diet_check[0] + postop_ivf_check[0] + postop_mobilization_check[0] + postop_excessdrainremoval_check[0] + postop_adjunctive_check[0] + postop_antiemetic_check[0] + preop_carb_vol_norm[1] + preadmit_check[0] + preop_acetaminophen_check[0] + regional_block_check[0] + intraop_opioids[2] + intraop_ketamine_check[0] + intraop_dextamatomadine_check[0] + intraop_lidocaine_check[0] + intraop_anxiolytic_check[0] + intraop_steroids_check[0] + intraop_nsaids_check[0] + intraop_acetaminophen_check[0] + intraop_antiemetic_check[0] + ivf_crystalloid_rate[1] + ivf_colloid_rate[1] + ivf_blood_rate[1] + intraop_ogng_temp_check[0]"
> Parser$new()$run(x)
<ASTTableFormula>
  Inherits from: <ASTBranch>
  Public:
    clone: function (deep = FALSE)
    distribute: function ()
    format: character
    initialize: function (left, right)
    left: ASTVariable, ASTNode, R6
    reduce: function (df)
    right: ASTPlus, ASTBranch, ASTNode, R6
    set_format: function (x)
    string: function ()
    terms: function ()
    value: NA
```

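
A text editor may not show every hidden character, so one way to test the hidden-whitespace suspicion is to inspect the character codes of the formula string directly. The sketch below uses only base R; `find_hidden_chars` is a hypothetical helper name, and the set of codes it flags (control characters, DEL, and the no-break space) is an assumption about what counts as "hidden".

```r
# Hypothetical helper: report the position and code point of characters
# that are easy to miss visually (control characters, DEL, no-break space).
find_hidden_chars <- function(s) {
  codes <- utf8ToInt(s)
  idx <- which(codes < 32 | codes == 127 | codes == 160)
  data.frame(position = idx, code = codes[idx])
}

clean  <- "group ~ eras_score[0] + anastomosis"
broken <- "group ~ eras_score[0] +\n    anastomosis"

nrow(find_hidden_chars(clean))   # no hidden characters in the clean string
find_hidden_chars(broken)        # reports the newline (code 10) after the "+"
```

Running the same check on the string a function actually receives, rather than on the text as typed, is what separates a typing problem from a character introduced later by R itself.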
kylerove commented 5 years ago

I've looked at the input in my text editor and there are no invisible characters. I even typed out the last variable added to the formula here:

```r
table3 <- tangram(group ~ eras_score[0] + eras_teaching_check[0] + preop_carb_check[0] + preop_diet_check[0] + preop_bowel_prep_check[0] + antibiotic_check[0] + dvt_ppx_check[0] + normothermia_check[0] + regional_anesthesia_check[0] + intraop_opioid_check[0] + intraop_fluids_check[0] + intraop_min_invasive_check[0] + intraop_ng_check[0] + intraop_drains_check[0] + postop_diet_check[0] + postop_ivf_check[0] + postop_mobilization_check[0] + postop_excessdrainremoval_check[0] + postop_adjunctive_check[0] + postop_antiemetic_check[0] + anastomosis, data=allSingle, transform=my_percent) %>% del_col(2)
```

It errors out on `anastomosis`, which has no special characters at all. It works fine without `anastomosis`, but errors when it is added.

spgarbet commented 5 years ago

I found it! When `as.character` is called on a formula, R adds returns once the text goes beyond a certain fixed size, i.e. the break does not depend on the `width` option. In the last patch I pushed, I added `\n` and `\r` as ignored whitespace. I tested the patch and it solves the issue for me.
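
The behavior can be checked in base R without tangram. The sketch below builds a formula long enough to cross the fixed size, converts it with `as.character`, and then normalizes the whitespace. The variable names are invented for illustration, and the exact cutoff is an assumption: R's documentation for `as.character` mentions line breaks at 500 characters for language objects, which would be consistent with the failures around the 21st to 24th variable in this thread.

```r
# Build a formula whose right-hand side is well over 500 characters.
# (Invented variable names; only the overall length matters here.)
terms <- sprintf("some_long_variable_name_%02d[0]", 1:40)
original <- paste(terms, collapse = " + ")
f <- as.formula(paste("group ~", original))

# as.character() on a formula gives c("~", lhs, rhs); take the right-hand side.
rhs <- as.character(f)[3]

# Check whether R inserted a line break into the converted text.
has_break <- grepl("\n", rhs, fixed = TRUE)
print(has_break)

# Treating newlines and returns as ordinary whitespace, as the patch does,
# recovers a single-line string a parser can consume.
flat <- gsub("[[:space:]]+", " ", rhs)
print(nchar(flat))
```

Collapsing whitespace before parsing is a caller-side alternative to the fix described above, which instead teaches the tokenizer to skip `\n` and `\r`.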

spgarbet commented 5 years ago

b265131ca2677f8af91524c455f34c07bde3c315

kylerove commented 5 years ago

Sweet. Yeah, the bug was really weird because it didn't matter which variable but always seemed to error out around the 21st-24th variable. Figures it was related to something with a fixed size. Nice job!

kylerove commented 5 years ago

Confirmed. The latest GitHub code now works on my code example!