rmcelreath / rethinking

Statistical Rethinking course and book package
2.1k stars 596 forks source link

character limit in translating to stan with ulam()? #407

Open lorenze3 opened 10 months ago

lorenze3 commented 10 months ago

I really appreciate the work both in educating on bayesian regression in such an entertaining way and in making stan so easy to use from R.

I'm abusing the rethinking package to avoid writing macros to write stan code (because someone already did that, right?). I'd use brms for my problem except I want per-coefficient lower bounds (ie some coefficients to be non-negative, but not all). At any rate, I've got a very long deterministic assignment. Once I add 'too many' terms to the model, ulam() breaks up the assignment into two lines, but not in a way that is syntatically correct.

here's an attempt at a reproducible example:

#!R
d<-tibble(week=1:100,store_id=rep(c(1,2),50), digital=1:100,
          leads=1:100,facebook=1:100,bing=1:100,cable=1:100,
          google=1:100,marketingcloud=1:100,
          mntn=1:100,newspaper=1:100,ooh=1:100,pinterest=1:100,
          radio=1:100,simplifi=1:100,snapchat=1:100,video=1:100,
          tv=1:100,taboola=1:100,tiktok=1:100,
          twitter=1:100,sin1=sin(1:100),cos1=cos(1:100))

fff<-ulam(alist(b_bing ~normal(0,10),
                b_cable ~ normal(0,10),
                b_digital~normal(0,10),
                b_facebook~normal(0,10),
                b_google~normal(0,10),
                b_marketingcloud~normal(0,10),
                b_mntn~normal(0,10),
                b_newspaper~normal(0,10),
                b_ooh~normal(0,10),
                b_pinterest~normal(0,10),
                b_radio~normal(0,10),
                b_simplifi~normal(0,10),
                b_snapchat~normal(0,10),
                b_video~normal(0,10),
                b_tv~normal(0,10),
                b_taboola~normal(0,10),
                b_tiktok~normal(0,10),
                b_twitter~normal(0,10),
                b_sin1~normal(0,10),
                b_cos1~normal(0,10),
                store_int[store_id]~normal(0,10),
                a0 ~normal(0,10),
                b_week~normal(0,100),
                big_model <- b_bing * bing + b_cable * cable + b_digital * digital + 
                  b_facebook * facebook + b_google * google + b_marketingcloud * 
                  marketingcloud + b_mntn * mntn + b_newspaper * newspaper + 
                  b_ooh * ooh + b_pinterest * pinterest + b_radio * radio + 
                  b_simplifi * simplifi + b_snapchat * snapchat + b_video * 
                  video + b_tv * tv + b_taboola * taboola + b_tiktok * tiktok + 
                  b_twitter * twitter + b_sin1 * sin1 + b_cos1 * cos1 + b_week * 
                  week + store_int[store_id] + a0,
                big_sigma~half_normal(100,100),
                leads ~normal(big_model,big_sigma)),
          data=d,sample=F,declare_all_data = F)

If you check the resulting stan code in fff$model, you'll see that big_model[i] is split into two lines, but the first one ends in a +, and the second line does not increment the existing big_model[i]. I'll paste that code in below.

#!stan
data{
    array[100] int store_id;
    array[100] int week;
     vector[100] cos1;
     vector[100] sin1;
    array[100] int twitter;
    array[100] int tiktok;
    array[100] int taboola;
    array[100] int tv;
    array[100] int video;
    array[100] int snapchat;
    array[100] int simplifi;
    array[100] int radio;
    array[100] int pinterest;
    array[100] int ooh;
    array[100] int newspaper;
    array[100] int mntn;
    array[100] int marketingcloud;
    array[100] int google;
    array[100] int facebook;
    array[100] int digital;
    array[100] int cable;
    array[100] int bing;
     vector[100] leads;
}
parameters{
     real b_bing;
     real b_cable;
     real b_digital;
     real b_facebook;
     real b_google;
     real b_marketingcloud;
     real b_mntn;
     real b_newspaper;
     real b_ooh;
     real b_pinterest;
     real b_radio;
     real b_simplifi;
     real b_snapchat;
     real b_video;
     real b_tv;
     real b_taboola;
     real b_tiktok;
     real b_twitter;
     real b_sin1;
     real b_cos1;
     vector[2] store_int;
     real a0;
     real b_week;
     real<lower=0> big_sigma;
}
model{
     vector[100] big_model;
    leads ~ normal( big_model , big_sigma );
    big_sigma ~ normal( 100 , 100 );
    for ( i in 1:100 ) {
        big_model[i] = b_bing * bing[i] + b_cable * cable[i] + b_digital * digital[i] + b_facebook * facebook[i] + b_google * google[i] + b_marketingcloud * marketingcloud[i] + b_mntn * mntn[i] + b_newspaper * newspaper[i] + b_ooh * ooh[i] + b_pinterest * pinterest[i] + b_radio * radio[i] + b_simplifi * simplifi[i] + b_snapchat * snapchat[i] + b_video * video[i] + b_tv * tv[i] + b_taboola * taboola[i] + b_tiktok * tiktok[i] + b_twitter * twitter[i] + b_sin1 * sin1[i] + b_cos1 * cos1[i] + b_week * week[i] + store_int[store_id[i]] + ;
        big_model[i] =     a0;
    }
    b_week ~ normal( 0 , 100 );
    a0 ~ normal( 0 , 10 );
    store_int ~ normal( 0 , 10 );
    b_cos1 ~ normal( 0 , 10 );
    b_sin1 ~ normal( 0 , 10 );
    b_twitter ~ normal( 0 , 10 );
    b_tiktok ~ normal( 0 , 10 );
    b_taboola ~ normal( 0 , 10 );
    b_tv ~ normal( 0 , 10 );
    b_video ~ normal( 0 , 10 );
    b_snapchat ~ normal( 0 , 10 );
    b_simplifi ~ normal( 0 , 10 );
    b_radio ~ normal( 0 , 10 );
    b_pinterest ~ normal( 0 , 10 );
    b_ooh ~ normal( 0 , 10 );
    b_newspaper ~ normal( 0 , 10 );
    b_mntn ~ normal( 0 , 10 );
    b_marketingcloud ~ normal( 0 , 10 );
    b_google ~ normal( 0 , 10 );
    b_facebook ~ normal( 0 , 10 );
    b_digital ~ normal( 0 , 10 );
    b_cable ~ normal( 0 , 10 );
    b_bing ~ normal( 0 , 10 );
}

for my immediate purposes I'm just going to use shorter variable names, but this might be a simple fix? Changing the big_model assignment to

 big_model[i] = b_bing * bing[i] + b_cable * cable[i] + b_digital * digital[i] + b_facebook * facebook[i] + b_google * google[i] + b_marketingcloud * marketingcloud[i] + b_mntn * mntn[i] + b_newspaper * newspaper[i] + b_ooh * ooh[i] + b_pinterest * pinterest[i] + b_radio * radio[i] + b_simplifi * simplifi[i] + b_snapchat * snapchat[i] + b_video * video[i] + b_tv * tv[i] + b_taboola * taboola[i] + b_tiktok * tiktok[i] + b_twitter * twitter[i] + b_sin1 * sin1[i] + b_cos1 * cos1[i] + b_week * week[i] + store_int[store_id[i]]  ;
        big_model[i] =  big_model[i] +   a0;

seems to work (ie the sampling runs instead of generating an error), in case there is some character limit in R or Stan that is hard to work around.

lorenze3 commented 10 months ago

Oops -- I've now realized this seems to be about the number of terms in the assignment and not character length.

but it seems to be possible to work around it by breaking the assignment into two formulae --

alist(
    big_model <- <20 terms like b_varname * varname >,
    big_model<- big_model[i] + <remaining terms like b_varname *varname>

)

although I'm not 100% sure that this code (which does compile and run) results in the same sampling as not having it broken into two pieces

rmcelreath commented 10 months ago

Interesting bug.

To workaround, use multiple symbols:

big_model <- [stuff] big_model2 <- big_model + [more stuff]

lorenze3 commented 10 months ago

Thanks for the response! And, dang it, I shoulda thought of that.