wbonat / mcglm

Fitting multivariate covariance generalized linear models
GNU General Public License v3.0
21 stars 5 forks

Bivariate Poisson GLM regression #9

Closed evanbiederstedt closed 6 years ago

evanbiederstedt commented 7 years ago

Thanks for creating this package and the accompanying documentation.

I'm still a bit confused however. Let's say I wanted to implement a straightforward bivariate Poisson GLM, e.g.

counts1 <- c(18,17,15,20,10,20,25,13,12)
counts2 <- c(12,12,11,24,4,2,15,19,18)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
print(d.AD <- data.frame(treatment, outcome, counts1, counts2))

It's not clear to me with mcglm how one would execute count1, count2 ~ outcome + treatment

wbonat commented 7 years ago

Below is one example of a bivariate Poisson GLM using the log link function.

counts1 <- c(18,17,15,20,10,20,25,13,12)
counts2 <- c(12,12,11,24,4,2,15,19,18)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
print(d.AD <- data.frame(treatment, outcome, counts1, counts2))

require(mcglm)
fit1 <- mcglm(linear_pred = c(counts1 ~ outcome + treatment,
                              counts2 ~ outcome + treatment),
              matrix_pred = list(mc_id(d.AD), mc_id(d.AD)), data = d.AD,
              link = c("log","log"), variance = c("tweedie","tweedie"),
              power_fixed = c(TRUE, TRUE),
              control_algorithm = list(tunning = 0.5))
summary(fit1)

Checking algorithm convergence

plot(fit1, type = "algorithm") # OK

wbonat commented 7 years ago

Dear,

Below is an example of a bivariate Poisson GLM. Note that it is, in fact, a quasi-Poisson GLM, because the dispersion parameters are not fixed at 1.

Please, let me know if you have more questions.

All the best!

counts1 <- c(18,17,15,20,10,20,25,13,12)
counts2 <- c(12,12,11,24,4,2,15,19,18)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
print(d.AD <- data.frame(treatment, outcome, counts1, counts2))

require(mcglm)
fit1 <- mcglm(linear_pred = c(counts1 ~ outcome + treatment,
                              counts2 ~ outcome + treatment),
              matrix_pred = list(mc_id(d.AD), mc_id(d.AD)), data = d.AD,
              link = c("log","log"), variance = c("tweedie","tweedie"),
              power_fixed = c(TRUE, TRUE),
              control_algorithm = list(tunning = 0.5))
summary(fit1)

Checking algorithm convergence

plot(fit1, type = "algorithm") # OK

-- Wagner Hugo Bonat

Department of Mathematics and Computer Science (IMADA) University of Southern Denmark (SDU) and Laboratório de Estatística e Geoinformação (LEG) Universidade Federal do Paraná (UFPR)

wbonat commented 7 years ago

The code below shows how to get the covariance and correlation matrices from the mcglm output. Note that the models are parametrized in terms of dispersion parameters (marginal variances) and correlations.

require(mcglm)
data(ahs, package = "mcglm")
form1 <- Ndoc ~ income + age
form2 <- Nndoc ~ income + age

Z0 <- mc_id(ahs)
fit.ahs <- mcglm(linear_pred = c(form1, form2),
                 matrix_pred = list(Z0, Z0),
                 link = c("log","log"),
                 variance = c("poisson_tweedie","poisson_tweedie"),
                 data = ahs)
summary(fit.ahs)

correlation <- fit.ahs$Covariance[1]
cov1 <- fit.ahs$Covariance[2]
cov2 <- fit.ahs$Covariance[3]

COVARIANCE <- matrix(c(cov1, correlation*sqrt(cov1*cov2),
                       correlation*sqrt(cov1*cov2), cov2),
                     ncol = 2, nrow = 2)
cov2cor(COVARIANCE)

evanbiederstedt commented 7 years ago

@wbonat

Thank you for the quick and helpful response. Responding to this example:

counts1 <- c(18,17,15,20,10,20,25,13,12)
counts2 <- c(12,12,11,24,4,2,15,19,18)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
print(d.AD <- data.frame(treatment, outcome, counts1, counts2))

require(mcglm)
fit1 <- mcglm(linear_pred = c(counts1 ~ outcome + treatment, counts2 ~
outcome + treatment),
              matrix_pred = list(mc_id(d.AD), mc_id(d.AD)), data = d.AD,
              link = c("log","log"), variance = c("tweedie","tweedie"),
              power_fixed = c(TRUE, TRUE),
              control_algorithm = list(tunning = 0.5))
summary(fit1)

# Checking algorithm convergence
plot(fit1, type = "algorithm") # OK

Two questions:

(1) The above approach for a bivariate Poisson with my data works well. However, when I run plot(fit1, type = "algorithm") on my dataset, I receive the following error:

 Error in object$IterationRegression[1:c(n_iter + 5), ] :
   subscript out of bounds

What does this mean, and how could I rectify this?

(2) Perhaps this is in the documentation, but could you possibly explain the final lines of summary(fit1), i.e.

 Algorithm: chaser
 Correction: TRUE
 Number iterations: 20

Thank you for the help

wbonat commented 7 years ago

1) It means you reached the maximum number of iterations. To fix it, use

control_algorithm = list(tunning = 0.5, max_iter = 100)

A warning message for this situation was included in version 0.4.0, available on GitHub.

2) It says that the chaser algorithm was used with a bias correction term, and that the algorithm stopped at the 20th iteration.

All the best!

evanbiederstedt commented 7 years ago

@wbonat

control_algorithm = list(tunning = 0.5, max_iter = 100) solves the previous error. Is there a reason you chose max_iter=100 over another number?

One last question I haven't been able to understand from the documentation:

After outputting the plot, I see the Regression, Covariance, Quasi-score Regression and Quasiscore Covariance, each plot with several colored lines.

How do I find out what each line corresponds to? See the attached image

[image: plot-6]

Thank you again

wbonat commented 7 years ago

1) No. This argument specifies the maximum number of iterations for the algorithm. I thought 100 should be enough for almost all cases.

2) You do not need to know exactly what each line represents. The idea of the plots is just to show whether the algorithm converged. If it converged, you expect both the quasi-score regression and the quasi-score covariance to go to zero, exactly as you see in your plots.

If your response variable is a count, you have extremely large over-dispersion. The dispersion parameters of your quasi-Poisson model are extremely large (~800). Note that in the Poisson case we expect dispersion parameters close to one. You should probably consider fitting a Poisson-Tweedie model. If you have covariates, you can also consider estimating the Poisson-Tweedie power parameter.

Please, let me know if you need further assistance and thanks for your interest in the mcglm package.

All the best!

evanbiederstedt commented 7 years ago

RE: (2)

The dispersion parameters of your quasi-Poisson model are extremely large (~ 800).

Based on the plots above, how does one see the over-dispersion is ~800? I suspect you're eye-balling the Quasi-score Covariance.

When trying the Poisson-Tweedie model as follows, I seem to run into a Cholesky decomposition error:

fit1 <- mcglm(linear_pred = c(counts1 ~ outcome + treatment,
                              counts2 ~ outcome + treatment),
              matrix_pred = list(mc_id(df), mc_id(df)), data = df,
              link = c("log","log"),
              variance = c("poisson_tweedie","poisson_tweedie"),
              power_fixed = c(TRUE, TRUE),
              control_algorithm = list("correct" = TRUE,
                                       "verbose" = TRUE,
                                       "tol" = 1e-5,
                                       "max_iter" = 100,
                                       "method" = "chaser",
                                       "tunning" = 1))

Here is the output error:

 + . + Automatic initial values selected.
 [1]  0.4991 -1.5002 -1.5005
 [1]  5.9894 -0.0002 -0.0001
 Error in .local(x, ...) :
   internal_chm_factor: Cholesky factorization failed

wbonat commented 7 years ago

Yes. It is because your dispersion parameter estimates are quite close to zero, probably due to bad initial values. You can use tunning = 0.5 to control the step length and keep the estimates in the proper parameter space. If that does not work, decrease the tuning to 0.25 or less. It should work.

evanbiederstedt commented 7 years ago

You can use the tunning = 0.5 to control the step length and keep the estimates in the proper parameter space.

A tuning parameter of tunning = 0.5 appears to work. Thank you.

I am still a bit confused how you read from the plot that the overdispersion was around ~800.

Using the Poisson-Tweedie model as recommended, I do not see much of a difference:

fit1 <- mcglm(linear_pred = c(counts1 ~ outcome + treatment,
                              counts2 ~ outcome + treatment),
              matrix_pred = list(mc_id(df), mc_id(df)), data = df,
              link = c("log","log"),
              variance = c("poisson_tweedie","poisson_tweedie"),
              power_fixed = c(TRUE, TRUE),
              control_algorithm = list("correct" = TRUE,
                                       "verbose" = TRUE,
                                       "tol" = 1e-5,
                                       "max_iter" = 100,
                                       "method" = "chaser",
                                       "tunning" = 0.5))

Here is the plot I see: [image: plot-7]

Are there other strategies for fitting a better model?

wbonat commented 7 years ago

Well, the next step is to estimate the power parameter (power_fixed = c(FALSE, FALSE)). It should provide the best model for your data.

I just looked at the covariance values (the y-axis of the second plot in the first row of your figure).

fit1 <- mcglm(linear_pred = c(counts1 ~ outcome + treatment,
                              counts2 ~ outcome + treatment),
              matrix_pred = list(mc_id(df), mc_id(df)), data = df,
              link = c("log","log"),
              variance = c("poisson_tweedie","poisson_tweedie"),
              power_fixed = c(FALSE, FALSE),
              control_algorithm = list("correct" = TRUE, "verbose" = TRUE,
                                       "tol" = 1e-5, "max_iter" = 100,
                                       "method" = "chaser", "tunning" = 0.5))

evanbiederstedt commented 7 years ago

I'm running into a few issues. When I run the above,

fit1 <- mcglm(linear_pred = c(counts1 ~ outcome + treatment,
                              counts2 ~ outcome + treatment),
              matrix_pred = list(mc_id(df), mc_id(df)), data = df,
              link = c("log","log"),
              variance = c("poisson_tweedie","poisson_tweedie"),
              power_fixed = c(FALSE, FALSE),
              control_algorithm = list("correct" = TRUE,
                                       "verbose" = TRUE,
                                       "tol" = 1e-5,
                                       "max_iter" = 100,
                                       "method" = "chaser",
                                       "tunning" = 0.5))

I get the following error:

 Error in solve.default(cov_temp$Sensitivity, cov_temp$Score) :
   system is computationally singular: reciprocal condition number = 3.01365e-45

So, I have tried lowering the tuning parameter, e.g. tunning=0.05. However, I cannot get the model to converge with max_iter=100. When executing

> ppng(plot(fit1, type = "algorithm"))

I get the previously discussed error:

 Error in object$IterationRegression[1:c(n_iter + 5), ] :
   subscript out of bounds

At a certain point, simultaneously increasing max_iter and decreasing tunning results in an error, e.g.

fit1 <- mcglm(linear_pred = c(counts1 ~ outcome + treatment,
                              counts2 ~ outcome + treatment),
              matrix_pred = list(mc_id(df), mc_id(df)), data = df,
              link = c("log","log"),
              variance = c("poisson_tweedie","poisson_tweedie"),
              power_fixed = c(FALSE, FALSE),
              control_algorithm = list("correct" = TRUE,
                                       "verbose" = TRUE,
                                       "tol" = 1e-5,
                                       "max_iter" = 5000,
                                       "method" = "chaser",
                                       "tunning" = 0.005))

which outputs the error

Error in solve.default(cov_temp$Sensitivity, cov_temp$Score) :
   system is computationally singular: reciprocal condition number = 2.09962e-16

Advice?

wbonat commented 7 years ago

For estimating the power parameter, you need significant covariates in your model. Do you have significant covariates? In general, you also need enough variation in the mean structure. It is not easy to define what "enough variation" means. So, if you have only categorical covariates, or covariates that do not vary or are not significant, you have no information for estimating the power parameter.

In general, estimating the power parameter is not easy. I recommend two strategies.

1) Fit univariate models fixing the power parameter (by default it is fixed at 1) and see if you have significant covariates. Then try to estimate the power parameter. If you can estimate it, use the estimates as initial values for the multivariate model.

2) Fit the models using different fixed power parameters, for example a grid from 1 to 3 with 50 values. Then, for each fitted model, compute the pseudo-likelihood using the function plogLik(). Plot the values and use them to estimate the power parameter in a profile pseudo-likelihood style.
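
Strategy 2) could be sketched as follows. This is only a rough sketch using the toy d.AD data from earlier in the thread: the structure of control_initial assumed below (components regression, power, tau, rho) and the exact return value of plogLik() are assumptions that should be checked against your installed mcglm version.

```r
require(mcglm)

## Hypothetical helper: fit a univariate Poisson-Tweedie model with the
## power parameter held fixed at value p via power_fixed = TRUE.
## The control_initial structure is an assumption -- check ?mcglm.
fit_fixed_power <- function(p, data) {
  mcglm(linear_pred = c(counts1 ~ outcome + treatment),
        matrix_pred = list(mc_id(data)),
        link = "log", variance = "poisson_tweedie",
        power_fixed = TRUE, data = data,
        control_initial = list(
          regression = list(coef(glm(counts1 ~ outcome + treatment,
                                     family = poisson, data = data))),
          power = list(p), tau = list(1), rho = 0))
}

## Grid from 1 to 3 with 50 values, as suggested above.
power_grid <- seq(1, 3, length.out = 50)

## plogLik() may return an object rather than a bare number in some
## versions; extract the numeric pseudo log-likelihood as needed.
pll <- sapply(power_grid, function(p) plogLik(fit_fixed_power(p, d.AD)))

## Profile the pseudo-likelihood and take the maximizing power value.
plot(power_grid, pll, type = "l",
     xlab = "Fixed power parameter", ylab = "Pseudo log-likelihood")
power_grid[which.max(pll)]
```

The grid maximizer can then be passed as the initial power value for the multivariate model, per strategy 1).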

All the best!

evanbiederstedt commented 7 years ago

For estimating the power parameter, you need significant covariates in your model. Do you have significant covariates? In general, you also need enough variation in the mean structure. It is not easy to define what "enough variation" means. So, if you have only categorical covariates, or covariates that do not vary or are not significant, you have no information for estimating the power parameter.

Here's what I see, when fixing the power parameter:

 > summary(fit1)
 Call: counts1 ~ outcome + treatment

 Link function: log
 Variance function: poisson_tweedie
 Covariance function: identity
 Regression:
                Estimates    Std.error    Z value
 (Intercept) 1.139754e+01 9.913378e-02 114.971255
 outcome     2.054192e-05 7.119400e-06   2.885344
 treatment   9.435812e-07 2.228062e-07   4.234986

 Dispersion:
   Estimates Std.error  Z value
 1   8227.67  3704.281 2.221124

 Call: counts2 ~ outcome + treatment

 Link function: log
 Variance function: poisson_tweedie
 Covariance function: identity
 Regression:
                Estimates    Std.error    Z value
 (Intercept) 1.133793e+01 1.005005e-01 112.814654
 outcome     2.012961e-05 7.219601e-06   2.788189
 treatment   9.611313e-07 2.257268e-07   4.257940

 Dispersion:
   Estimates Std.error  Z value
 1  7961.685  3539.354 2.249474

 Correlation matrix:
   Parameters Estimates Std.error Z value
 1      rho12 0.9970412 0.2409272 4.13835

 Algorithm: chaser
 Correction: TRUE

RE: your suggestions

1) Fit univariate models fixing the power parameter (by default it is fixed at 1) and see if you have significant covariates. Then try to estimate the power parameter. If you can estimate it, use the estimates as initial values for the multivariate model.

Fitting a univariate model like the above appears to give (roughly) the same covariate estimates. I'm confused: how would you use this to estimate the power parameter (if the covariates were significant)?

2) Fit the models using different fixed power parameters, for example a grid from 1 to 3 with 50 values. Then, for each fitted model, compute the pseudo-likelihood using the function plogLik(). Plot the values and use them to estimate the power parameter in a profile pseudo-likelihood style.

Do you have an example of this procedure somewhere?

wbonat commented 7 years ago

Are you sure that your data are correct? These are quite strange results. You have very large counts that are almost perfectly linearly correlated (the correlation between responses is 0.9970). If everything is correct with your data, you do not need a bivariate model, because one response is just a kind of "copy" of the other. In that case, you also do not need a model for counts, because very large counts should converge to the Gaussian distribution. Thus, your analysis can be done using a simple linear regression model. This very large correlation is extremely hard for the fitting algorithm, because the estimate is very close to the corner of the parameter space. Could you show me some exploratory analysis? Perhaps a dispersion diagram and some boxplots.

Note that your regression coefficients are really small, but still significant.

Can you send me a summary of your data?
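
The exploratory analysis and linear-regression fallback suggested above could be sketched like this, assuming df is the same data frame with columns counts1, counts2, outcome and treatment used in the earlier fits:

```r
## Exploratory analysis: dispersion diagram and boxplots, as requested.
plot(df$counts1, df$counts2,
     xlab = "counts1", ylab = "counts2")  # near-linear if rho ~ 0.997
cor(df$counts1, df$counts2)
boxplot(counts1 ~ treatment, data = df)

## With very large counts, the Gaussian approximation is reasonable, so a
## simple linear model (here on the log scale) may be all that is needed.
lm1 <- lm(log(counts1) ~ outcome + treatment, data = df)
summary(lm1)
```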

evanbiederstedt commented 7 years ago

Let's move this conversation to e-mail.

I think this issue is now closed. Thank you for the help