nooreendabbish / Traffic

JSM 2016 GSS Data Challenge

Breslow Day test #22

Open PatrickCoyle opened 8 years ago

PatrickCoyle commented 8 years ago

Wanted to raise a few concerns about the use of BreslowDayTest from the DescTools package:

  1. We are able to calculate the weights using tapply(), but BreslowDayTest() will take our sample size to be the sum of those values. In reality, our sample size is much lower. I think this means that the test is using an incorrect factor "n" in its variance estimation for the chi-square test. As far as I can tell, this is the problem with doing these tests in a weighted setting.
  2. Checking ?BreslowDayTest, it looks like we need to specify a null odds ratio to test against. Otherwise, it does a CMH test (presumably OR=1). So we should calculate the marginal odds ratio and specify this in the function.
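
To make the first concern concrete, here is a minimal sketch in Python of how a chi-square statistic reacts when weight sums are fed in as if they were counts. It uses the plain Pearson statistic for a single 2x2 table as a stand-in, not the Breslow-Day variance itself, and the table and the average weight `k` are made-up numbers for illustration only.

```python
def pearson_chisq_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical observed counts (what we actually sampled).
observed = (12, 8, 9, 11)

# Hypothetical average survey weight: each sampled case "stands for" k cases.
k = 50.0
weighted = tuple(k * x for x in observed)

stat_observed = pearson_chisq_2x2(*observed)
stat_weighted = pearson_chisq_2x2(*weighted)

# The cell proportions are identical, yet the statistic grows by the factor k,
# because the formula treats the weight sum as the sample size n.
inflation = stat_weighted / stat_observed
print(stat_observed, stat_weighted, inflation)
```

With a constant weight, the inflation factor is exactly `k`, so the p-value shrinks even though no new information was collected. Real survey weights vary by case, so the distortion would not be a clean constant, but the direction of the problem is the same.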

Let me know what you think!

Patrick

chenchen715 commented 8 years ago

I remember we decided to proceed with running a Breslow-Day test without considering the WEIGHT, since we previously assumed the weighted model doesn't differ much from the unweighted model.... but ok, I agree that it's not good... I will take a look tonight.

PatrickCoyle commented 8 years ago

Ok I probably skipped past the unweighted part too quickly. Sorry about that. So are the results on the slides from the Breslow-Day test with counts instead of sum of weights? If so, that is good. We can present that while emphasizing that a survey-weighted version of the variance has not been formulated as far as we know. But we would still have to specify the marginal odds ratio when we call the function.

Patrick

Patrick T. Coyle PhD Student, Statistics Fox School of Business and Management Temple University patrick.coyle@temple.edu patricktmc@gmail.com (610) 761-1992

chenchen715 commented 8 years ago

Ahah, I just found out two things!

  1. I was actually running Breslow-Day on contingency tables adjusted for WEIGHT. I did it by using Dr. Heiberger's weighted mosaic code to obtain the 2x2x2 contingency tables, and then feeding the contingency tables into the function BreslowDayTest().
  2. When we don't specify OR in the function, it runs the test on the Mantel-Haenszel estimate of the pooled odds ratio. Somehow the function doesn't output the estimated odds ratio. But I pulled the source code and added it to the output list. Now we can easily calculate the pooled OR and obtain the test result.

I have updated the slides adding in the pooled OR, on overleaf.

PatrickCoyle commented 8 years ago

OK but aren't we using a variance estimator based on the "raw" sum of weights instead of the Horvitz-Thompson estimation? Check out the header by Lumley here:

http://faculty.washington.edu/tlumley/old-survey/example-chisq.html

Here is a page that lays out the Breslow-Day statistic quite nicely:

http://stats.stackexchange.com/questions/212701/correct-equation-for-breslow-day-statistic-in-homogeneity-test-of-odds-ratio
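
For reference, the Breslow-Day statistic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DescTools or SAS implementation: each stratum is a tuple `(a, b, c, d)` for the table `[[a, b], [c, d]]`, the common odds ratio `psi` is supplied by the caller, no Tarone correction is applied, and the example tables are made up.

```python
import math

def expected_a(n1, m1, n, psi):
    """Expected top-left count given row-1 total n1, column-1 total m1,
    table total n, and an assumed common odds ratio psi."""
    if abs(psi - 1.0) < 1e-12:
        return n1 * m1 / n  # independence case
    # Solve A*(n - n1 - m1 + A) = psi*(n1 - A)*(m1 - A) for A.
    qa = 1.0 - psi
    qb = (n - n1 - m1) + psi * (n1 + m1)
    qc = -psi * n1 * m1
    disc = math.sqrt(qb * qb - 4.0 * qa * qc)
    lo, hi = max(0.0, n1 + m1 - n), min(n1, m1)
    eps = 1e-9
    for root in ((-qb + disc) / (2.0 * qa), (-qb - disc) / (2.0 * qa)):
        if lo - eps <= root <= hi + eps:  # keep the root that fits the margins
            return root
    raise ValueError("no admissible root")

def breslow_day(tables, psi):
    """Return (statistic, degrees of freedom) for homogeneity of odds ratios."""
    stat = 0.0
    for a, b, c, d in tables:
        n1, m1, n = a + b, a + c, a + b + c + d
        e = expected_a(n1, m1, n, psi)
        var = 1.0 / (1.0 / e + 1.0 / (n1 - e) + 1.0 / (m1 - e)
                     + 1.0 / (n - n1 - m1 + e))
        stat += (a - e) ** 2 / var
    return stat, len(tables) - 1

# Two made-up strata that both have odds ratio 3: the statistic is (numerically) zero.
same_or = [(10, 20, 5, 30), (4, 6, 8, 36)]
stat_same, df_same = breslow_day(same_or, 3.0)

# Two made-up strata with different odds ratios (3 and 1.5), tested against psi = 2.
diff_or = [(10, 20, 5, 30), (15, 5, 20, 10)]
stat_diff, df_diff = breslow_day(diff_or, 2.0)
print(stat_same, stat_diff, df_diff)
```

Every term in the variance is a count, so scaling all cells by a weight factor scales the variance and the statistic by that same factor, which is the concern about using raw weight sums.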

PatrickCoyle commented 8 years ago

I think the best we can do is to NOT adjust for weight and just use sample counts, unless fitting the Horvitz-Thompson estimator validly turns out to be easy.....

PatrickCoyle commented 8 years ago

Other thoughts:

Can we include odds ratio CIs as microplots on the Breslow-Day table? We should do one for the univariate models (with drowsy as response) for the predictors in the "mosaic matrix" too.

Patrick

chenchen715 commented 8 years ago

Haha, I can not run away from Horvitz-Thompson, can I? I will take a look, and see if we should pursue further with it.

For microplots, I am not very positive, I still haven't figured out why my laptop couldn't run latex.... but I can give it a shot.

Good luck with the presentation today!!

PatrickCoyle commented 8 years ago

Ok I will try to do the microplots.

I don't understand the pooled OR in your table. Since each pooled set is the set of all truckers, shouldn't the pooled OR be the same for each table? What we want to see is the conditional ORs vs. the marginal (pooled) OR.

Also, the totals you listed in the Introduction are weight sums. We did not actually observe that many accidents. We should list the number of reported accidents (number of unique case numbers) and the population estimates (sum of weights for unique case numbers).
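
A minimal sketch of the two totals in Python, using made-up records. The field names `case_number` and `weight` are hypothetical, and the sketch assumes the weight is constant across all rows that share a case number, which would need to be checked against the actual data.

```python
# Hypothetical rows: one row per vehicle or person, so a case can appear more than once.
records = [
    {"case_number": "A1", "weight": 120.5},
    {"case_number": "A1", "weight": 120.5},
    {"case_number": "B7", "weight": 43.0},
    {"case_number": "C3", "weight": 310.25},
]

# Keep one weight per unique case number (first occurrence wins).
weight_by_case = {}
for row in records:
    weight_by_case.setdefault(row["case_number"], row["weight"])

n_reported = len(weight_by_case)                  # accidents actually observed
population_estimate = sum(weight_by_case.values())  # weighted estimate of accidents
print(n_reported, population_estimate)
```

Reporting both numbers side by side makes it clear which figure is a sample count and which is a survey-weighted estimate.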

Patrick

chenchen715 commented 8 years ago

Geez.... I feel ashamed!! Sorry, I should not have rushed through this...

I will look into this.

chenchen715 commented 8 years ago

Just found out that the common practice for the Breslow-Day test is to use the "adjusted" common odds ratio instead of the marginal odds ratio. In R, this "adjusted" common OR is calculated by the Mantel-Haenszel estimate, which is essentially a weighted average of the individual odds ratios, where the weights are defined by the counts in each stratum. That is why the "pooled OR" estimated by the Mantel-Haenszel method differs when we use different stratifying effects.
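
To illustrate the weighted-average reading of the Mantel-Haenszel estimate, here is a minimal Python sketch with made-up strata. Each stratum is a tuple `(a, b, c, d)` for the table `[[a, b], [c, d]]`; the weight for stratum k is `b*c/n`, which is the standard form of the estimator, and the tables are illustrative only.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio: sum(a*d/n) / sum(b*c/n)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Two hypothetical strata with stratum-specific odds ratios 3 and 1.5.
tables = [(10, 20, 5, 30), (15, 5, 20, 10)]

mh = mantel_haenszel_or(tables)

# Same quantity written as a weighted average of the stratum odds ratios.
weights = [b * c / (a + b + c + d) for a, b, c, d in tables]
weighted_avg = sum(w * odds_ratio(*t) for w, t in zip(weights, tables)) / sum(weights)

# Marginal OR: collapse the strata into one table first, then take the odds ratio.
collapsed = [sum(cell) for cell in zip(*tables)]
marginal = odds_ratio(*collapsed)
print(mh, weighted_avg, marginal)
```

The Mantel-Haenszel value lies between the stratum odds ratios and depends on how the data are stratified, while the marginal OR ignores the stratification entirely, so the two generally differ.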

In SAS, the "adjusted" common OR uses the logit estimate, which is a weighted average of the individual log ORs. The chi-square statistic of the Breslow-Day test is then calculated by measuring the difference between each stratum and this common OR.
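
A logit-type pooled estimate of this kind (often called Woolf's estimator) can be sketched in Python as follows. This is an illustration under assumptions, not SAS output: the weights are the usual inverse variances of the log odds ratios, `1 / (1/a + 1/b + 1/c + 1/d)`, every cell is assumed to be nonzero, and the tables are made up.

```python
import math

def logit_pooled_or(tables):
    """Inverse-variance weighted average of stratum log odds ratios, back-transformed."""
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        log_or = math.log((a * d) / (b * c))
        w = 1.0 / (1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)  # 1 / Var(log OR)
        num += w * log_or
        den += w
    return math.exp(num / den)

# Two hypothetical strata that share odds ratio 3: the pooled value is 3 as well.
pooled_same = logit_pooled_or([(10, 20, 5, 30), (4, 6, 8, 36)])

# Two hypothetical strata with odds ratios 3 and 1.5: the pooled value falls in between.
pooled_mixed = logit_pooled_or([(10, 20, 5, 30), (15, 5, 20, 10)])
print(pooled_same, pooled_mixed)
```

Because the averaging happens on the log scale with different weights, this estimate will usually be close to, but not equal to, the Mantel-Haenszel value for the same tables.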

In our slides, I put the OR of each stratum and the marginal OR (NOT the weighted average), but the p-values are still generated based on the "adjusted" common OR, because 1) it's common practice, and I read somewhere that there are reasons why using the marginal OR is not good... but I didn't dig into it; 2) the R function wouldn't work if I manually assigned the OR.... I spent some time trying to get it to work but didn't really succeed, so I decided not to pursue it further at this point.

Please let me know if it works for you.

chenchen715 commented 8 years ago

Oh, and one more thing... the p-values in the slides are generated by SAS, which uses the weighted average of the log ORs to calculate the "adjusted" common OR... My reason is.... after some failed attempts with the BreslowDayTest function in R.... I just don't trust it anymore... so I turned to SAS.