rpietro / NSQIPageComplications

Analysis of surgical complications using the NSQIP data set

Comparison of categorical variables before and after imputation #29

Open mworni opened 12 years ago

mworni commented 12 years ago

Ricardo - how can I statistically compare femalesex before and after imputation? I would like to show that the male/female distribution stays the same.

I did:

CrossTable(femalesex, femalesex_old, chisq=TRUE)

              | femalesex
femalesex_old |         0 |         1 | Row Total |
--------------|-----------|-----------|-----------|
            0 |     49894 |         0 |     49894 |
              | 25532.759 | 25140.241 |           |
              |     1.000 |     0.000 |     0.496 |
              |     1.000 |     0.000 |           |
              |     0.496 |     0.000 |           |
--------------|-----------|-----------|-----------|
            1 |         0 |     50673 |     50673 |
              | 25140.241 | 24753.759 |           |
              |     0.000 |     1.000 |     0.504 |
              |     0.000 |     1.000 |           |
              |     0.000 |     0.504 |           |
--------------|-----------|-----------|-----------|
 Column Total |     49894 |     50673 |    100567 |
              |     0.496 |     0.504 |           |
--------------|-----------|-----------|-----------|

Statistics for All Table Factors

Pearson's Chi-squared test

Chi^2 = 100567 d.f. = 1 p = 0

Pearson's Chi-squared test with Yates' continuity correction

Chi^2 = 100563 d.f. = 1 p = 0

and I also did

CrossTable(femalesex_old, femalesex, missing.include=TRUE, chisq=TRUE)

Total Observations in Table: 100829

              | femalesex
femalesex_old |         0 |         1 | Row Total |
--------------|-----------|-----------|-----------|
            0 |     49894 |         0 |     49894 |
              | 25550.928 | 25146.172 |           |
              |     1.000 |     0.000 |     0.495 |
              |     0.998 |     0.000 |           |
              |     0.495 |     0.000 |           |
--------------|-----------|-----------|-----------|
            1 |         0 |     50673 |     50673 |
              | 25134.218 | 24736.063 |           |
              |     0.000 |     1.000 |     0.503 |
              |     0.000 |     0.997 |           |
              |     0.000 |     0.503 |           |
--------------|-----------|-----------|-----------|
           NA |       118 |       144 |       262 |
              |     1.100 |     1.082 |           |
              |     0.450 |     0.550 |     0.003 |
              |     0.002 |     0.003 |           |
              |     0.001 |     0.001 |           |
--------------|-----------|-----------|-----------|
 Column Total |     50012 |     50817 |    100829 |
              |     0.496 |     0.504 |           |
--------------|-----------|-----------|-----------|

Statistics for All Table Factors

Pearson's Chi-squared test

Chi^2 = 100569.6 d.f. = 2 p = 0

But actually I think both are wrong... I think I have a knot in my brain somewhere...

rpietro commented 12 years ago

I am sorry, I thought I had answered this by email, but apparently it didn't show up. I think that what you are doing is correct. Why do you think it is wrong?

mworni commented 12 years ago

My concern is that those variables are mutually exclusive. What does the chi-square test tell me? The result is highly significant, but I think it does not test what I want to know. I would like to show that the distribution of males/females in the original dataset is the same as in the imputed dataset, but right now I think I am testing whether patients who are female in femalesex_old are also female in the imputed dataset, and vice versa.

I struggle a little with explaining this, but I actually think that this test is misleading.

rpietro commented 12 years ago

I don't think they are non-overlapping, but I guess I understand what you are saying now. What I would do is include NA as a category in the counts. To do that, just pass the option exclude=NULL to table. That way you will be comparing the proportions before and after imputation.

If that doesn't work, just shoot me the code and I will fix it.
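As a rough illustration of the exclude=NULL suggestion, here is a minimal sketch on made-up data. The vector below is hypothetical toy data, not the NSQIP variable, and the names counts_default and counts_with_na are only for illustration.

```r
# Toy data (hypothetical): 1 = female, 0 = male, NA = missing before imputation.
femalesex_old <- c(0, 1, 1, NA, 0, 1, NA, 0)

# By default, table() drops NA values from the counts.
counts_default <- table(femalesex_old)

# With exclude = NULL, NA is kept as its own category.
counts_with_na <- table(femalesex_old, exclude = NULL)

print(counts_default)
print(counts_with_na)

# Proportions with NA counted as a category.
print(prop.table(counts_with_na))
```

The second table has three categories (0, 1 and NA), so the missing values show up explicitly in the proportions.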


mworni commented 12 years ago

Ricardo - I don't think this is working, as femalesex no longer has any missing values. I used the following command (file: NSQIP John Scarborough age complications.R*, starting line 837):

CrossTable(femalesex, femalesex_old, missing.include=TRUE, chisq=TRUE)


   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table: 100829

             | femalesex_old
   femalesex |         0 |         1 |        NA | Row Total |
-------------|-----------|-----------|-----------|-----------|
           0 |     49894 |         0 |       118 |     50012 |
             | 25550.928 | 25134.218 |     1.100 |           |
             |     0.998 |     0.000 |     0.002 |     0.496 |
             |     1.000 |     0.000 |     0.450 |           |
             |     0.495 |     0.000 |     0.001 |           |
-------------|-----------|-----------|-----------|-----------|
           1 |         0 |     50673 |       144 |     50817 |
             | 25146.172 | 24736.063 |     1.082 |           |
             |     0.000 |     0.997 |     0.003 |     0.504 |
             |     0.000 |     1.000 |     0.550 |           |
             |     0.000 |     0.503 |     0.001 |           |
-------------|-----------|-----------|-----------|-----------|
Column Total |     49894 |     50673 |       262 |    100829 |
             |     0.495 |     0.503 |     0.003 |           |
-------------|-----------|-----------|-----------|-----------|

Statistics for All Table Factors

Pearson's Chi-squared test

Chi^2 = 100569.6 d.f. = 2 p = 0


Mathias Worni, MD, MHS Consulting Associate in Surgery Department of Surgery Duke University Medical Center

rpietro commented 12 years ago

Why do you think it is not working?


mworni commented 12 years ago

Actually, what I would expect is a non-significant result, as I hope the distributions in the original and the imputed dataset are similar. Here I get a chi-square value of about 100,000, which is not what I think the result should be. The test performed tells me something else: if femalesex_old were a diagnostic test and femalesex the disease, the result would be highly significant, because a positive test would mean the disease is present and vice versa. But I hoped to see the exact opposite.
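To make the distinction concrete, here is a minimal sketch that rebuilds the cross-tabulation from the counts posted in this thread and contrasts the test of independence with the marginal proportions of women. The object names tab, independence, p_old and p_new are illustrative only and are not from the project's R file.

```r
# Counts taken from the CrossTable output posted in this thread.
# Rows: femalesex_old (0, 1, NA); columns: femalesex after imputation (0, 1).
tab <- matrix(c(49894,     0,
                    0, 50673,
                  118,   144),
              nrow = 3, byrow = TRUE,
              dimnames = list(femalesex_old = c("0", "1", "NA"),
                              femalesex     = c("0", "1")))

# Chi-square test of independence: it asks whether knowing femalesex_old
# tells you femalesex, which it does almost perfectly, hence the huge statistic.
independence <- chisq.test(tab)
print(independence$statistic)

# The quantity of interest is the marginal proportion of women.
p_old <- tab["1", "1"] / sum(tab[c("0", "1"), ])  # observed cases only
p_new <- sum(tab[, "1"]) / sum(tab)               # all cases after imputation
print(c(before = p_old, after = p_new))
```

The large statistic reflects agreement between the two variables, while the two marginal proportions both round to 0.504.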


rpietro commented 12 years ago

Sorry, my bad. What you want is http://goo.gl/Vp6uy

so:

prop.test(c(A, B),c(C,D))

A - total number of women in the sample without imputation
B - total number of women in the sample with imputation
C - total sample size without imputation (don't count the missing data)
D - total sample size with imputation = total sample size for the data set
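Plugging in the counts from the CrossTable output earlier in this thread, the call could look like the sketch below. The values of A, B, C and D are read off those tables; the variable name result is illustrative. One assumption to keep in mind: prop.test treats the two samples as independent, while here the imputed sample contains the observed cases, so the p-value is probably best read as descriptive.

```r
# Counts read off the CrossTable output in this thread.
A <- 50673   # women before imputation
B <- 50817   # women after imputation
C <- 100567  # sample size before imputation, missing values excluded
D <- 100829  # sample size after imputation (full data set)

# Two-sample test for equality of proportions.
result <- prop.test(c(A, B), c(C, D))
print(result)

# Estimated proportions of women before and after imputation.
print(result$estimate)
```

Both estimated proportions round to 0.504, matching the column totals in the tables above.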

You might also want to throw in a graphic if you think reviewers are going to be concerned about the imputation introducing bias.
