Open mworni opened 12 years ago
I am sorry, I thought I had answered this by email, but apparently it didn't show up. I think that what you are doing is correct. Why do you think it is wrong?
My concern is that those variables are mutually exclusive. What does the chi-square test tell me? The result is highly significant, but I think it does not test what I want to know. I would like to show that the distribution of males/females in the original dataset is the same as in the imputed dataset. Right now I think I am testing whether females in femalesex_old are exclusively females in the imputed dataset and vice versa.
I am struggling a little to explain it, but I actually think that this test is misleading.
I don't think they are non-overlapping, but I guess I understand what you are saying now. What I would do is include NA as a category in the counts. To do that, just pass the option exclude=NULL to table. That way you will be comparing the proportions before and after imputation.
If that doesn't work, just shoot me the code and I will fix it.
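A minimal sketch of that suggestion, using made-up toy vectors rather than the NSQIP columns (the values below are purely illustrative):

```r
# Toy stand-ins for the real columns (hypothetical values)
femalesex_old <- c(0, 1, NA, 1, 0, NA, 1)  # before imputation, contains NAs
femalesex     <- c(0, 1, 1, 1, 0, 0, 1)    # after imputation, no NAs

# exclude = NULL keeps NA as its own category instead of dropping it
counts_old <- table(femalesex_old, exclude = NULL)
counts_new <- table(femalesex, exclude = NULL)

# Proportions before and after imputation, side by side
print(prop.table(counts_old))
print(prop.table(counts_new))
```

With NA kept as a category, the proportions before imputation are taken over the full sample size, so they are directly comparable to the proportions after imputation.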
Ricardo - I don't think this works, because femalesex no longer has any missing values. I used the following command (file: NSQIP John Scarborough age complications.R*, starting at line 837):
CrossTable(femalesex, femalesex_old, missing.include=TRUE, chisq=TRUE)
Cell contents: N, Chi-square contribution, N / Row Total, N / Col Total, N / Table Total

Total Observations in Table: 100829

| femalesex | femalesex_old = 0 | femalesex_old = 1 | femalesex_old = NA | Row Total |
|---|---|---|---|---|
| 0: N | 49894 | 0 | 118 | 50012 |
| 0: Chi-square contribution | 25550.928 | 25134.218 | 1.100 | |
| 0: N / Row Total | 0.998 | 0.000 | 0.002 | 0.496 |
| 0: N / Col Total | 1.000 | 0.000 | 0.450 | |
| 0: N / Table Total | 0.495 | 0.000 | 0.001 | |
| 1: N | 0 | 50673 | 144 | 50817 |
| 1: Chi-square contribution | 25146.172 | 24736.063 | 1.082 | |
| 1: N / Row Total | 0.000 | 0.997 | 0.003 | 0.504 |
| 1: N / Col Total | 0.000 | 1.000 | 0.550 | |
| 1: N / Table Total | 0.000 | 0.503 | 0.001 | |
| Column Total | 49894 | 50673 | 262 | 100829 |
| Column Total / Table Total | 0.495 | 0.503 | 0.003 | |

Statistics for All Table Factors
Pearson's Chi-squared test
Chi^2 = 100569.6 d.f. = 2 p = 0
Mathias Worni, MD, MHS Consulting Associate in Surgery Department of Surgery Duke University Medical Center
Why do you think it is not working?
Actually, I would expect a non-significant result, since I hope the distributions in the original and the imputed datasets are similar. Here I get a chi-square value of about 100,000, which is not what I think the result should be. The test performed tells me something else: if femalesex_old were a diagnostic test and femalesex the disease, the result would be highly significant, because a positive test would always mean the disease is present and vice versa. I hoped to see the exact opposite.
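The size of that statistic can be reproduced from the complete cases alone. The sketch below rebuilds the 2x2 part of the CrossTable output (49894 and 50673 on the diagonal, zeros elsewhere) and runs base R's chisq.test on it. The object names are made up for illustration. Because imputation leaves observed values untouched, the two variables agree perfectly on every complete case, and for a 2x2 table with perfect agreement the Pearson statistic equals the number of observations.

```r
# Complete cases only: every observed value is identical before and
# after imputation, so the off-diagonal cells are zero.
agreement <- matrix(
  c(49894, 0, 0, 50673), nrow = 2,
  dimnames = list(femalesex = c("0", "1"), femalesex_old = c("0", "1"))
)

n_complete <- sum(agreement)  # 49894 + 50673 complete cases

# Pearson chi-square test of independence, without and with Yates' correction
res_plain <- chisq.test(agreement, correct = FALSE)
res_yates <- chisq.test(agreement, correct = TRUE)

print(n_complete)
print(res_plain$statistic)
print(res_yates$statistic)
```

So the test is answering "are femalesex and femalesex_old associated?", which they trivially are. It says nothing about whether the male/female proportions shifted after imputation.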
Sorry, my bad. What you want is http://goo.gl/Vp6uy
So:
prop.test(c(A, B),c(C,D))
- A: total number of women in the sample without imputation
- B: total number of women in the sample with imputation
- C: total sample without imputation (don't count the missing data)
- D: total sample with imputation = total sample size for the data set
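As a sketch, the four quantities can be read off the CrossTable output posted earlier in this thread, assuming femalesex = 1 codes women. The variable names are made up for illustration.

```r
# Counts taken from the CrossTable output in this thread
women_before <- 50673          # A: women among non-missing femalesex_old
women_after  <- 50817          # B: women in femalesex after imputation
n_before     <- 49894 + 50673  # C: non-missing observations before imputation
n_after      <- 100829         # D: full sample size after imputation

# Two-sample test of equal proportions (base R, stats package)
res <- prop.test(c(women_before, women_after), c(n_before, n_after))

print(res$estimate)  # proportion of women before and after
print(res$p.value)
```

One caveat: prop.test treats the two groups as independent samples, whereas here the imputed sample contains every observation of the original one. The result is therefore best read as a rough descriptive check rather than a strict test.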
You might also want to throw in a graphic if you think reviewers are going to be concerned about the imputation introducing bias.
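One possible graphic, sketched with base R's barplot and the counts from the CrossTable output in this thread. The layout and labels are only a suggestion, and the object name props is made up.

```r
# Proportions of men and women before and after imputation
props <- rbind(
  before = c(male = 49894, female = 50673) / (49894 + 50673),
  after  = c(male = 50012, female = 50817) / 100829
)

# Grouped bars: one group per sex, one bar each for before and after
barplot(
  props,
  beside = TRUE,
  legend.text = rownames(props),
  ylim = c(0, 1),
  ylab = "Proportion of sample",
  main = "femalesex before and after imputation"
)
```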
Ricardo - how can I statistically compare femalesex before and after imputation? I would like to show that the distribution of males/females stays the same.
I did:
CrossTable(femalesex, femalesex_old, chisq=TRUE)
Statistics for All Table Factors
Pearson's Chi-squared test
Chi^2 = 100567 d.f. = 1 p = 0
Pearson's Chi-squared test with Yates' continuity correction
Chi^2 = 100563 d.f. = 1 p = 0
and I also did
CrossTable(femalesex_old, femalesex, missing.include=TRUE, chisq=TRUE)
Total Observations in Table: 100829
Statistics for All Table Factors
Pearson's Chi-squared test
Chi^2 = 100569.6 d.f. = 2 p = 0
But actually I think both are wrong... I think I have a knot in my brain somewhere...