rbcavanaugh / pnt

Beta-version of R Shiny implementation of the computer-adaptive Philadelphia naming test (Roach et al., 1996) following item response theory.
https://william-hula.shinyapps.io/pnt-cat/
GNU General Public License v3.0

rank transform and mean t scores #6

Closed rbcavanaugh closed 2 years ago

rbcavanaugh commented 3 years ago

Should results have an option to be viewed as percentile ranks (based on rank-transformed scores from the prior anomia study) or as T-scores?

GF and WH to discuss

gfergadiotis commented 3 years ago

I believe Will covered that in one of his most recent emails. I will let him add his response here.

william-hula commented 3 years ago

I think the primary output should be in T-scores (M = 50, SD = 10), achieved by rescaling the item parameters, which I can do. Per my recent email, we should not rank-transform the scores to follow a normal distribution because that would complicate the prediction of expected item and test scores based on person scores and item parameters. I think it might be a nice feature to show a histogram or density plot of the T-scores from the calibration sample, provide a vertical line or hash mark denoting the current examinee's score, and report both the T-score and percentile rank relative to the calibration sample. Putting it on a chart makes it visually clear that the distribution isn't normal, and thus why a T-score of 50 isn't the median and a T-score of 30.4 isn't necessarily the 2.5th percentile.
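The histogram-plus-percentile idea above can be sketched as follows. This is Python rather than the app's R, purely for illustration; the `calib_t` values are simulated stand-ins for the real 335-person calibration sample, with deliberate skew to show why T = 50 need not sit at the 50th percentile.

```python
import numpy as np

rng = np.random.default_rng(0)
# Deliberately skewed fake calibration distribution (the real one isn't normal)
calib_t = 50.0 + 10.0 * (rng.gamma(shape=2.0, scale=1.0, size=335) - 2.0)

def percentile_rank(score, sample):
    """Empirical percentile rank of `score` within `sample` (midpoint rule)."""
    below = np.sum(sample < score)
    equal = np.sum(sample == score)
    return 100.0 * (below + 0.5 * equal) / len(sample)

# Because the distribution is skewed, T = 50 need not be the median
pr_at_50 = percentile_rank(50.0, calib_t)
```

In the app, the same empirical lookup against the calibration sample would accompany the density plot, rather than a normal-table conversion.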

william-hula commented 2 years ago

I just committed 3 csv files, one containing T-scaled item parameters and the other two containing the theta estimates from the 335-person calibration sample, one z-scaled, the other T-scaled. I suppose I could've just added them here; I apologize if I'm not using the tools here correctly. At any rate, when doing EAP estimation of scores using the T-scaled parameters, be sure to set the mean (SD) of the prior to 50 (10), and I recommend setting the score boundaries to [5, 95], equivalent to [-4.5, +4.5] with a standard normal prior. These were the settings in catIrt that provided the smallest error between ability estimates from catIrt and those from Helen Huston's Stan code. Lemme know if you have questions or concerns.
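As a hedged illustration of those settings (not the app's actual code, which uses the committed CSVs with catIrt/catR), here is a minimal grid-quadrature EAP for 2PL items directly on the T metric: prior N(50, 10), score boundaries [5, 95]. The item parameters and responses below are invented.

```python
import numpy as np

def eap_tscale(responses, a, b, n_quad=121):
    """EAP theta estimate for 2PL items on the T metric.

    Prior N(50, 10); quadrature grid bounded at [5, 95].
    responses: 0/1 array; a: discriminations; b: difficulties (T metric).
    """
    theta = np.linspace(5.0, 95.0, n_quad)
    prior = np.exp(-0.5 * ((theta - 50.0) / 10.0) ** 2)
    # 2PL probability of a correct response at each grid point
    p = 1.0 / (1.0 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
    like = np.prod(np.where(responses[:, None] == 1, p, 1.0 - p), axis=0)
    post = like * prior
    return float(np.sum(theta * post) / np.sum(post))

# Toy example: 3 hypothetical items, all answered correctly,
# so the estimate is pulled above the prior mean of 50
est = eap_tscale(np.array([1, 1, 1]),
                 a=np.array([0.15, 0.12, 0.10]),  # made-up slopes (T metric)
                 b=np.array([45.0, 50.0, 55.0]))  # made-up difficulties
```

Note that on the T metric the slopes are 1/10 of their z-metric values, which is one reason rescaling the item parameters (rather than the scores) keeps prediction of expected item and test scores straightforward.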

rbcavanaugh commented 2 years ago

Thanks Will! I only see 2 files added (pnt_Tscaled and thetas_MAPPD) - can you check the third one?

I think at our next meeting we should discuss where these need to be implemented in the app - I'm not quite sure I follow. Or @AlexSwiderski can bring me up to speed in the meantime.

rbcavanaugh commented 2 years ago

Here is a to-do list for this issue

Doing these on a separate branch called `T-scores`

rbcavanaugh commented 2 years ago

Final outstanding issue here: we need a way to test whether the T-scaled estimates are correct (as we did for the...other...estimates). Otherwise, T-scaling has been incorporated into the app rather seamlessly.

william-hula commented 2 years ago

I think the only way to do this will be to test the theta estimates for some number of administrations against estimation with catR or catIrt. Since we never implemented T-scaling in catpuccino and I don't really know how to modify it, I don't think that option is open to us. Alternatively, I could send you the raw data for the whole 335- (or 35X-) person sample along with their theta estimates from Helen's Stan code (which is actually what I just posted in the other thread) and you could check against those. They won't match exactly, but the correlation should be ~1, with deviations on the order of 0.015 or smaller in T-score units.

rbcavanaugh commented 2 years ago

I think a subset of the 335 would be sufficient - the minimum number you would want to see tested (perhaps ~30?). I agree - if we're getting essentially the same estimates, this would make me confident in the algorithm via catR. Is this a method we might report in the future paper, such that we would want to re-run a fixed percentage? (E.g., "10% of tests were re-run using the pnt-cat Shiny web app with catR for reliability; correlations with the validated algorithm exceeded 0.99 and deviations were < 0.015"...)
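The agreement check proposed above could be as simple as the following sketch (Python stand-in; the arrays are simulated placeholders for the ~30 re-run scores and the Stan reference estimates, and the thresholds come from the criteria in this thread):

```python
import numpy as np

def check_agreement(app_scores, ref_scores, r_min=0.99, max_dev=0.015):
    """Return (r, max_abs_dev, passed) for two matched T-score vectors."""
    r = float(np.corrcoef(app_scores, ref_scores)[0, 1])
    dev = float(np.max(np.abs(app_scores - ref_scores)))
    return r, dev, (r >= r_min and dev <= max_dev)

rng = np.random.default_rng(1)
ref = rng.normal(50.0, 10.0, size=30)         # placeholder reference estimates
app = ref + rng.normal(0.0, 0.002, size=30)   # tiny simulated discrepancies
r, dev, ok = check_agreement(app, ref)
```

Run against the real re-tested administrations, this would directly produce the correlation and maximum-deviation figures for the paper.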

william-hula commented 2 years ago

Yep, that'll work. I'll send you a file with the raw responses and Stan theta estimates (z and T-scaled, where T = 10z + 50).
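For reference, that rescaling as a trivial helper (the function name is mine, not from the repo); note it maps the z-metric bounds [-4.5, 4.5] to the T-metric bounds [5, 95] mentioned earlier:

```python
def z_to_t(z):
    """Rescale a z-metric theta to the T metric (T = 10z + 50)."""
    return 10.0 * z + 50.0
```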

rbcavanaugh commented 2 years ago

Ok, I'm going to close this issue because the rank transformation and T-scores are done. The remaining reliability question now falls under issue 18 (write tests): https://github.com/aphasia-apps/pnt/issues/18