naccdata / uniform-data-set

Repository for REDCap artifacts for the UDS and associated forms

Calculated Field for Global CDR #81

Open sbrown-iu opened 1 year ago

sbrown-iu commented 1 year ago

Hi All, In my work developing the B4 form, I was asked to look into using a calculated field to calculate the Global CDR. I've made quite a bit of progress, but have gotten stuck mostly at the "more unusual circumstances" section (see below for the scoring rules from the guidance document). Unfortunately, at least as I have been told, the algorithm that powers the NACC calculator is considered IP and cannot be shared without being licensed, so I was not able to view it. Regardless, I'm wondering whether this is really possible using native REDCap functionality, but since there is so much expertise in our group I wanted to see if others have any thoughts or possible solutions (or possibly already do this in their REDCap instance!). Any thoughts or help on this would be appreciated. Thanks!!

Scoring Rules: The global CDR is derived from the scores in each of the six categories (“box score”) as follows:

  * Memory (M) is the primary category, and all others are secondary.
  * CDR = M if at least three secondary categories are given the same score as memory. Whenever three or more secondary categories are given a score greater or less than the memory score, CDR = the score of the majority of secondary categories on whichever side of M has the greater number of secondary categories. However, when three secondary categories are scored on one side of M and two secondary categories are scored on the other side of M, then CDR = M.
  * When M = 0.5, CDR = 1 if at least three of the other categories are scored 1 or greater. If M = 0.5, then CDR cannot be 0; it can only be 0.5 or 1.
  * If M = 0, then CDR = 0 unless there is impairment (0.5 or greater) in two or more secondary categories, in which case CDR = 0.5.

Although applicable to most Alzheimer’s disease situations, these rules do not cover all possible scoring combinations. Unusual circumstances that occasionally occur in AD, and may be expected in non-Alzheimer’s dementia as well, are scored as follows:

  1. With ties in the secondary categories on one side of M, choose the tied scores closest to M for CDR (e.g., if M and another secondary category = 3, two secondary categories = 2, and two secondary categories = 1, then CDR = 2).
  2. When only one or two secondary categories are given the same score as M, CDR = M as long as no more than two secondary categories are on either side of M.
  3. When M = 1 or greater, CDR cannot be 0; in this circumstance, CDR = 0.5 when the majority of secondary categories are 0.
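For anyone sketching the logic outside of REDCap, the inputs are just the six box scores. A minimal representation in Python (the field names below only mirror what I'd expect on the B4 and are illustrative, not the actual instrument):

```python
# Illustrative only: one CDR assessment held as a dict. Field names are
# assumed B4-style names, not the actual instrument. Box scores are
# 0, 0.5, 1, 2, or 3 (Personal Care has no 0.5 rating).
boxes = {
    "memory":   0.5,   # M, the primary category
    "orient":   0.5,
    "judgment": 1.0,
    "commun":   1.0,
    "homehobb": 0.5,
    "perscare": 0.0,
}

memory = boxes["memory"]
secondary = [v for k, v in boxes.items() if k != "memory"]
```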
cfmurch commented 1 year ago

Hi Steven, so I built out the algorithm in R at one point, so I can appreciate how complex it winds up being. A lot of the issue, as you pointed out, is that the final score depends on how many non-memory categories have scores that are greater than, less than, or equal to the memory score. What I ultimately found is that there are five initial considerations, many of which return the memory score (a rough sketch in code follows the list):

  1. Memory is equal to 0
  2. Memory is flanked by two scores on one side and three on the other: return the Memory score
  3. At least three categories are equal to Memory: return Memory (same result as consideration 2)
  4. There are no more than two categories above or below the Memory score: again return Memory
  5. If there are three or more secondary categories all tied, set CDR to that tied value
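Here's roughly how I'd transcribe those checks (a sketch in Python rather than my R code, and it leaves the M = 0.5 special handling for later):

```python
from collections import Counter

def cdr_initial_checks(memory, secondary):
    """Sketch of the five initial checks above; `secondary` is the list of
    the five non-memory box scores. Returns a global CDR when one of the
    early-exit conditions applies, else None."""
    above = [s for s in secondary if s > memory]
    below = [s for s in secondary if s < memory]
    equal = [s for s in secondary if s == memory]

    # 1. Memory = 0: per the scoring rules quoted earlier, CDR = 0 unless two
    #    or more secondary categories show impairment (0.5 or greater).
    if memory == 0:
        return 0.5 if sum(s >= 0.5 for s in secondary) >= 2 else 0.0

    # 2. Flanking: three secondaries on one side of M and two on the other.
    if (len(above), len(below)) in [(3, 2), (2, 3)]:
        return memory

    # 3. At least three secondaries equal to Memory.
    if len(equal) >= 3:
        return memory

    # 4. No more than two secondaries above and no more than two below.
    if len(above) <= 2 and len(below) <= 2:
        return memory

    # 5. Three or more secondaries all tied at one value.
    value, count = Counter(secondary).most_common(1)[0]
    if count >= 3:
        return value

    return None  # fall through to the follow-up calculation below
```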

At this stage, there must be at least three categories above OR below the memory score. Things now get pretty complicated and require some additional calculations (sketched in code after the list below). The kicker is that many of these depend on whether Memory is 0.5 or not.

  1. Find the majority score either above or below the memory score
  2. Check whether there's a tie and, if so, select the tied score closest to the memory score
  3. As a final check, if the global score was cast as a 0 but Memory is greater than 0, set to 0.5
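Continuing the sketch from above (again, an interpretation of these steps in Python, not the licensed NACC algorithm, and it ignores the Memory = 0.5 wrinkles for now):

```python
from collections import Counter

def cdr_followup(memory, secondary):
    """Sketch of the follow-up calculation for cases the initial checks do
    not settle, i.e. at least three secondaries sit above or below Memory."""
    above = [s for s in secondary if s > memory]
    below = [s for s in secondary if s < memory]

    # 1. Work on whichever side of M holds the majority of secondaries.
    side = above if len(above) > len(below) else below

    # 2. Take the most common score on that side; on a tie, take the tied
    #    score closest to M (the WashU tie-breaking rule).
    counts = Counter(side)
    top = max(counts.values())
    tied = [score for score, n in counts.items() if n == top]
    cdr = min(tied, key=lambda s: abs(s - memory))

    # 3. Final check: a global score of 0 with Memory above 0 becomes 0.5.
    if cdr == 0 and memory > 0:
        cdr = 0.5
    return cdr
```

Chained with the earlier sketch, you'd run the initial checks first and only fall through to this function when they return None.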

Because of these steps, there's no way to build out the final CDR score in a single field, since REDCap won't let you store temporary calculations within a field. That being said, this could absolutely be done by using a couple of initial fields to build out that "different from memory" table first, then the follow-up majority tabulation, and finally some matryoshka-doll conditionals in a final calculated field. Then it's just a matter of taking that set of calculated fields, setting their action tags to @HIDDEN, and using a descriptive field to display the final result. It's going to be ugly as sin and will probably require 3-4 additional calculated fields, but I can see it being possible. Let me whiteboard this a bit to see if I can translate my R code to REDCap, and I'll see about sharing a small modular instrument.

Regards Chad

sbrown-iu commented 1 year ago

Thanks so much Chad! Yeah, I have 7 or 8 additional/interim calculated fields, but was having a hard time seeing the light at the end of the tunnel. Really appreciate your time in looking into this. Let me know if there's anything I can do to help or test what you come up with.

kgauthreaux commented 1 year ago

@sbrown-iu @cfmurch Hi! I just wanted to give an update: on the most recent data manager survey we asked whether anyone had a version of the CDR global score calculator in REDCap that they would be willing to share with NACC. So far, we have 5 Centers, including Chad's response, that said "Yes" to this question. The survey does not close until 6/7, but I can go ahead and reach out to the other 4 Centers to see if they would be willing to share these, if you think that would be helpful! Do you think we should have them upload to the b4 folder (forms/uds/b4)?

cfmurch commented 1 year ago

@sbrown-iu and everyone else on the development team. It took a while to churn through, but I think I was finally able to implement an automatic global CDR calculator in REDCap. I've taken a minimal variable set that you'd see on the B4 in REDCap and added roughly a dozen fields that step through the various components of the algorithm to eventually return a score. The various calculated components use the NACC expected variables (e.g. [memory] for the Memory component). If your site is using different conventions for the component scores, the instrument will need to be updated. Similarly, if you upload my instrument into your UDS implementation and it already has those variable names, the names in the new instrument will be updated for uniqueness, so you may suddenly have [memory_c46458] as a field. If you do your own piloting, it might be best to upload this instrument in a sandbox REDCap environment.

I still need to do full comprehensive testing on it, since we really should check this against all possible CDR component combinations before letting it loose in the wild. That being said, it does appear to be working in agreement with my R implementation as best I can tell, but I'd love to have other people try to break it. I added some comments to a section header above the set of calculated fields indicating the rough process. Most of the heavy lifting comes from the interim score, which follows the scoring guidelines detailed in the WU St Louis PDF at https://knightadrc.wustl.edu/wp-content/uploads/2021/06/CDR-Scoring-Rules.pdf .

One thing to mention that will need to be checked with WashU: I'm not sure whether the flanking rule (two components both one score value above and one score value below will return the memory score) or the majority rule (specifically, what to do with three values above when two are at 1) takes precedence in the scoring algorithm. This specifically impacts an edge case where you have memory at 0.5 with two scores at 0 and two scores at 1. If the flanking rule takes precedence it'll return 0.5; if the majority rule is preferred it will return 1. I've designed it so that flanking takes precedence, which is what the guidelines ostensibly suggest, but the NACC calculator looks to use the majority rule. This is the only ambiguity I've stumbled across to this point but, again, we'd need to do a full combinatorial sweep to see if there are any other discrepancies.
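For concreteness, the ambiguous case looks something like this (the score of 2 is an assumed fifth value, since only "two at 0, two at 1" is pinned down above):

```python
# Assumed illustration of the ambiguous edge case discussed above.
memory = 0.5
secondary = [0, 0, 1, 1, 2]   # two below M, three above (two of them at 1)

# Flanking reading (three secondaries on one side of M, two on the other): CDR = M = 0.5
# Majority / "M = 0.5 with three categories >= 1" reading: CDR = 1
```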

In any case, you'll find the .ZIP files for the instrument attached here. I wanted to get this up before the June EDC Development meeting so if you have any comments, questions, or insights, we can talk then, continue using this thread, or you can just shoot me an email. B4CDR_2023-06-29_0947.zip

-Chad

sbrown-iu commented 1 year ago

Thanks @cfmurch! I was not sure how much time you had to spend on this, so I have been working on this as well and also got a working version completed yesterday. I have 10 "interim" calculated fields that support the overall calculation, and it did get a bit hacky in the end identifying some rarer combinations. I have tested my version against my Center's data and it works for the ~5000 records of the B4 we have.

For next steps, I'll take your version and test it on my Center's data as well, and see how it works. I can also work on setting up a test of all possible combinations, but it may take me some time to get to that.
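For that exhaustive test, something along these lines would sweep every box-score combination and compare two implementations (a sketch only; `cdr_candidate` and `cdr_reference` are hypothetical placeholders for whatever pair of calculators is being compared):

```python
from itertools import product

LEVELS = (0, 0.5, 1, 2, 3)        # box scores for Memory and most secondaries
PERSCARE_LEVELS = (0, 1, 2, 3)    # Personal Care has no 0.5 rating

def sweep(cdr_candidate, cdr_reference):
    """Compare two CDR implementations across all box-score combinations."""
    mismatches = []
    for memory in LEVELS:
        for others in product(LEVELS, LEVELS, LEVELS, LEVELS, PERSCARE_LEVELS):
            secondary = list(others)
            if cdr_candidate(memory, secondary) != cdr_reference(memory, secondary):
                mismatches.append((memory, secondary))
    total = len(LEVELS) ** 5 * len(PERSCARE_LEVELS)
    print(f"{len(mismatches)} disagreements out of {total} combinations")
    return mismatches
```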

I also identified a specific combination where I have a question about the current scoring calculator's output; if there is time we can discuss it today, or perhaps communicate with NACC over email about these cases.

sbrown-iu commented 7 months ago

Hello All, Just wanted to note that the CDR global calculation is complete and has been confirmed to match the online calculator for all ~1800 unique item combinations NACC has in their database. We did find one outlier where we thought the calculator was incorrect, and working with NACC we created a new rule to resolve that issue. I've attached the form below in case anyone wants to review it; it ended up being a mix of Chad's and my initial work, plus some added semi-hard-coding of combinations to get it to correctly assign scores for the various rare edge cases. I have one more question about this form that I hope to discuss on our next call. NACC_Version_B4CDRDementiaStaging_2023-12-14_1250.zip