smart-on-fhir / cumulus-library-data-metrics

A data metrics study for the Cumulus project
https://docs.smarthealthit.org/cumulus/
Apache License 2.0

Averages in c_resources_per_pt_summary seem too high #30

Closed: gotdan closed this issue 5 months ago

gotdan commented 5 months ago

After running the metrics on a set of Synthea data for 118 patients, the averages don't seem correct. In some cases they are even higher than the max values. Maybe a bug in the SQL logic?

[Screenshot 2024-05-23 at 4:29:41 PM]
mikix commented 5 months ago

Here's what I get when I run it on the 100-patient sample-bulk-fhir-datasets files:

┌────────────────────┬──────────────────────────┬───────────────┬────────┐
│         id         │         category         │    average    │  max   │
│      varchar       │         varchar          │ decimal(18,2) │ int128 │
├────────────────────┼──────────────────────────┼───────────────┼────────┤
│ * All              │ * All                    │       1087.87 │  10180 │
│ AllergyIntolerance │ * All                    │          0.63 │      9 │
│ AllergyIntolerance │ environment              │          0.42 │      7 │
│ AllergyIntolerance │ food                     │          0.16 │      3 │
│ AllergyIntolerance │ medication               │          0.05 │      2 │
│ Condition          │ * All                    │         37.99 │    252 │
│ Condition          │ encounter-diagnosis      │         37.99 │    252 │
│ Device             │ * All                    │          1.73 │     22 │
│ DiagnosticReport   │ * All                    │        131.23 │   1174 │
│ DiagnosticReport   │ * No recognized category │         23.94 │    165 │
│      ·             │  ·                       │            ·  │      · │
│      ·             │  ·                       │            ·  │      · │
│      ·             │  ·                       │            ·  │      · │
│ Observation        │ exam                     │          0.57 │     11 │
│ Observation        │ imaging                  │          0.14 │      6 │
│ Observation        │ laboratory               │        350.21 │   4041 │
│ Observation        │ procedure                │          0.89 │     39 │
│ Observation        │ social-history           │         14.33 │     98 │
│ Observation        │ survey                   │         36.29 │    272 │
│ Observation        │ therapy                  │          0.78 │     82 │
│ Observation        │ vital-signs              │        143.03 │   1292 │
│ Procedure          │ * All                    │        117.38 │    805 │
│ ServiceRequest     │ * All                    │          0.00 │      0 │
├────────────────────┴──────────────────────────┴───────────────┴────────┤
│ 77 rows (20 shown)                                           4 columns │
└────────────────────────────────────────────────────────────────────────┘

That all looks right and reasonable to me. One thing I notice is that your screenshot shows no decimals. How much extra math are you doing on the averages, on top of what c_resources_per_pt_summary gives you?
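The bug report rests on a simple invariant: a mean can never exceed the maximum of the same values, so any row with average > max points to a computation or display problem. The sketch below checks that invariant against the 20 rows shown in the DuckDB output above. The values are copied from that output (the elided middle rows are not included). The function name `rows_violating_avg_le_max` is hypothetical and not part of cumulus-library-data-metrics.

```python
from decimal import Decimal

# (id, category, average, max) rows copied from the DuckDB output above.
# The elided middle rows ("·") are not reproduced here.
ROWS = [
    ("* All", "* All", Decimal("1087.87"), 10180),
    ("AllergyIntolerance", "* All", Decimal("0.63"), 9),
    ("AllergyIntolerance", "environment", Decimal("0.42"), 7),
    ("AllergyIntolerance", "food", Decimal("0.16"), 3),
    ("AllergyIntolerance", "medication", Decimal("0.05"), 2),
    ("Condition", "* All", Decimal("37.99"), 252),
    ("Condition", "encounter-diagnosis", Decimal("37.99"), 252),
    ("Device", "* All", Decimal("1.73"), 22),
    ("DiagnosticReport", "* All", Decimal("131.23"), 1174),
    ("DiagnosticReport", "* No recognized category", Decimal("23.94"), 165),
    ("Observation", "exam", Decimal("0.57"), 11),
    ("Observation", "imaging", Decimal("0.14"), 6),
    ("Observation", "laboratory", Decimal("350.21"), 4041),
    ("Observation", "procedure", Decimal("0.89"), 39),
    ("Observation", "social-history", Decimal("14.33"), 98),
    ("Observation", "survey", Decimal("36.29"), 272),
    ("Observation", "therapy", Decimal("0.78"), 82),
    ("Observation", "vital-signs", Decimal("143.03"), 1292),
    ("Procedure", "* All", Decimal("117.38"), 805),
    ("ServiceRequest", "* All", Decimal("0.00"), 0),
]


def rows_violating_avg_le_max(rows):
    """Return rows whose average exceeds their max (impossible for a true mean)."""
    return [row for row in rows if row[2] > row[3]]


if __name__ == "__main__":
    bad = rows_violating_avg_le_max(ROWS)
    print(f"{len(bad)} of {len(ROWS)} rows have average > max")  # 0 of 20 rows have average > max
```

Comparing `Decimal` against `int` is exact in Python, so the check involves no floating-point rounding.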

gotdan commented 5 months ago

Thanks. I was just reading the table as-is, but apparently decimal rendering is broken in the Observable BI tool, because the values look right once I cast the column to float 😬
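One plausible mechanism, offered as an assumption and not something the thread confirms: a `decimal(18,2)` value is commonly stored as an unscaled integer (value × 10²) plus a scale of 2. A client that shows the stored integer without dividing by 10² would display no decimals and inflate every average by 100×. This matches both symptoms in the report. The sketch below models that behaviour for the "* All" row. All function names are hypothetical, and the buggy renderer is a model, not Observable's actual code.

```python
from decimal import Decimal


def stored_unscaled(value: Decimal, scale: int) -> int:
    """Integer backing a DECIMAL(p, scale) value, i.e. value * 10**scale (assumed layout)."""
    return int(value.scaleb(scale))


def render_ignoring_scale(value: Decimal, scale: int) -> int:
    """Hypothetical buggy client: displays the backing integer and drops the scale."""
    return stored_unscaled(value, scale)


def render_as_float(value: Decimal) -> float:
    """The workaround from the thread: cast the column to a floating-point type."""
    return float(value)


if __name__ == "__main__":
    average, maximum = Decimal("1087.87"), 10180  # "* All" row from the output above
    shown = render_ignoring_scale(average, 2)
    print(shown, shown > maximum)    # 108787 True
    print(render_as_float(average))  # 1087.87
```

Under this model, the cast-to-float workaround succeeds because the value the client receives already carries its own fractional part, so there is no scale left for the client to mishandle.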