tidepool-org / hub

[DEPRECATED] Central storage for Tidepool planning and issue tracking.

determine how to de-dupe BG readings #109

Open jebeck opened 10 years ago

jebeck commented 10 years ago

Once we expand the types of self-monitored blood glucose readings we are pulling into our data platform (as discussed here), we will need a mechanism for preventing the user from being visually bombarded with duplicate BGs (for example, because we get the same BG event both directly from the user's meter and from the bolus wizard, where the user entered that reading manually).

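There are many ways to address this issue, among them (non-exhaustively):

  • only visualize certain types (i.e., don't visualize manually entered wizard BGs at all)
  • server-side logic to "link" suspected duplicate events combined with client-side logic to display only one of a set of linked events
  • client-side logic to determine which events could be duplicate and a priority system for displaying only one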

cheddar commented 10 years ago

My vote is on "only visualize certain types" for now.

That seems like the shortest time-to-value path for this. Then, as we run into problems with visual artifacts from that approach, we make the next decision about what should be done.
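
As a rough illustration of the "only visualize certain types" option, here is a minimal sketch. The event shape and the `source` field name are assumptions made for the example, not the platform's actual data model.

```python
# Hypothetical event shape: the "source" field and its values are assumptions.
VISUALIZED_SOURCES = {"meter"}  # e.g. leave out manually entered wizard BGs


def visible_bgs(events, allowed=VISUALIZED_SOURCES):
    """Keep only the BG events whose source is in the allowed set."""
    return [e for e in events if e["source"] in allowed]


events = [
    {"source": "meter", "value": 134, "time": "2014-06-06T19:30:00"},
    {"source": "wizard", "value": 134, "time": "2014-06-06T19:31:00"},
]
shown = visible_bgs(events)  # only the meter reading remains
```

Nothing is deleted here: the wizard reading stays in the data and is simply not drawn.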

jebeck commented 10 years ago

SGTM, @cheddar

kentquirk commented 10 years ago

Here's an idea:

If there are multiple events within a couple of minutes that have similar readings, then we aggregate them into a single dot but show all the values if you hover. Basically, if things would cause multiple dots to overlap a lot then we collapse them (and maybe indicate it with a 2 in the dot or some other indicator of specialness).
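
A minimal sketch of that aggregation, assuming a two-minute window and a 10 mg/dL tolerance for "similar" (both numbers, and the `time`/`value` field names, are placeholders chosen for the example):

```python
from datetime import datetime, timedelta


def aggregate_dots(readings, window=timedelta(minutes=2), tolerance=10):
    """Greedy single pass over time-sorted readings: fold a reading into the
    previous dot when it is close in both time and value to that dot's
    first reading; otherwise start a new dot."""
    dots = []
    for r in sorted(readings, key=lambda r: r["time"]):
        if dots:
            first = dots[-1]["readings"][0]
            if (r["time"] - first["time"] <= window
                    and abs(r["value"] - first["value"]) <= tolerance):
                dots[-1]["readings"].append(r)
                continue
        dots.append({"readings": [r]})
    return dots


readings = [
    {"time": datetime(2014, 6, 6, 19, 30), "value": 134},
    {"time": datetime(2014, 6, 6, 19, 31), "value": 140},
    {"time": datetime(2014, 6, 6, 21, 0), "value": 98},
]
dots = aggregate_dots(readings)
# The number drawn inside a dot would be len(dot["readings"]);
# the hover text would list every value in dot["readings"].
```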

On Fri, Jun 6, 2014 at 7:33 PM, Jana Beck notifications@github.com wrote:

Once we expand the types of self-monitored blood glucose readings we are pulling into our data platform (as discussed here https://github.com/tidepool-org/tidepool-org.github.io/pull/17), we will need a mechanism for preventing the user from being visually bombarded with duplicate BGs (for example because we get a BG event both directly from the user's meter and because the user entered that reading into the bolus wizard).

There are many ways to address this issue, among them (non-exhaustively):

  • only visualize certain types (i.e., don't visualize manually entered wizard BGs at all)
  • server-side logic to "link" suspected duplicate events combined with client-side logic to display only one of a set of linked events
  • client-side logic to determine which events could be duplicate and a priority system for displaying only one


Kent Quirk VP of Engineering, Tidepool


jebeck commented 10 years ago

For clarification, is your suggestion for a particular client vs. server side solution @kentquirk or still agnostic on that point?

Otherwise I like the idea - I also think eventually it'll be useful to expose readings that you entered into your CGM for calibration, which are almost guaranteed to be duplicates (assuming we're also getting them from the meter). This will be helpful for knowing what to trust when the CGM trace and the fingerstick diverge (if you entered a fingerstick as a calibration, it's probably the CGM that's off).

cheddar commented 10 years ago

I agree on the calibration thing, Jana; that's why deviceMeta has a calibration event that is defined as CGM calibrations ;).

On finding overlaps: when we need to, it should be the visualization that makes the choice not to show data, not the server making the choice not to deliver data.

jebeck commented 10 years ago

Ah, @cheddar, I think you misunderstand what I believe the server-side task would be. One way to do it, I believe, would be to link all suspected duplicates with a duplicateGroupId property or something, so the client side doesn't have to do any computation. It would only reference that property and use some system of priority to determine which member of the duplicate group gets visualized (with info on all of them potentially appearing in the tooltip, as @kentquirk suggested).
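
To make the split of responsibilities concrete, here is a sketch under stated assumptions: the `source` field, the five-minute window, and the priority ordering are all hypothetical, and only the `duplicateGroupId` name comes from the suggestion above.

```python
from datetime import datetime, timedelta
from itertools import count


def link_duplicates(events, window=timedelta(minutes=5)):
    """'Server side': give events with the same value and close timestamps
    a shared duplicateGroupId. Returns tagged copies, sorted by time."""
    ids = count(1)
    last_by_value = {}
    linked = []
    for e in sorted(events, key=lambda e: e["time"]):
        e = dict(e)  # do not mutate the caller's events
        prev = last_by_value.get(e["value"])
        if prev is not None and e["time"] - prev["time"] <= window:
            e["duplicateGroupId"] = prev["duplicateGroupId"]
        else:
            e["duplicateGroupId"] = next(ids)
        last_by_value[e["value"]] = e
        linked.append(e)
    return linked


# Hypothetical priority: lower number wins the right to be drawn.
PRIORITY = {"meter": 0, "pump": 1, "wizard": 2}


def pick_visible(linked):
    """'Client side': no duplicate detection, just group by the property
    and pick the highest-priority member of each group."""
    groups = {}
    for e in linked:
        groups.setdefault(e["duplicateGroupId"], []).append(e)
    return [
        {"shown": min(members, key=lambda e: PRIORITY[e["source"]]),
         "tooltip": members}
        for members in groups.values()
    ]


events = [
    {"source": "wizard", "value": 134, "time": datetime(2014, 6, 6, 19, 31)},
    {"source": "meter", "value": 134, "time": datetime(2014, 6, 6, 19, 30)},
    {"source": "meter", "value": 98, "time": datetime(2014, 6, 6, 21, 0)},
]
visible = pick_visible(link_duplicates(events))
```

The server still delivers every event; the client only reads the group property and decides what to draw.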

kentquirk commented 10 years ago

I was presuming something like this would be de-duped on the client side; the client is the place where you can best tell if the dots would overlap.

Jana, I believe you are proposing that some duplicate detection code be run server-side that would decorate the data points with some indicator that they're possibly duplicated. I wasn't proposing that we be that sophisticated -- my algorithm was just "if the points would overlap on the display by 'too much', they would be collapsed into one, and both the tooltip and the graphic used would indicate that there was an overlap." There's no judgement required, but what is required is some knowledge of how the information is displayed. In other words, if the dots get bigger, the algorithm gets adjusted.
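
A sketch of what that display-driven collapse could look like. It assumes the points are already in pixel coordinates and that "too much" means the overlap exceeds a fraction (`max_overlap`, a made-up parameter) of the dot diameter:

```python
import math


def collapse_overlapping(points, radius, max_overlap=0.5):
    """points: list of (x, y, value) in pixels. Two dots overlap 'too much'
    when their centres are closer than (1 - max_overlap) * diameter.
    Each point joins the first existing cluster it overlaps with."""
    threshold = (1 - max_overlap) * 2 * radius
    clusters = []
    for x, y, value in points:
        for c in clusters:
            if math.hypot(x - c["x"], y - c["y"]) < threshold:
                c["values"].append(value)
                break
        else:
            clusters.append({"x": x, "y": y, "values": [value]})
    return clusters


points = [(100, 200, 134), (102, 196, 140), (300, 250, 98)]
big_dots = collapse_overlapping(points, radius=7)    # 134 and 140 collapse
small_dots = collapse_overlapping(points, radius=4)  # all three stay separate
```

The same data gives a different result when only the radius changes, which is the sense in which the algorithm depends on how the information is displayed.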

jebeck commented 10 years ago

I'm not sure that what you're describing, @kentquirk, is logic I'd agree with. It sounds like the logic you're proposing is actually sensitive to the visual overlap. That is not the type of de-duplication I think we should be doing; I think we should do semantic de-duplication based on (a) identical readings and (b) near-identical timestamps.

I don't believe we want to prevent displaying overlapping SMBG circles if you test your BG twice in a row within a short span of time (to calibrate your Dexcom, perhaps) and get a slightly different reading each time - 134 mg/dL and 140 mg/dL, for example. Those are separate, independently valid SMBG events, and having both might even be useful ("Oh, I checked my BG twice in a row here, that's probably when I calibrated a new Dexcom sensor."). That is unlike the case where you enter an SMBG into the bolus wizard, or where we pull the same SMBG both directly from your meter and from the pump because the meter was linked.
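
The semantic criterion can be written as a small predicate. This is a sketch only: the five-minute window and the field names are assumptions, and it ignores which device each reading came from.

```python
from datetime import datetime, timedelta


def is_semantic_duplicate(a, b, window=timedelta(minutes=5)):
    """True when two readings have (a) identical values and
    (b) near-identical timestamps."""
    return a["value"] == b["value"] and abs(a["time"] - b["time"]) <= window


meter = {"value": 134, "time": datetime(2014, 6, 6, 19, 30)}
wizard = {"value": 134, "time": datetime(2014, 6, 6, 19, 32)}
retest = {"value": 140, "time": datetime(2014, 6, 6, 19, 31)}

same_reading = is_semantic_duplicate(meter, wizard)   # True: same value, 2 min apart
separate_tests = is_semantic_duplicate(meter, retest)  # False: 134 vs 140
```

The 134 and 140 readings stay separate events even though their circles would overlap on screen.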

kentquirk commented 10 years ago

The problem as stated was "visual clutter". What I'm proposing reduces that without making any judgement about the data. It basically converts multiple overlapping dots to a single dot with an indicator of overlap (perhaps a number inside). So if you saw a dot that looked like (2) in place of ( ), you'd know that there were 2 readings there. And if you hover over that (2) you'll see the 134 and 140.

Which problem are we really trying to solve?

jebeck commented 10 years ago

@kentquirk I agree that the question of which problem(s) we're trying to solve is the question at issue here. I think the semantic de-duplication is far more important than the display de-duplication. The display de-duplication also presents significant challenges since I believe it would have to be code that runs as part of rendering (to query actual pixel distance between points, assuming our eventual goal of a responsive visualization). Semantic de-duplication could be part of the data pre-processing step that only happens once per dataset and doesn't affect anything but initial rendering time.
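
To illustrate why display de-duplication is tied to rendering, here is a sketch with made-up numbers (a 24-hour time axis, a 7 px dot radius, and a simple linear time scale). The same two readings overlap at one chart width and not at another, so the check has to be redone whenever the layout changes.

```python
def x_pixel(t_seconds, domain_start, domain_end, width_px):
    """Linear time scale: map a timestamp in seconds onto the chart's x axis."""
    return (t_seconds - domain_start) / (domain_end - domain_start) * width_px


def dots_overlap(t1, t2, width_px, radius=7, day=24 * 60 * 60):
    """Horizontal overlap only: centres closer than one dot diameter."""
    x1 = x_pixel(t1, 0, day, width_px)
    x2 = x_pixel(t2, 0, day, width_px)
    return abs(x1 - x2) < 2 * radius


t1 = 19 * 3600 + 30 * 60  # 19:30, seconds since midnight
t2 = t1 + 15 * 60         # 15 minutes later

narrow = dots_overlap(t1, t2, width_px=400)   # about 4 px apart: overlap
wide = dots_overlap(t1, t2, width_px=1600)    # about 17 px apart: no overlap
```

A semantic check on values and timestamps gives the same answer at every width, so it can run once when the dataset is loaded.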