veg / hivtrace-viz

https://hivtrace-viz.vercel.app
7 stars 9 forks source link

helpers.collapseLargeCategories overwrites valid categories in node definitions and causes all kinds of havoc for me #122

Closed spond closed 8 months ago

spond commented 10 months ago

I see that this code is taking fields like hiv_aids_dx_dt and overwriting valid date entries with 'Other'. This completely breaks everything downstream. I am going to revert this, but if the change is currently running on any SH-T instances, I expect all of the date based functionality (e.g. CoI) will be completely fucked up. This code needs to be reverted ASAP. @stevenweaver

https://github.com/veg/hivtrace-viz/blame/d198f5835ac7931e32bad74029a3c342a23d6138/src/clusternetwork.js#L559

 console.log (json.Nodes[0].patient_attributes.hiv_aids_dx_dt);

  self.collapsedCategories = helpers.collapseLargeCategories(
    json.Nodes,
    new_schema
  );

  console.log (json.Nodes[0].patient_attributes.hiv_aids_dx_dt);
image
stevenweaver commented 10 months ago

I'm sorry this caused you trouble, but a category limit is a legitimate issue that is causing legibility problems for some users (e.g. me, @stevenweaver) when there are so many categories one cannot distinguish between unique values in the visualization. Will keep this issue open to ensure that the category limit feature only affects categorical data types. 1.1.2 includes the code reversion.

spond commented 10 months ago

Dear @stevenweaver,

Not a problem. I agree that there should be a mechanism to display categories with more than N values, it's just that this particular implementation was overwriting the original values, including for the fileds it should have left alone. My preferred solution would be "soft-censoring", i.e. keep the original values for all attributes in node records, but then, when time comes to display them, dynamically adjust the display range.

I'll mock something up this week.

Best, Sergei

stevenweaver commented 10 months ago

OK, as a heads up, dynamically adjusting the display range is non-trivial, and the idea behind adding an "Other" category after a set limit also works for its statistical properties. For these reasons, and the ease of adjusting the category limit variable (could be set to infinite in the code), is the reason I chose this approach.

github-actions[bot] commented 8 months ago

Stale issue message