jhu-bids / TermHub

Web app and CLI tools for working with biomedical terminologies. https://github.com/orgs/jhu-bids/projects/9/views/7
https://bit.ly/termhub
GNU General Public License v3.0

Bug: 0 patient/record counts #491

Closed joeflack4 closed 2 weeks ago

joeflack4 commented 1 year ago

Overview

Stephanie was trying to use this on TermHub dev (#489): https://icy-ground-0416a040f.2.azurestaticapps.net/OMOPConceptSets?codeset_ids=417730759&codeset_ids=423850600&codeset_ids=966671711&codeset_ids=577774492 but experienced some issues.

Screenshots & comments

I did get 0 for patient counts for some csets, but maybe this is true for those?

[Screenshot: 0 patient counts - in 1 cset]
[Screenshot: 0 patient counts - in 2 csets]

And this screenshot is from Stephanie:

![0 patients steph](https://github.com/jhu-bids/TermHub/assets/13045020/d31746bd-acf2-46c2-a3a8-590052690fbb)

One of those csets has 0 members, so I can understand why the counts would be 0. But the other ones with 0 counts have many members, so isn't it surprising that their counts would be 0?

Edit: I checked the database and there are no records for those instances where the counts are 0.
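
The database check mentioned in the edit above can be sketched as a query that separates "no row in the counts table" from "a row whose count really is 0". This is only an illustrative sketch: `concept_set_counts_clamped` is the table named in this issue, but the `code_sets` table, the `codeset_id` and `person_count` columns, and the use of SQLite are assumptions made for the example, not TermHub's actual schema.

```python
import sqlite3


def classify_counts(conn):
    """Label each code set by whether a counts row exists and what it says.

    Table and column names other than concept_set_counts_clamped are
    hypothetical and exist only for this example.
    """
    query = """
        SELECT cs.codeset_id, c.codeset_id, c.person_count
        FROM code_sets AS cs
        LEFT JOIN concept_set_counts_clamped AS c
          ON c.codeset_id = cs.codeset_id
        ORDER BY cs.codeset_id
    """
    labels = {}
    for codeset_id, counts_id, person_count in conn.execute(query):
        if counts_id is None:
            # No row at all: the counts were never fetched for this code set.
            labels[codeset_id] = "no counts row"
        elif person_count == 0:
            # A row exists and reports zero patients.
            labels[codeset_id] = "counted as zero"
        else:
            labels[codeset_id] = "has counts"
    return labels


if __name__ == "__main__":
    # Made-up demo data; the ids and counts are not real.
    demo = sqlite3.connect(":memory:")
    demo.executescript("""
        CREATE TABLE code_sets (codeset_id INTEGER PRIMARY KEY);
        CREATE TABLE concept_set_counts_clamped (
            codeset_id INTEGER, person_count INTEGER);
        INSERT INTO code_sets VALUES (1), (2), (3);
        INSERT INTO concept_set_counts_clamped VALUES (1, 120), (2, 0);
    """)
    print(classify_counts(demo))
```

A code set labeled "no counts row" would show 0 in the UI because nothing was fetched for it, which is a different situation from a code set that was counted and genuinely has no patients.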

Solutions

Possible solution details

1. Periodic fetches of concept_set_counts_clamped (need to fix GH action)

Currently blocked by GH action failing.

It may be a disk space issue. If so, some ideas:

a. delete the datasets/ files after doing prepped_files, leaving only prepped
b. download and upload, and delete files 1 at a time
c. counts/vocab separate actions
d. download/upload big tables in chunks (e.g. 50% of the parquet)
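
As a rough illustration of the "download and upload, and delete files 1 at a time" idea, the sketch below keeps at most one dataset file on disk at any moment. The `download` and `upload` callables are hypothetical stand-ins for whatever the refresh scripts actually use; nothing here reflects the real GH action or the Enclave API.

```python
import tempfile
from pathlib import Path


def transfer_one_at_a_time(names, download, upload, workdir):
    """Download, upload, then delete each file before touching the next.

    `download(name, workdir)` must return the local Path it wrote;
    `upload(path)` sends that file wherever it needs to go. Both are
    hypothetical callables supplied by the caller.
    """
    done = []
    for name in names:
        local_path = download(name, workdir)
        try:
            upload(local_path)
        finally:
            # Free the disk space even if the upload raised.
            local_path.unlink(missing_ok=True)
        done.append(name)
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)

        def fake_download(name, directory):
            path = directory / name
            path.write_text("placeholder contents")
            return path

        def fake_upload(path):
            print("uploading", path.name, "files on disk:",
                  len(list(workdir.iterdir())))

        transfer_one_at_a_time(["a.parquet", "b.parquet"],
                               fake_download, fake_upload, workdir)
```

With this shape, peak disk usage is bounded by the largest single file rather than the sum of all datasets, at the cost of not being able to retry an upload without downloading again.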

Original comments (completed)

@Sigfried Correct me if my understanding of how we get the counts in the UI is wrong. I have a bunch of questions about this.

i. Is the primary source of this information the `concept_set_counts_clamped` table? If so, isn't that something major that we overlooked? The DB refresh is not set up to fetch that.
ii. Should we be fetching this table dataset once or more a day?
iii. And while we're at it, are there any other non-vocab datasets that we also cannot get by fetching through the Objects API?
iv. Or am I wrong, and is this table not the primary way that we are getting counts?
v. If it's not the primary way, doesn't it still need to be updated?
vi. Have you been updating this table periodically?
vii. If so, how?

stephanieshong commented 1 year ago

The concept sets with 0 patient counts and 0 record counts were newly created around 3:45 today, so they are less than a day old, if that helps in debugging.

stephanieshong commented 1 year ago

@Sigfried @joeflack4 - I found the issue with the 0 patient counts. The term usage count is generated in the Enclave and displayed in TermHub. So if the term usage count is not updated in the Enclave, TermHub will also display 0 counts for both patients and records. In the Enclave, if the associated research project for the concept set points to the tenant project, then the term usage count will not get updated at all, including the concept set overlap. Thus the 0 counts in TermHub.

stephanieshong commented 1 year ago

Once this issue is resolved in the Enclave, TermHub will be able to correctly display the patient and record counts. This brings up a bigger issue: the Enclave now supports two different data sources, one for COVID and another for tenant. We will have to decide how the term usage count should reflect the data repository for the two different cohorts on the back end.

stephanieshong commented 1 year ago

Notes that may be helpful: in the Enclave we now have two data repositories (N3C COVID and N3C Clinical (tenant)). As such, the Concept Set TermUsage tab needs to know which data repository it should reference before it can display the patient and record counts. If the user has access to only one, we can default to that one based on the user's permission settings.

1. An operational person like me, who has access to both data sources, will now have to indicate (choose) which data source to reference before the concept set browser can generate the correct patient/record counts.
2. For a user who only has access to N3C COVID, we can default to N3C COVID as the data source; not an issue.
3. For a user who only has access to the N3C Tenant, we can default to N3C Tenant as the data source; not an issue.
4. If the user only has access to one of the tenants (COPD, Renal or ALz), we can filter to only the specific tenant group. However, if the user has access to more than one tenants should we filter to both tenant before generating
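
The defaulting rule in the list above (pick the repository automatically when the user can see only one, otherwise ask) could be sketched roughly as follows. The function name, the string labels, and the idea of passing access as a set are all assumptions made for illustration; the Enclave's actual permission model is not described in this thread, and the open question about users with access to several tenant groups is deliberately not handled.

```python
from typing import Optional, Set

# Labels taken from the comment above; their use as identifiers is hypothetical.
COVID = "N3C COVID"
TENANT = "N3C Clinical (tenant)"


def default_data_source(accessible: Set[str]) -> Optional[str]:
    """Return the only accessible repository, or None if the user must choose.

    Hypothetical helper: raises ValueError when the user has no access.
    """
    if not accessible:
        raise ValueError("user has no data repository access")
    if len(accessible) == 1:
        # Only one repository is visible, so it is a safe default.
        return next(iter(accessible))
    # Access to both repositories: the caller should prompt the user.
    return None


if __name__ == "__main__":
    print(default_data_source({COVID}))          # single repository
    print(default_data_source({COVID, TENANT}))  # user must choose
```

Returning `None` rather than guessing keeps the "operational person" case explicit: the UI has to ask before any counts are generated.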

stephanieshong commented 1 year ago

The main issue is that the patient count and the record count are not being updated. This is because the current termUsage count functionality in the Enclave's Concept Set browser is Research Project (RP) dependent. If the RP is set to tenant, the usage count will return 0.

joeflack4 commented 7 months ago

@Sigfried Do you think this has been fixed?

Sigfried commented 2 weeks ago

@joeflack4, I don't quite understand it. Is it replicable?

joeflack4 commented 2 weeks ago

@Sigfried I think it is fixed because the GitHub action for the counts is working.