jhu-bids / TermHub

Web app and CLI tools for working with biomedical terminologies. https://github.com/orgs/jhu-bids/projects/9/views/7
https://bit.ly/termhub
GNU General Public License v3.0

ValueSet-Tools: Web Application(s) / workflow management tooling #45

Open joeflack4 opened 2 years ago

joeflack4 commented 2 years ago

Description

It seems like our team (and perhaps people from outside our team?) might find it useful if we created one or more web applications for doing concept set management, w/ dashboarding features for oversight.

Components / features

More features written by Siggie 2022/03/25: Current steps that we would be looking to tie together / expose to UI / automate:

  1. Identify authoritative value sets. Do they have OIDs? yes/no
  2. Put them in a spreadsheet with metadata and OIDs. (a) This is different for HCUP. (b) It will also be different for ATC medication sets.
  3. Tell programmer type to load them into the enclave (continue from POV of programmer type)
  4. Programmer gets OIDs, two ways to do it now, choose favorite
  5. (a) With OIDs copied from wherever to CSV, (b) or software grabs directly from spreadsheet
  6. Set parameters for VSAC fetch: tell it where to get OIDs and where to put palantir-three-file CSVs
  7. Run VSAC fetch, sanity check
  8. Set parameter for enclave import to say where to find the 3 files
  9. Run that
  10. Some automated processes occur on the enclave. Stephanie wrote some of these, others, maybe Amin; we don't know what they all are
  11. Tell Stephanie to tell Amin that we did it.
  12. Amin does some special thing
  13. Siggie then tries to figure out which concept sets were actually just loaded, and copies that data back out of the enclave in order to run validation and find concept sets and concept codes that should have made it into the enclave but didn't. (And sometimes it seems that more makes it into the enclave than should.)
  14. Generate a report of missing codes and csets and share it with the team
  15. Now it's Lisa's problem
  16. Lisa sanity checks, does something, who knows what?
  17. Once satisfied, Lisa changes them all from DRAFT to new versions
  18. Tell whoever is interested that we're done
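
The validation in steps 13 and 14 above amounts to a set comparison between what we meant to load and what actually landed in the enclave. Below is a minimal sketch of that comparison, assuming both sides can be reduced to a mapping from concept set name to a set of codes. The function name `compare_csets` and the input shape are hypothetical, not something that exists in our tooling today.

```python
from typing import Dict, List, Set


def compare_csets(
    expected: Dict[str, Set[str]],
    loaded: Dict[str, Set[str]],
) -> Dict[str, object]:
    """Compare expected concept sets/codes against what was found in the enclave.

    Both arguments map a concept set name to its set of codes (hypothetical shape).
    """
    # Concept sets that never arrived, and ones that arrived unexpectedly.
    missing_csets: List[str] = sorted(set(expected) - set(loaded))
    unexpected_csets: List[str] = sorted(set(loaded) - set(expected))

    missing_codes: Dict[str, List[str]] = {}
    extra_codes: Dict[str, List[str]] = {}
    # For concept sets present on both sides, compare code by code.
    for name in sorted(set(expected) & set(loaded)):
        missing = expected[name] - loaded[name]
        extra = loaded[name] - expected[name]
        if missing:
            missing_codes[name] = sorted(missing)
        if extra:
            extra_codes[name] = sorted(extra)

    return {
        "missing_csets": missing_csets,
        "unexpected_csets": unexpected_csets,
        "missing_codes": missing_codes,
        "extra_codes": extra_codes,
    }


if __name__ == "__main__":
    # Toy data only; names and codes are made up.
    expected = {"cset_a": {"1", "2", "3"}, "cset_b": {"9"}}
    loaded = {"cset_a": {"1", "2", "4"}, "cset_c": {"7"}}
    for key, value in compare_csets(expected, loaded).items():
        print(key, value)
```

The `extra_codes` and `unexpected_csets` entries cover the case noted in step 13 where more makes it into the enclave than should.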

Possible implementations

Could choose one or more combinations of these.

  1. CLI->Web UI auto-generation tools
  2. Workflow management / DAG tools (e.g. Apache Airflow)
  3. New dedicated web app
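
To make the workflow / DAG option a bit more concrete, here is a rough sketch that models a few of the steps as a dependency graph using only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow, and the step names and placeholder actions are hypothetical; a real tool would add scheduling, retries, and a UI on top of the same idea.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical step names, loosely following the manual workflow above.
# Each key maps to the set of steps that must finish before it can run.
PIPELINE: Dict[str, Set[str]] = {
    "fetch_vsac": set(),
    "enclave_import": {"fetch_vsac"},
    "validate": {"enclave_import"},
    "report": {"validate"},
}

# Placeholder actions; real ones would call the existing CLI tools.
ACTIONS: Dict[str, Callable[[], str]] = {
    "fetch_vsac": lambda: "fetched value sets",
    "enclave_import": lambda: "imported 3 CSV files",
    "validate": lambda: "compared expected vs loaded",
    "report": lambda: "report generated",
}


def run_pipeline(
    steps: Dict[str, Set[str]],
    actions: Dict[str, Callable[[], str]],
) -> Tuple[List[str], List[str]]:
    """Run each step after all of its dependencies; return (order, results)."""
    order = list(TopologicalSorter(steps).static_order())
    results = [actions[name]() for name in order]
    return order, results


if __name__ == "__main__":
    order, results = run_pipeline(PIPELINE, ACTIONS)
    for name, result in zip(order, results):
        print(f"{name}: {result}")
```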

Name candidates

  1. Term hub

Related issues

https://github.com/jhu-bids/termhub-csets/issues/1

trberg commented 2 years ago

I think what is missing from our concept set management tools is a hierarchical view of included concepts vs the overlapping concepts in other concept sets. It would be really helpful to understand which branches of the standard vocabularies are missing from a given concept set but are included in other concept sets.

If we could visualize the SNOMED tree, starting at the lowest common graph node, and coloring all descendants as overlapped or not overlapping, we could visualize what descendant branches are included or excluded from a given concept set.
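
As a rough sketch of the labeling behind such a view: walk the hierarchy from a chosen root and tag each descendant by whether it is in the concept set of interest, in other concept sets, in both, or in neither. The hierarchy shape (a parent-to-children mapping), the label names, and the function `label_descendants` are all assumptions for illustration, and the root is passed in rather than computed as the lowest common node.

```python
from typing import Dict, Iterable, List, Set


def label_descendants(
    children: Dict[str, List[str]],
    root: str,
    target: Set[str],
    others: Iterable[Set[str]],
) -> Dict[str, str]:
    """Label the root and every descendant relative to the target concept set.

    Labels (hypothetical names):
      "overlap"             in the target and in at least one other concept set
      "target_only"         in the target only
      "missing_from_target" in another concept set but not in the target
      "neither"             in no concept set
    """
    other_sets = list(others)
    labels: Dict[str, str] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node in labels:
            # A node can have several parents in a polyhierarchy; visit it once.
            continue
        in_target = node in target
        in_others = any(node in members for members in other_sets)
        if in_target and in_others:
            labels[node] = "overlap"
        elif in_target:
            labels[node] = "target_only"
        elif in_others:
            labels[node] = "missing_from_target"
        else:
            labels[node] = "neither"
        stack.extend(children.get(node, []))
    return labels


if __name__ == "__main__":
    # Toy hierarchy with made-up concept names.
    children = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
    labels = label_descendants(children, "root", {"a", "a1"}, [{"a1", "b1"}])
    for node in sorted(labels):
        print(node, labels[node])
```

The "missing_from_target" nodes are the branches that other concept sets include but the given one does not, which is what a tree visualization would then color.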

joeflack4 commented 2 years ago

Interesting. I haven't created one of these types of visualizations before. But when it comes time to do this, I suppose I will see if there is a good tree visualization library, supporting coloration, available in Python, R, or JavaScript.

I suppose one thing we could do would be something like a "heat tree", where the greater the amount of overlap, the stronger the coloration.
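
A minimal sketch of the numbers behind a "heat tree", assuming "overlap" means the number of concept sets that contain a given concept. The function name `overlap_heat` and the input shape are hypothetical; the output is an intensity between 0 and 1 per concept that a plotting library could map to a color scale.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def overlap_heat(concept_sets: Dict[str, Iterable[Hashable]]) -> Dict[Hashable, float]:
    """Return a 0..1 intensity per concept, scaled by the most-shared concept."""
    counts: Counter = Counter()
    for members in concept_sets.values():
        # Deduplicate so each concept set counts a concept at most once.
        counts.update(set(members))
    max_count = max(counts.values(), default=0)
    if max_count == 0:
        return {}
    return {concept: n / max_count for concept, n in counts.items()}


if __name__ == "__main__":
    # Toy concept sets with made-up concept ids.
    heat = overlap_heat({"cset_a": [1, 2], "cset_b": [2, 3], "cset_c": [2]})
    for concept in sorted(heat):
        print(concept, round(heat[concept], 2))
```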

From the perspective of whoever would be using this, would this be best done within the enclave, or external to it?

stephanieshong commented 2 years ago

  9. Run that
  10. Some automated processes occur on the enclave. Stephanie wrote some of these, others, maybe Amin; we don't know what they all are
  11. Tell Stephanie to tell Amin that we did it.
  12. Amin does some special thing

Steps 9-12 above have been automated in the enclave. Here are the steps:

  1. Run Enclave wrangler - uploads the concept set container and version, and adds the expression items to a draft version using code/codesystem
  2. A schedule has been defined on the Enclave to check for new bulk imports of concept sets via the REST API (polls every 5 mins) - the scheduler kicks off a job to do step 11
  3. The translation code is executed to translate the code/codesystem to concept_ids using the OMOP vocabulary tables.
  4. The translated concept sets are merged into the dataset that backs the concept set editor v2 on the Enclave
  5. The concept set editor filters draft versions out of the UI until they become non-draft versions.
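
For readers unfamiliar with step 3, here is a rough sketch of what translating code/codesystem pairs to concept_ids can look like. It assumes the lookup is a match on the `vocabulary_id` and `concept_code` columns of the OMOP `concept` table; the actual enclave code may differ. The in-memory SQLite table, the function names, and every row in it are toy stand-ins, not real vocabulary content.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

CodeKey = Tuple[str, str]  # (codesystem / vocabulary_id, code)


def build_toy_vocab() -> sqlite3.Connection:
    """Create an in-memory stand-in for the OMOP concept table (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concept (concept_id INTEGER, vocabulary_id TEXT, concept_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?)",
        [(1001, "SNOMED", "X1"), (1002, "SNOMED", "X2"), (2001, "ICD10CM", "Y1")],
    )
    return conn


def translate_codes(
    conn: sqlite3.Connection, items: Iterable[CodeKey]
) -> Tuple[Dict[CodeKey, int], List[CodeKey]]:
    """Map (codesystem, code) pairs to concept_ids; also return unmatched pairs."""
    found: Dict[CodeKey, int] = {}
    unmatched: List[CodeKey] = []
    for system, code in items:
        row = conn.execute(
            "SELECT concept_id FROM concept "
            "WHERE vocabulary_id = ? AND concept_code = ?",
            (system, code),
        ).fetchone()
        if row is None:
            # Codes with no match are the ones that would silently go missing.
            unmatched.append((system, code))
        else:
            found[(system, code)] = row[0]
    return found, unmatched


if __name__ == "__main__":
    conn = build_toy_vocab()
    found, unmatched = translate_codes(
        conn, [("SNOMED", "X1"), ("ICD10CM", "Y1"), ("SNOMED", "NOPE")]
    )
    print("found:", found)
    print("unmatched:", unmatched)
```

Keeping the unmatched pairs around is one possible way to feed the later validation step, since those are candidates for codes that should have made it into the enclave but didn't.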