timsbiomed / issues

TIMS issue tracker.
https://github.com/orgs/timsbiomed/projects/9/views/1

Include high level documentation #66

Open cmungall opened 2 years ago

cmungall commented 2 years ago

The README is useful to understand how to set this up, but for an outsider like me it's difficult to tell what the overall objectives are, and how this relates to https://github.com/hapifhir/hapi-fhir

In particular, it would be useful to know

  1. what is the target data model? Is it some subset of FHIR, a profile/extensions?
  2. what is the infrastructure? E.g. are all ontologies loaded into main memory and served that way? Is there a database?
  3. what is the overall data flow for how external ontologies get loaded? Via the OWLAPI?
  4. what are the plans for a registry, and how is that coordinated with other registries?
  5. what is the overall strategy for dealing with the variety of different ways of modeling ontologies plus metadata properties?
     5.i. are there hardwired assumptions about using rdfs:label? Is this configurable?
     5.ii. what ontology-level metadata (version, title, default language, etc.) is expected, assumed...?
     5.iii. how are logical axioms handled?
     5.iv. how are hierarchical relationships handled? Is this only subClassOf between named classes? What about existential restrictions? What about nested existentials (as used in HPO, e.g. phenotype2uberon)?
  6. what is the strategy for reasoning?
     6.i. is there an assumption that ontologies are pre-classified? (this is a good assumption IMO)

Prioritize this issue accordingly. Maybe I am missing some documentation elsewhere. If all devs know what they are doing then you can close this. But I think I could be of more use if some basic assumptions were stated somewhere. It may be more useful for you all too.

joeflack4 commented 2 years ago

@cmungall Thanks for the suggestions. Actually, the state of this repo and the org is still a bit confusing. I've just updated the README to clarify things.

@ShahimEssaid Probably what we should do is rename this repository from hapi-fhir-jpaserver-starter to hapi-projects, and force push the current codebase here. That way our code and issues are all in the same repository.

@cmungall I edited your OP to add numbers to the questions for easy reference. Taking a stab now at answering some of them. (@chrisroederucdenver @ShahimEssaid we can add answers to these in some docs / the README at some point perhaps).

  1. We aim to support FHIR R4, and hopefully soon R5 as well. We do have extensions, e.g. SSSOM fields on ConceptMap, but we don't have a formal data model written.
  2. A single Ubuntu server. A little bit about it here. It uses the HAPI architecture with some modifications that @ShahimEssaid could elaborate on.
  3. The idea is: (.owl file) -> fhir-owl tool converts to a CodeSystem JSON -> POST || PUT to server. (See the sketch after this list.)
  4. Melissa and others have brought this up, but it's not decided yet. I believe that in some cases the plan is to redirect to an authoritative server for some content, and in other cases we will have a copy of the content uploaded to our server. @chrisroederucdenver might have more on this.
  5. This is under the purview of a new fhir-owl tool, not created yet. We may want to continue the discussion there when I create that repo. My general idea, though, is to simply allow for as much as possible: anything not recognized will be added as FHIR extension elements. It's very lenient; it basically accepts whatever you give it, with no prior data modeling needed. We can move towards more formal definitions and constraints in the future as time allows.
     5.i. Most likely I will map rdfs:label directly to concept names.
     5.ii. Going to try to add as much of this as possible, using FHIR extensions as needed.
     5.iii. That's a good question. I haven't closely looked at the Obographs output yet, but my hope is that I would add the information contained within these axioms to the concepts; a lot of it will be FHIR extension elements.
     5.iv. FHIR natively supports multi-hierarchy using "properties". It is easy to add your own properties to a code system and have concepts use them. "Properties" is a bit of a misnomer, IMO; it refers both to concept properties and to relationship types.
  6. Not sure what you mean. AFAIK we will support standard CodeSystem operations. I don't think we have anything like reasoning on the list of requirements.
     6.i. Not sure what this means.
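For concreteness, here is a rough sketch of the flow in item 3 combined with the rdfs:label mapping from 5.i, assuming Python with rdflib and requests. The ontology file, CodeSystem id, and server URL are placeholders, and the real fhir-owl tool may work quite differently:

```python
import requests
from rdflib import Graph, URIRef
from rdflib.namespace import OWL, RDF, RDFS

g = Graph()
g.parse("hp.owl")  # hypothetical local ontology file

# 5.i: map rdfs:label directly to the concept display.
concepts = []
for cls in g.subjects(RDF.type, OWL.Class):
    label = g.value(cls, RDFS.label)
    if isinstance(cls, URIRef) and label is not None:  # skip anonymous classes
        concepts.append({"code": str(cls), "display": str(label)})

code_system = {
    "resourceType": "CodeSystem",
    "id": "hp",                                      # hypothetical resource id
    "url": "http://purl.obolibrary.org/obo/hp.owl",  # canonical URL for the system
    "status": "active",
    "content": "complete",
    "concept": concepts,
}

# PUT (upsert) the CodeSystem to an R4 endpoint; the base URL is a placeholder.
resp = requests.put(
    "http://localhost:8000/r4/CodeSystem/hp",
    json=code_system,
    headers={"Content-Type": "application/fhir+json"},
)
resp.raise_for_status()
```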
cmungall commented 2 years ago

Thanks, this is useful to me; hopefully it is also useful to you to express these things. (I don't want to create busy work or distractions.)

> We aim to support FHIR R4, and hopefully soon R5 as well. We do have extensions, e.g. SSSOM fields on ConceptMap, but we don't have a formal data model written.

Got it, thanks

> A single Ubuntu server.

Sorry, I meant architecture, not infrastructure!

> A little bit about it here

I'll check this out, but it's a bit hard to grok without insider knowledge. I am sure that if I check the HAPI docs I will learn more, e.g. whether there is a relational database, a triplestore, Mongo, etc. as the back end.

> The idea is: (.owl file) -> fhir-owl tool converts to a CodeSystem JSON -> POST || PUT to server

Maybe I am just too old school, but I am wary of service-based solutions where file-based ingest would work; there are issues with timeouts on large ontologies, and with authentication. But you can likely ignore my concerns here if the broader HAPI infrastructure works happily this way.

> This is under the purview of a new fhir-owl tool, not created yet

Got it. This is one area where I could help, at least by giving sanity checks; I have a lot of experience with wild-west ontologies that do surprising things with metadata modeling.

> Not sure what you mean

Some ontology services like OLS will run an OWL reasoner to do a classification step in advance, unless that ontology is configured otherwise. I think that is a bad idea, but this is the subject of ongoing discussion between myself, Nico, David OS, Jim, and others. If you don't do a classification step, you are bound to run into some ontology that releases a version that hasn't been classified in advance. This is nuts IMO, but if you don't have a strategy in place then the ontology will look fragmented and flat. The best thing to do is to say: this is not the wild west; there are some standards your OWL must adhere to before we include it.

The other reasoning use case is closures. If you want to include anatomy ontologies like Uberon that don't use SNOMED SEP hacks, then you likely want to support operations like "include all parts of the brain". You can approximate this by doing naive graph walking over a set of predicates, but it's better to run relation-graph ahead of time to precompute the closure.
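For illustration, a minimal Python sketch of that naive graph walk; the EDGES list, the predicate set, and the descendants helper are all made up for the example, and in practice relation-graph would materialize these entailments ahead of time:

```python
from collections import defaultdict, deque

# (child, predicate, parent) triples; a stand-in for real ontology edges.
EDGES = [
    ("cerebellum", "part_of", "brain"),
    ("cerebellar_cortex", "part_of", "cerebellum"),
    ("purkinje_cell_layer", "part_of", "cerebellar_cortex"),
    ("brain", "subClassOf", "organ"),
]

def descendants(root, predicates={"part_of", "subClassOf"}):
    """All terms reachable from `root` by walking the given predicates downward."""
    children = defaultdict(set)
    for child, pred, parent in EDGES:
        if pred in predicates:
            children[parent].add(child)
    seen, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        for c in children[node] - seen:
            seen.add(c)
            queue.append(c)
    return seen

# "include all parts of the brain", approximately:
print(descendants("brain"))
# {'cerebellum', 'cerebellar_cortex', 'purkinje_cell_layer'}
```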

chrisroederucdenver commented 2 years ago

@cmungall re: 4, "what are the plans for a registry, and how is that coordinated with other registries?" Are you thinking of registries like a longitudinal clinical study? If not, please clarify; maybe provide some links.

chrisroederucdenver commented 2 years ago

@cmungall 2 "what is the infrastructure. E.g. are all ontologies loaded into main memory and served that way? is there a database?" "A single ubuntu server." "sorry I meant architecture not infrastructure!"

HAPI-FHIR is an implementation of the FHIR spec: an API meant to serve a combination of clinical and terminological data. "jpaserver" refers to a back end for that API that uses a database like Postgres to serve both. I saw a question elsewhere about concerns I understand as trying to empty the pool through a straw: if you're trying to get a whole ontology (or even a large part of one), an API meant for interaction with single concepts will indeed have performance issues. But that's not the FHIR use case as I understand it. Consider that the terminology server and the clinical server are often one and the same. When data is entered or modified in the server, it can use the local terminologies to validate those changes. User-access patterns like looking up a term by ID or text are also reasonably served by such an architecture.
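For illustration, the single-concept interactions described here correspond to standard FHIR terminology operations such as $lookup and $validate-code; a minimal sketch, assuming the Python requests library, with a placeholder base URL, system, and code:

```python
import requests

BASE = "http://localhost:8000/r4"  # hypothetical server base URL

# Look up a term by its code in a code system...
lookup = requests.get(
    f"{BASE}/CodeSystem/$lookup",
    params={"system": "http://loinc.org", "code": "8480-6"},
)

# ...or validate a code, e.g. as part of a write-time check on clinical data.
validate = requests.get(
    f"{BASE}/CodeSystem/$validate-code",
    params={"url": "http://loinc.org", "code": "8480-6"},
)

print(lookup.status_code, validate.status_code)
```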

As far as emptying the pool, and doing so in a way that is flexible to different schemas, this is the reason I'm interested in LinkML, OAK, etc. When the goal is to become the Red Hat of vocabulary distributions, you need to be able to deal with concepts en masse, not individually. So there are at least two very different use cases in play.

chrisroederucdenver commented 2 years ago

@cmungall Here's the pool-straw issue, re: 3:

"what is the overall data flow for how external ontologies get loaded? via the OWLAPI?" "The idea is: (.owl file) -> fhir-owl tool converts to a CodeSystem JSON -> POST || PUT to server." "Maybe I am just too old school but I am wary of services based solutions where file-based ingest would work, there are issues with timeouts on large ontologies and authentication, but you can likely ignore my concerns here if the broader hapi infrastructure works happily here"

When the use-case involves whole ontologies, yeah, emptying the pool through a straw is worth careful consideration.

(edit) slide 7 in an OAK slide deck has more of ChrisM's thinking.

chrisroederucdenver commented 2 years ago

6 is gold. Computing closures has come up:

> what is the strategy for reasoning? 6.i. is there an assumption that ontologies are pre-classified? (this is a good assumption IMO)

> Some ontology services like OLS will run an OWL reasoner to do a classification step in advance, unless that ontology is configured otherwise. I think that is a bad idea, but this is the subject of ongoing discussion between myself, Nico, David OS, Jim, and others. If you don't do a classification step, you are bound to run into some ontology that releases a version that hasn't been classified in advance. This is nuts IMO, but if you don't have a strategy in place then the ontology will look fragmented and flat. The best thing to do is to say: this is not the wild west; there are some standards your OWL must adhere to before we include it.
>
> The other reasoning use case is closures. If you want to include anatomy ontologies like Uberon that don't use SNOMED SEP hacks, then you likely want to support operations like "include all parts of the brain". You can approximate this by doing naive graph walking over a set of predicates, but it's better to run relation-graph ahead of time to precompute the closure.

I'll probably create a separate issue ticket for this, but regardless, ChrisM has my attention.

cmungall commented 2 years ago

Registries: I meant of ontologies.

E.g. let's say you are standing up a service that provides access to 50 vocabularies. There is presumably some kind of ETL process to bring those in. Maybe it's trivial, e.g. if all vocabularies are in OWL and available from a public URL (unlikely for closed clinical terminologies, I know). But more often than not, what happens is you end up spinning up a system with metadata on each of your sources, with all of the bespoke configuration each one needs (e.g. to load SKOS vocabulary X, we need to map foaf:name to rdfs:label).
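A hypothetical sketch of such a registry, as a plain Python mapping; every name, URL, and field here is illustrative, not an actual TIMS design:

```python
# Per-source registry: each entry carries the bespoke configuration its
# vocabulary needs, including the foaf:name override from the example above.
REGISTRY = {
    "hp": {
        "download_url": "http://purl.obolibrary.org/obo/hp.owl",
        "label_predicate": "rdfs:label",  # the common case
    },
    "vocab_x": {
        "download_url": "https://example.org/vocab-x.ttl",  # illustrative
        "label_predicate": "foaf:name",   # bespoke label mapping
        "format": "skos",
    },
}
```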

joeflack4 commented 2 years ago

Regarding (3), I agree it is better to have a way to load without the need for HTTP. Here's how we're loading them now (example). The server URL is defined by the env variable HAPI_R4, which isn't defined in that script, but I believe localhost will work fine, so at least there's that.
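For concreteness, a minimal sketch of that kind of upload step with the env-variable fallback, assuming Python and requests; the default URL and file name are guesses rather than the script's actual values:

```python
import os
import requests

# HAPI_R4 points at the server; fall back to localhost when unset (fallback URL is a guess).
base = os.environ.get("HAPI_R4", "http://localhost:8080/fhir")

with open("CodeSystem-hp.json") as f:  # illustrative file name
    resp = requests.put(
        f"{base}/CodeSystem/hp",
        data=f.read(),
        headers={"Content-Type": "application/fhir+json"},
    )
print(resp.status_code)
```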

Regarding (6), I agree w/ Chris R that this is a really good point you bring up. I added a step to the existing OWL/OBO issue to go over our results (after we've converted and uploaded to the server) and learn more about what we should do for reasoning/classification.


edit: Just copy/pasting here the update to README.md that I made recently:

TIMS/HOT HAPI FHIR Server

TIMS (Terminology Infrastructure Management Systems), AKA the HOT (Health Open Terminology) ecosystem, is developing a FHIR server: http://20.119.216.32:8000/r4/swagger-ui/

This repository is not the actual codebase being deployed, but is simply a holding place for issues.

The current codebase is here.