uwrit / leaf

Leaf Clinical Data Explorer
https://www.youtube.com/watch?v=ZuKKC7B8mHI
Other
88 stars 47 forks source link

Saved queries from earlier versions of the LeafDB #508

Open artgoldberg opened 2 years ago

artgoldberg commented 2 years ago

Dear Leaf folks

Our clinical data warehouse changes daily, so we update the LeafDB frequently. We plan to start doing it daily, hopefully soon. Sometime the update does not change the concepts available in Leaf, and sometimes it does. But to make the update code simple, I have it erase these tables and then reload them:

/* Empty indices */
DELETE FROM LeafDB.app.ConceptForwardIndex;
DELETE FROM LeafDB.app.ConceptInvertedIndex;
DELETE FROM LeafDB.app.ConceptTokenizedIndex;

/* Empty dependency maps */
DELETE FROM LeafDB.rela.QueryConceptDependency;

/* Empty Concepts */
DELETE FROM LeafDB.app.Concept;

/* Empty ConceptSqlSet */
DELETE FROM LeafDB.app.ConceptSqlSet;

/* Empty DemographicQuery */
DELETE FROM LeafDB.app.DemographicQuery;

But I'm now noticing that older Saved Queries stop working. Here's the error that occurs when I open old ones: image

My guess is that their Concepts refer to entries in ConceptSqlSet that no longer exist or have some other reference that break when we erase and update LeafDB. Does that make sense? (I confess that I'm being lazy now and not investigating these failures in the Leaf log. It's been a long week and it's getting late.)

Perhaps it was not part of the initial design, but It seems to me that it would be a valuable and reasonable functionality to support both

  1. frequent updates to the LeafDB, and
  2. long-lived Saved Queries, subject to the constraint that old Queries that depend on concepts (little c, as in ideas) that are no longer in LeafDB cannot possibly work.

What are your thoughts about what's causing older Saved Queries to fail, and about how to design the frequent updates to the LeafDB and the Saved Queries so that features 1 and 2 are both satisfied?

Regards Arthur

ndobb commented 2 years ago

Hi @artgoldberg,

Sure, this makes sense. Yes, a given Saved Query contains a JSON-based definition of the dependent Concepts it needs, which are key'd by the ConceptId in LeafDB.app.Concept.Id. So deleting a Concept and repopulating the database with a new concept that looks the same but has a new, different ConceptId would indeed break an existing Saved Query.

The intended way to handle Concept updates (and used by us at UW) is to leverage the ExternalId field in LeafDB.app.Concept.ExternalId. So for example, if you have a Concept for Demographics -> Language -> French, you could populate the Concept's ExternalId with something like "demographics:language:french" (or whatever you want).

Assuming you are adding new Concepts by SQL script, you would thus change the final INSERT INTO LeafDB.app.Concept statement to include a WHERE clause like:

INSERT INTO LeafDB.app.Concept
SELECT ...
FROM <new_concepts> AS N

/* Don't add if Concept already exists */
WHERE NOT EXISTS (SELECT 1 FROM LeafDB.app.Concept AS C WHERE N.ExternalId = C.ExternalId)

Thus if you can deterministically generate consistent ExternalIds each script run, you can preserve the existing Concepts (and delete old ones if needed) while still adding new ones as your data changes.

Of course, this takes a bit more work in terms of scripting, but not too much. There are other benefits to this approach too - for example, if you ever want to check the Leaf logs to see which Concepts are being used most frequently over time, having consistent durable ConceptIds is a necessity.

Hope this helps, -nic