uwrit / leaf

Leaf Clinical Data Explorer
https://www.youtube.com/watch?v=ZuKKC7B8mHI
Other
88 stars 47 forks source link

"Illegal SQL CRUD operation(s) found in SQL" error #447

Open artgoldberg opened 3 years ago

artgoldberg commented 3 years ago

Hi Folks

A couple of our queries generate this error. Here's an example:

-- "Unrecoverable validation error in query. Error:{Error}","Properties":{"Error":"1 illegal SQL CRUD operation(s) found in SQL:

SELECT  _S000.person_id FROM (SELECT [condition_occurrence_id],
                          [person_id] = CONVERT(NVARCHAR(64), [person_id], 2),
                          [condition_concept_id],
                          [condition_start_date],
                          [condition_start_datetime],
                          [condition_end_date],
                          [condition_end_datetime],
                          [condition_type_concept_id],
                          [stop_reason],
                          [provider_id],
                          [visit_occurrence_id],
                          [visit_detail_id],
                          [condition_source_value],
                          [condition_source_concept_id],
                          [condition_status_source_value],
                          [condition_status_concept_id]
                   FROM rpt.test_omop_conditions.condition_occurrence_deid) AS _S000 WHERE EXISTS
(SELECT 1
 FROM
     omop.cdm_deid_std.concept AS _S000C_ICD10CM,
     omop.cdm_deid_std.concept_relationship AS _S000CR,
     omop.cdm_deid_std.concept AS _S000C_SNOMED
 WHERE
     _S000C_ICD10CM.vocabulary_id = 'ICD10CM'
     AND _S000C_ICD10CM.concept_id = _S000CR.concept_id_1
     AND _S000CR.relationship_id = 'Maps to'
     AND _S000C_SNOMED.vocabulary_id = 'SNOMED'
     AND _S000C_SNOMED.concept_id = _S000CR.concept_id_2
     AND _S000.condition_concept_id = _S000C_SNOMED.concept_id
     AND _S000C_ICD10CM.concept_code = 'O44.13') /* Complete placenta previa with hemorrhage, third trimester (ICD10CM:O44.13) */  GROUP BY _S000.person_id;

A log with 3 such errors and multiple lines before and after the error is attached. filtered_leaf-api-20210914.log The error is in leaf/src/server/Model/Compiler/SqlValidator.cs. But I don't understand what it means.

These queries run fine as stand-alone SQL.

An unusual aspect of these queries is that the ICD10CM concept maps to multiple SNOMED concepts. (About 30% of the ICD10CM concepts in Athena map to multiple SNOMED concepts in the CONCEPT_RELATIONSHIP table.)

I suspect that this factor contributes to this error. In particular, the 1-to-many ICD10CM to SNOMED mapping means that a given condition_occurrence_deid record may satisfy the query multiple times, which could select duplicate person_ids. Maybe it needs to be wrapped in DISTINCT().

Thanks A

artgoldberg commented 3 years ago

Hi @ndobb

Hope you had a great vacation. Checking in on thoughts about this.

Arthur

ndobb commented 3 years ago

Hi @artgoldberg,

Ah - I think this is the culprit AND _S000C_ICD10CM.concept_code = 'O44.13') /* Complete placenta previa with hemorrhage, third trimester (ICD10CM:O44.13) */ GROUP BY _S000.person_id;.

The token "with " is included in the illegal commands Leaf checks for https://github.com/uwrit/leaf/blob/master/src/server/Model/Compiler/Common/Dialect.cs#L76.

Essentially this is just a crude case-insensitive check for any of the listed strings which could potentially refer to DDL operations (though I'm pretty sure WITH does not, we added it in anyway). I stopped short of writing a proper parser to check for these as that seemed overkill, though in cases like this the error is a nuisance.

One workaround could be to run something to replace the word "with" with something else, such as:

UPDATE [app].[Concept]
SET SqlSetWhere = REPLACE(SqlSetWhere, 'with ', 'w/ ')
WHERE SqlSetWhere LIKE '%WITH %'

In the future we could (1) write a proper parser, (2) remove WITH from the list and assume to be benign, (3) drop the check entirely and assume the Leaf SQL user isn't allowed to run DDL (I really don't like this option).

Best, -nic

artgoldberg commented 2 years ago

In the meantime, I'm not including documentation comments in Leaf queries.

ndobb commented 2 years ago

Thanks @artgoldberg. I've just released v3.10 (https://github.com/uwrit/leaf/releases/tag/v3.10.0), which removes "WITH" from the list of illegal commands.

artgoldberg commented 2 years ago

@ndobb In my opinion, this issue should be left open.

Fundamentally, what's happening is that Leaf wants to ensure that dynamic queries do not execute DDL, and does so by raising the "Illegal SQL ... " error if they contain any DDL keywords. But this ignores the possibility that these keywords are in comments, or in literals used by queries, or in other SQL constructs.

To summarize, while it's important to prevent dynamic SQL from damaging the database, this approach risks data-dependant false errors that are difficult to debug especially for users who have no contact with Leaf's developers, and leaves them in the cumbersome position of being unable to do reasonable things because Leaf raises a false error.

In my view, an effective and non-risky approach to protecting the DBMS should be used instead. E.g., perhaps access protections could be used:

  1. Make the Leaf DB read-only
  2. Run dynamic SQL
  3. Restore previous access rights I've not thought through all the details, but that could provide comprehensive protection and avoid any false errors.

Regards, Arthur