Closed suriyan3 closed 4 weeks ago
I have attached the sample data and queries for reference.
A few points:
item
s into a single column, separate items, e.g., based on their linkId
using constraints on the FHIRPath.Re 1: Well a subset of FHIRPath functions are supported, that's the way that we define each "column" in a view. For example, take a look at examples here where the first()
function of FHIRPath is used to choose one member of a collection. Are you saying that even simple examples like first()
does not work on Parquet+Spark OR is it more advanced FHIRPaths that don't work? If it is the latter, can you provide some examples?
Ok, So when I create a dataframe using
QuestionnaireResponse_df = runner.to_dataframe( qrs.select( { "qrs_id": qrs.id, "qrs_tag": qrs.meta.tag.code, "qrs_patient_id":qrs.subject.reference, "qrs_encounter_id": qrs.encounter.reference, "qrs_questionnaire": qrs.questionnaire, "qrs_item":qrs.item } ) )
when I do QuestionnaireResponse_df.iloc[0]['item'] it gives me this nested item as a string value
'[{"id":null,"linkId":"vitals","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"3.1","definition":null,"text":"ANC service provided at "},{"id":null,"linkId":"service-type-details","definition":null,"text":null},{"id":null,"linkId":"machine-available","definition":null,"text":"Is BP Machine Available? "},{"id":null,"linkId":"3.2","definition":null,"text":null},{"id":null,"linkId":"3.3","definition":null,"text":null},{"id":null,"linkId":"3.4","definition":null,"text":null},{"id":null,"linkId":"3.5","definition":null,"text":null},{"id":null,"linkId":"3.6","definition":null,"text":"Blood Group "},{"id":null,"linkId":"3.10","definition":null,"text":null},{"id":null,"linkId":"3.11","definition":null,"text":"Rh factor "},{"id":null,"linkId":"3.12","definition":null,"text":null}]},{"id":null,"linkId":"counselling","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"4.1","definition":null,"text":"Was counselling done for the following"},{"id":null,"linkId":"4.2","definition":null,"text":null},{"id":null,"linkId":"4.4","definition":null,"text":null},{"id":null,"linkId":"4.6","definition":null,"text":null}]},{"id":null,"linkId":"syphilis","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"5.1.0","definition":null,"text":null},{"id":null,"linkId":"5.3.0","definition":null,"text":null}]},{"id":null,"linkId":"hepatitis-B","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"6.1.0","definition":null,"text":null},{"id":null,"linkId":"6.3.0","definition":null,"text":null}]},{"id":null,"linkId":"hepatitis-C","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"7.1.0","definition":null,"text":null},{"id":null,"linkId":"7.3.0","definition":null,"text":null}]},{"id":null,"linkId":"hiv","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"8.1.0","definition":null,"text":null}]},{"id":null,"linkId":"malaria","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"malaria-1.0","definition":null,"text":null},{"id":null,"linkId":"malaria-3.0","definition":null,"text":null}]},{"id":null,"linkId":"SickleCellTest","definition":null,"text":"Sickle Cell and HbsAg test","answer":null,"item":[{"id":null,"linkId":"9.1.0","definition":null,"text":null},{"id":null,"linkId":"9.3.0","definition":null,"text":null}]},{"id":null,"linkId":"hematology","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"10.1.0","definition":null,"text":null}]},{"id":null,"linkId":"gestational-diabetes","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"11.1","definition":null,"text":null}]},{"id":null,"linkId":"urinalysis","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"12.1.0","definition":null,"text":null}]},{"id":null,"linkId":"Foetal-Examination","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"13.1","definition":null,"text":"Foetal Movement"},{"id":null,"linkId":"13.3","definition":null,"text":null},{"id":null,"linkId":"13.5","definition":null,"text":null},{"id":null,"linkId":"13.6","definition":null,"text":null},{"id":null,"linkId":"13.7","definition":null,"text":"Foetal Position"},{"id":null,"linkId":"13.11","definition":null,"text":null}]},{"id":null,"linkId":"hemanitic-calcium-supplements","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"15.1.0","definition":null,"text":null},{"id":null,"linkId":"15.2.0","definition":null,"text":null}]},{"id":null,"linkId":"llin-and-ipt-test","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"16.1.0","definition":null,"text":null},{"id":null,"linkId":"16.2.0","definition":null,"text":"Was IPT given?"}]},{"id":null,"linkId":"tetanus-diptheria","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"17.1.0","definition":null,"text":"Was Td given?"}]},{"id":null,"linkId":"associated-problems","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"18.1.0","definition":null,"text":"Associated problems"}]},{"id":null,"linkId":"previous-medical-history","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"19.1.0","definition":null,"text":"Previous medical history"}]},{"id":null,"linkId":"outcome-of-visit","definition":null,"text":null,"answer":null,"item":[{"id":null,"linkId":"20.1","definition":null,"text":null}]}]'
while creating the dataframe itself when I change to list or json system restarts or system goes unresponsive QuestionnaireResponse_df = runner.to_dataframe( qrs.select( { "qrs_id": qrs.id, "qrs_tag": qrs.meta.tag.code, "qrs_patient_id":qrs.subject.reference, "qrs_encounter_id": qrs.encounter.reference, "qrs_questionnaire": qrs.questionnaire, "qrs_item":list(qrs.item) // json.loads(qrs.item) } ) )
Update: what I meant by FHIRPath constraints was, for example, like this:
qr.select(
{
"id": qr.id,
"answer1_0": qr.item.where(qr.item.linkId=='1.0').answer.first().value.ofType('coding').code,
}
)
This works and the advantage is that it picks a specific answer and assigns it to a single "column" so you don't need to deal with json/arrays, etc. However, when I tried this with qr.item.item
it failed. It seems to be a bug in the fhir-views code. I have informed the team and they are looking into it.
In the meantime doing the same with SQL is probably your best bet. It also avoids loading everything into a dataframe in memory.
Update: The mentioned bug with qr.item.item
was fixed by this commit (thanks to @rbrush) but then the current Spark runner implementation failed at generating proper SQL for constraints on qr.item.item
. This is another issue which we need to look into. However, I like to suggest to reconsider doing direct SQL in the meantime. Here are some examples that hopefully should help:
The basic idea is to EXPLODE
repeated columns to get a flat view and then apply required constraints. In your case, because you have answer
fields at different levels, (e.g., item.item.answer
vs item.item.item.answer
), this is a little bit trickier but still doable. First, if you try this query (please pay attention to whether you have a _
in your table name or not):
SELECT QR.id,
(CASE item_2.linkId WHEN '1.1' THEN item_2.answer[0].value.coding.code ELSE NULL END) AS answer_1_1,
(CASE item_2.linkId WHEN '3.1' THEN item_2.answer[0].value.coding.code ELSE NULL END) AS answer_3_1,
(CASE item_3.linkId WHEN '1.2.5' THEN item_3.answer[0].value.coding.code ELSE NULL END) AS answer_1_2_5,
(CASE item_4.linkId WHEN 'v-value' THEN item_4.answer[0].value.coding.code ELSE NULL END) AS answer_v_value
FROM questionnaire_response AS QR
LATERAL VIEW OUTER EXPLODE(QR.item) AS item_1
LATERAL VIEW OUTER EXPLODE(item_1.item) AS item_2
LATERAL VIEW OUTER EXPLODE(item_2.item) AS item_3
LATERAL VIEW OUTER EXPLODE(item_3.item) AS item_4
you will see that for each QR.id, you get many rows. In the four answer columns, there are a lot of NULL values but there are also the corresponding answers. So we just need to aggregate these as follows:
SELECT QR.id,
FIRST((CASE item_2.linkId WHEN '1.1' THEN item_2.answer[0].value.coding.code ELSE NULL END), true) AS answer_1_1,
FIRST((CASE item_2.linkId WHEN '3.1' THEN item_2.answer[0].value.coding.code ELSE NULL END), true) AS answer_3_1,
FIRST((CASE item_3.linkId WHEN '1.2.5' THEN item_3.answer[0].value.coding.code ELSE NULL END), true) AS answer_1_2_5,
FIRST((CASE item_4.linkId WHEN 'v-value' THEN item_4.answer[0].value.coding.code ELSE NULL END), true) AS answer_v_value
FROM questionnaire_response AS QR
LATERAL VIEW OUTER EXPLODE(QR.item) AS item_1
LATERAL VIEW OUTER EXPLODE(item_1.item) AS item_2
LATERAL VIEW OUTER EXPLODE(item_2.item) AS item_3
LATERAL VIEW OUTER EXPLODE(item_3.item) AS item_4
GROUP BY QR.id;
We are using the FIRST
aggregation function, dropping NULL values. I am not sure about the performance of this approach but it at least unblocks you.
Just to clarify re performance: A major concern is the GROUP BY
part which is not really required to achieve what we are doing here; i.e., all answer
elements should be extracted in the context of a single row without requiring a table-wide GROUP BY
. I believe that's how fhir-views work but its generated SQL is too complex, hence this simpler approach.
Thanks @bashir I have tested this SQL queries it is working fine, we can capture the columns we need and create a flat table. Is it possible to write a query that can loop through all the objects and in each objects there might be nested items and answer so if answer is not null then can we create a column name from the value of text key and its column value be taken from answer array.
Besides the solution offered above, now with the ViewDefinition support we can create flat-views directly from the original QuestionnaireResponse
resources (with FHIRPaths to extract item
and answer
at arbitrary levels). Closing this issue ...
I am creating view here
views = r4.base_r4() qrs = views.view_of('QuestionnaireResponse')
QuestionnaireResponse_df = runner.to_dataframe( qrs.select( { "qrs_id": qrs.id, "qrs_tag": qrs.meta.tag.code, "qrs_patient_id":qrs.subject.reference, "qrs_encounter_id": qrs.encounter.reference, "qrs_questionnaire": qrs.questionnaire, "qrs_item":qrs.item } ) ) In this dataframe qrs_item column holds string of array of objects , I am not able to directly convert that to array of objects and assign it to qrs_item or assign individual objects to different columns like the below
runner.to_dataframe( qrs.select( { "qrs_item":qrs.item[0][0] "qrs_item_2.0":"qrs_item[0][1] } )
so I needed an helper function to convert that string to json
Function to convert JSON string to list
def convert_to_list(json_str): try: return json.loads(json_str) except json.JSONDecodeError: return [] QuestionnaireResource_df.zip
Convert the qrs_item column to list
QuestionnaireResponse_df ['qrs_item'] = QuestionnaireResponse_df ['qrs_item'].apply(convert_to_list)
And if I want to run an expression on this qrs_item I need to loop through all the items which is difficult if the item has many nested items. Is there any effective solution to solve this issue.