google / fhir-data-pipes

A collection of tools for extracting FHIR resources and analytics services on top of that data.
https://google.github.io/fhir-data-pipes/
Apache License 2.0

Patient_flat view creates duplicate rows for some patients #932

Closed fredhersch closed 5 months ago

fredhersch commented 5 months ago

When I run the pipelines with the sample synthetic data from the tutorial loaded, the patient table has 86 records, whereas patient_flat has 114.

Running DISTINCT on patient_flat, I get 86.

Just checking: is this the expected behaviour of the flat_table view?
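One way to see where the extra rows come from is to count rows per patient ID. Below is a minimal Python sketch. It assumes the flat view's rows have already been loaded as dictionaries and that the ID column is called `patient_id`. That column name is hypothetical, so check the actual view definition.

```python
from collections import Counter

def summarize_rows(rows, id_column="patient_id"):
    """Return (total rows, distinct IDs, {id: row count} for IDs with >1 row).

    `id_column` is a hypothetical column name; adjust it to the real view.
    """
    counts = Counter(row[id_column] for row in rows)
    repeated = {pid: n for pid, n in counts.items() if n > 1}
    return len(rows), len(counts), repeated

# Toy rows (not the tutorial data): patient p1 contributes two rows.
rows = [
    {"patient_id": "p1", "family": "Smith"},
    {"patient_id": "p1", "family": "Jones"},
    {"patient_id": "p2", "family": "Lee"},
]
total, distinct, repeated = summarize_rows(rows)
print(total, distinct, repeated)  # 3 2 {'p1': 2}
```

If `total` exceeds `distinct`, the `repeated` dictionary shows which patients produce the extra rows.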

bashir2 commented 5 months ago

Thanks for reporting this issue. Which patient_flat is this: the one on the Thrift server or the one on PostgreSQL? Creating multiple rows for the same patient is possible, for example because of the forEach sections. But if the rows are exactly the same (i.e., all column values are identical), then we should look into this.
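To illustrate how a forEach section multiplies rows, here is a minimal Python sketch that mimics a forEach over `Patient.name`. This is an assumed example: the actual patient_flat view definition may iterate over different fields. The sketch also checks for rows that are exactly identical, which is the case that would indicate a real problem.

```python
from collections import Counter

def flatten_patient(patient):
    """Emit one row per entry in patient['name'], mimicking a forEach over name.

    Like forEach, a patient with no names produces no rows here.
    """
    rows = []
    for name in patient.get("name", []):
        rows.append((
            patient["id"],
            name.get("family"),
            " ".join(name.get("given", [])),
        ))
    return rows

def exact_duplicates(rows):
    """Return rows whose every column value appears more than once."""
    return [row for row, n in Counter(rows).items() if n > 1]

# Toy resources (not the tutorial data): p1 has two names, p2 has one.
patients = [
    {"id": "p1", "name": [{"family": "Smith", "given": ["Ann"]},
                          {"family": "Jones", "given": ["Ann"]}]},
    {"id": "p2", "name": [{"family": "Lee", "given": ["Bo"]}]},
]
rows = [row for p in patients for row in flatten_patient(p)]
print(len(rows), len({r[0] for r in rows}))  # 3 2
print(exact_duplicates(rows))  # []
```

Here patient p1 yields two rows with the same ID but different column values, so the row count exceeds the distinct-ID count without any exact duplicates. Only a non-empty result from `exact_duplicates` would point to an actual bug.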

Also, how did you run the pipeline? If it was through the command line, can you please copy the arguments here?

bashir2 commented 5 months ago

Closing this because in the flat view getting multiple rows from the same resource is expected. @fredhersch please feel free to re-open if the above explanation is not clear or does not explain your case.