google / fhir-data-pipes

A collection of tools for extracting FHIR resources and analytics services on top of that data.
https://google.github.io/fhir-data-pipes/
Apache License 2.0

Setting --fhirSinkPath to a GCP FHIR Store does not work #194

Closed by omarismail94 3 years ago

omarismail94 commented 3 years ago

How to replicate

Spin up the OpenMRS and MySQL images:

docker-compose -f docker/openmrs-compose.yaml up

Next, compile the JARs:

mvn clean -DskipTests install

Then run a batch pipeline, setting the --fhirSinkPath to a GCP FHIR store:

java -cp batch/target/fhir-batch-etl-bundled-0.1.0-SNAPSHOT.jar org.openmrs.analytics.FhirEtl \
  --openmrsServerUrl=http://localhost:8099/openmrs --openmrsUserName=admin --openmrsPassword=Admin123 \
  --fhirSinkPath=projects/84163485625/locations/us-central1/datasets/testme/fhirStores/openmrsrelay \
  --resourceList=Patient,Encounter,Observation --batchSize=20

The program exits with the following stack trace:

java.lang.NoClassDefFoundError: com/google/api/services/healthcare/v1/CloudHealthcare$Builder
        at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:371)
        at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:339)
        at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:219)
        at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:322)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:308)
        at org.openmrs.analytics.FhirEtl.runFhirFetch(FhirEtl.java:138)
        at org.openmrs.analytics.FhirEtl.main(FhirEtl.java:219)
Caused by: java.lang.NoClassDefFoundError: com/google/api/services/healthcare/v1/CloudHealthcare$Builder
        at org.openmrs.analytics.GcpStoreUtil.createClient(GcpStoreUtil.java:124)

Issue

It seems the JAR does not contain the com.google.api.services.healthcare.v1.CloudHealthcare$Builder class, which is used in GcpStoreUtil.java.

I verified this by listing the contents of the jar using:

jar -tf batch/target/fhir-batch-etl-bundled-0.1.0-SNAPSHOT.jar | tee content.text

and did not see the com/google/api/services/healthcare/v1/CloudHealthcare$Builder path.
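
The same check can be scripted instead of scanning the listing by eye. The following is only a minimal sketch: the class name JarClassCheck and its containsClass helper are hypothetical and not part of this repository. It relies on the standard java.util.jar.JarFile API and assumes the class would be stored under its usual path inside the bundled JAR.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

// Hypothetical helper, not part of fhir-data-pipes.
class JarClassCheck {

  /** Returns true if the JAR has an entry for the given fully qualified class name. */
  static boolean containsClass(Path jarPath, String className) throws IOException {
    // Class files are stored with '/' separators and a ".class" suffix;
    // nested classes keep the '$' in their name.
    String entryName = className.replace('.', '/') + ".class";
    try (JarFile jar = new JarFile(jarPath.toFile())) {
      return jar.getJarEntry(entryName) != null;
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: JarClassCheck <path-to-jar> <fully.qualified.ClassName>");
      return;
    }
    Path jar = Paths.get(args[0]);
    boolean found = containsClass(jar, args[1]);
    System.out.println(args[1] + (found ? " found in " : " NOT found in ") + jar);
  }
}
```

To check the class from the stack trace, pass the bundled JAR path and 'com.google.api.services.healthcare.v1.CloudHealthcare$Builder' as the two arguments (single-quoted so the shell does not expand the $).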

Potential Fix

In the batch pom.xml file, add:

        <dependency>
            <groupId>com.google.apis</groupId>
            <artifactId>google-api-services-healthcare</artifactId>
            <version>v1-rev20200515-1.30.9</version>
        </dependency>

to its dependencies. However, this would mean the dependency is declared in both the batch pom and the common module pom. Is that OK?

bashir2 commented 3 years ago

Thanks @omarismail94 for finding this issue and investigating it. I finally had a chance to look into some details here and I think what is happening is that in the batch module we are getting com.google.apis:google-api-services-healthcare from two different dependencies, i.e., from common and org.apache.beam:beam-sdks-java-io-google-cloud-platform (this might have started to be an issue after we upgraded our Beam dep version). So for the common module we have:

$ mvn dependency:tree -pl common | grep healthcare
[INFO] +- com.google.apis:google-api-services-healthcare:jar:v1-rev20200515-1.30.9:compile

but for the batch module we have:

$ mvn dependency:tree -pl batch | grep healthcare
[INFO] |  +- com.google.apis:google-api-services-healthcare:jar:v1beta1-rev20210217-1.31.0:compile

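I am guessing that because these are different versions of the same artifact, Maven keeps only one of them in the batch module (here the v1beta1 one). That causes the problem because the package name for CloudHealthcare differs between the two versions (healthcare.v1 vs. healthcare.v1beta1), so the v1 class that GcpStoreUtil.java refers to is missing at runtime.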
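
One way to confirm which variant actually ends up on the runtime classpath is to probe for both class names. This is a rough sketch, not code from this repository: the HealthcareClasspathCheck class is hypothetical, and the v1beta1 class name is an assumption that the newer artifact follows the same naming pattern as the v1 class in the stack trace.

```java
// Hypothetical diagnostic, not part of fhir-data-pipes.
class HealthcareClasspathCheck {

  /** Returns true if the named class can be located by this class's class loader. */
  static boolean isOnClasspath(String className) {
    try {
      // initialize=false: only locate the class, do not run its static initializers.
      Class.forName(className, false, HealthcareClasspathCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String[] candidates = {
      // Taken from the stack trace in this issue.
      "com.google.api.services.healthcare.v1.CloudHealthcare$Builder",
      // Assumed name of the same class in the v1beta1 artifact.
      "com.google.api.services.healthcare.v1beta1.CloudHealthcare$Builder",
    };
    for (String name : candidates) {
      System.out.println(name + " -> " + (isOnClasspath(name) ? "present" : "missing"));
    }
  }
}
```

Running it with the bundled JAR on the classpath (java -cp with both the JAR and this class) should show which of the two packages was bundled.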

As for the solution, your proposed one works but I prefer to upgrade the dep version in common to a newer version (i.e., the other way around). Can you please try that and see if it fixes the problem?

BTW, can you please file a bug to extend our e2e tests to include GCP FHIR store scenario as well? We should have caught this at the time that it was broken.

omarismail94 commented 3 years ago

@bashir2 I updated the dep version in common to v1beta1-rev20210217-1.31.0, and updated two packages that needed a higher version for them to work: google-api-client and google-api-http-client. Tested this and it worked!

omarismail94 commented 3 years ago

Done! Issue #200 created and assigned to me
