google / fhir-data-pipes

A collection of tools for extracting FHIR resources and analytics services on top of that data.
https://google.github.io/fhir-data-pipes/
Apache License 2.0

`pipelines/batch` uber JAR fails with an error on the second run when re-syncing FHIR Server A to FHIR Server B #879

Closed muhammad-levi closed 10 months ago

muhammad-levi commented 10 months ago

After the pipelines/batch uber JAR runs successfully the first time, running it a second time produces the following ERROR log:

2023-11-16 16:26:13 09:26:13.640 [main] ERROR com.google.fhir.analytics.DwhFiles -- Attempting to write to the timestamp file /tmp//timestamp_start.txt which already exists
2023-11-16 16:26:13 Exception in thread "main" java.nio.file.FileAlreadyExistsException: Attempting to write to the timestamp file /tmp//timestamp_start.txt which already exists
2023-11-16 16:26:13     at com.google.fhir.analytics.DwhFiles.writeTimestampFile(DwhFiles.java:251)
2023-11-16 16:26:13     at com.google.fhir.analytics.DwhFiles.writeTimestampFile(DwhFiles.java:235)
2023-11-16 16:26:13     at com.google.fhir.analytics.EtlUtils.runPipelineWithTimestamp(EtlUtils.java:60)
2023-11-16 16:26:13     at com.google.fhir.analytics.FhirEtl.main(FhirEtl.java:403)

Is this the expected behaviour, or should the pipeline be able to re-sync?

muhammad-levi commented 10 months ago

Somehow I got it resolved by rebuilding the Docker image from the modified Dockerfile of pipelines/batch. The error was probably caused by a crash in the previous run, which had already created timestamp_start.txt. Maybe the pipeline needs to be more graceful / fail-safe when crashing (e.g. delete timestamp_start.txt)?

bashir2 commented 10 months ago

Sorry, I missed your original report earlier. Not deleting timestamp_start.txt is actually by design, so that we do not overwrite the outputs of a previous run. What the first error is telling you is that your output directory may already have data in it (i.e., an old timestamp_start.txt exists in it). You can get rid of the error by either removing the content of the output directory or using a new output directory. The controller handles this automatically, as it uses a new [timestamped] output directory for each run.
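For anyone running the batch JAR without the controller, here is a minimal sketch of the "new output directory per run" idea in Java. It is not the controller's actual code: the class `OutputDirHelper`, its methods, and the `run_<UTC timestamp>` naming scheme are hypothetical. Only the file name timestamp_start.txt comes from the log above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of fhir-data-pipes.
public class OutputDirHelper {
  // File name taken from the error log in this issue.
  static final String TIMESTAMP_FILE = "timestamp_start.txt";

  // Assumed naming scheme for per-run directories, e.g. run_20231116T092613Z.
  private static final DateTimeFormatter SUFFIX =
      DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'").withZone(ZoneOffset.UTC);

  // True if the directory already holds the marker file of an earlier run.
  static boolean hasPreviousRun(Path outputDir) {
    return Files.exists(outputDir.resolve(TIMESTAMP_FILE));
  }

  // Creates a fresh directory under baseDir named after the given instant.
  // createDirectory throws FileAlreadyExistsException if the directory is
  // already there, so an earlier run's output is never silently reused.
  static Path newRunDir(Path baseDir, Instant now) throws IOException {
    Files.createDirectories(baseDir);
    return Files.createDirectory(baseDir.resolve("run_" + SUFFIX.format(now)));
  }

  public static void main(String[] args) throws IOException {
    Path base = Path.of(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
    if (hasPreviousRun(base)) {
      System.out.println(base + " already contains " + TIMESTAMP_FILE + "; not reusing it.");
    }
    Path runDir = newRunDir(base.resolve("dwh"), Instant.now());
    System.out.println("Use this as the pipeline output directory: " + runDir);
  }
}
```

The printed path would then be passed to the pipeline as its output directory, so each re-sync writes its own timestamp_start.txt instead of colliding with the previous one.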

bashir2 commented 10 months ago

I am closing this as WAI (working as intended).