hapifhir / hapi-fhir

🔥 HAPI FHIR - Java API for HL7 FHIR Clients and Servers
http://hapifhir.io
Apache License 2.0

$export does not work at all for DSTU3. #2855

Closed theGOTOguy closed 3 years ago

theGOTOguy commented 3 years ago

Describe the bug

$export works for FHIR R4, but does not return any data for FHIR DSTU3.

To Reproduce

Start a vanilla FHIR JPA server in DSTU3 mode:

docker run -p 8080:8080 -e hapi.fhir.bulk_export_enabled=true -e hapi.fhir.fhir_version=DSTU3 hapiproject/hapi:latest

Use Synthea to generate some data.

git clone https://github.com/synthetichealth/synthea.git
cd synthea
./gradlew build check test
./run_synthea -p 100 --exporter.fhir_stu3.export true --exporter.hospital.fhir_stu3.export true --exporter.practitioner.fhir_stu3.export true
for file in output/fhir_stu3/hospitalInformation*; do curl -X POST --header "Content-Type: application/json" -d @$file http://localhost:8080/fhir/; done
for file in output/fhir_stu3/practitionerInformation*; do curl -X POST --header "Content-Type: application/json" -d @$file http://localhost:8080/fhir/; done
for file in output/fhir_stu3/[A-Z]*.json; do curl -X POST --header "Content-Type: application/json" -d @$file http://localhost:8080/fhir/; done
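
The loops above discard curl's result, so a rejected bundle would go unnoticed and could also explain an empty export. A minimal sketch of an upload that reports failures, assuming the same base URL as above; `upload_bundles` is a hypothetical helper name, not part of HAPI FHIR or Synthea:

```shell
# Hypothetical helper: POST each file to the FHIR base URL and count rejections.
# Usage: upload_bundles BASE_URL FILE...
upload_bundles() {
  local base_url="$1"; shift
  local file status failed=0
  for file in "$@"; do
    # --data-binary sends the file unmodified; -w prints only the HTTP status code.
    status=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
      -H "Content-Type: application/json" --data-binary "@$file" "$base_url")
    case "$status" in
      2??) ;;  # any 2xx response counts as success
      *) echo "upload failed ($status): $file" >&2; failed=$((failed + 1)) ;;
    esac
  done
  # Print the number of failed uploads so the caller can check for 0.
  echo "$failed"
}
```

To keep the original ordering, it would be called once per group, for example `upload_bundles http://localhost:8080/fhir/ output/fhir_stu3/hospitalInformation*`, then the practitioner files, then the patient bundles.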

Request an export:

curl -I -H "Prefer: respond-async" -X GET http://localhost:8080/fhir/\$export

Note the $export-poll-status URL in the Content-Location header of the response to the above command. Wait a few minutes, then:

curl -X GET http://localhost:8080/fhir/\$export-poll-status\?_jobId=YOUR_JOB_ID_GOES_HERE

There is no output key at all in the response JSON:

{
  "transactionTime" : "2021-08-02T21:17:27.493+00:00",
  "request" : "/$export?_outputFormat=application%2Ffhir%2Bndjson"
}
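
The request-and-poll steps above can also be scripted instead of waiting a fixed few minutes. This is only a sketch: it assumes the server follows the Bulk Data convention of answering the status request with 202 while the job is still in progress, and the function names (`poll_url_from_headers`, `has_output_key`, `export_and_wait`) are made up for illustration. The `grep` check is deliberately crude; a JSON tool would be more robust.

```shell
# Read raw HTTP response headers on stdin and print the Content-Location value.
poll_url_from_headers() {
  tr -d '\r' | awk 'tolower($1) == "content-location:" { print $2; exit }'
}

# Succeed if the JSON on stdin contains an "output" key (crude text match).
has_output_key() {
  grep -q '"output"[[:space:]]*:'
}

# Kick off a system-level export, then poll until the status is no longer 202.
# Usage: export_and_wait [BASE_URL] [INTERVAL_SECONDS] [MAX_POLLS]
export_and_wait() {
  local base_url="${1:-http://localhost:8080/fhir}"
  local interval="${2:-10}" max_polls="${3:-60}"
  local poll_url status body attempt=0 rc
  # -D - writes the response headers to stdout; the body is discarded.
  poll_url=$(curl -s -D - -o /dev/null -H "Prefer: respond-async" \
    "$base_url/\$export" | poll_url_from_headers)
  [ -n "$poll_url" ] || { echo "no Content-Location header returned" >&2; return 1; }
  body=$(mktemp)
  while [ "$attempt" -lt "$max_polls" ]; do
    status=$(curl -s -o "$body" -w '%{http_code}' "$poll_url")
    [ "$status" = "202" ] || break   # anything other than 202 ends the wait
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "final HTTP status: $status" >&2
  cat "$body"
  # Return 0 only if the final response actually lists output files.
  if has_output_key < "$body"; then rc=0; else rc=1; fi
  rm -f "$body"
  return "$rc"
}
```

With this, `export_and_wait http://localhost:8080/fhir 10 60` would print the final status response and return non-zero when the output key is missing, as in the response shown above. The poll limit keeps the script from waiting indefinitely if the job never leaves the in-progress state.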

Expected behavior

The output key in the result of the $export operation should have included ndjson files containing the FHIR server's data. For instance, repeat the same procedure using R4.

Start a vanilla FHIR JPA server in R4 mode:

docker run -p 8080:8080 -e hapi.fhir.bulk_export_enabled=true hapiproject/hapi:latest

Use Synthea to generate some data.

git clone https://github.com/synthetichealth/synthea.git
cd synthea
./gradlew build check test
./run_synthea -p 100
for file in output/fhir/hospitalInformation*; do curl -X POST --header "Content-Type: application/json" -d @$file http://localhost:8080/fhir/; done
for file in output/fhir/practitionerInformation*; do curl -X POST --header "Content-Type: application/json" -d @$file http://localhost:8080/fhir/; done
for file in output/fhir/[A-Z]*.json; do curl -X POST --header "Content-Type: application/json" -d @$file http://localhost:8080/fhir/; done

Request an export:

curl -I -H "Prefer: respond-async" -X GET http://localhost:8080/fhir/\$export

Note the $export-poll-status URL in the Content-Location header of the response to the above command. Wait a few minutes, then:

curl -X GET http://localhost:8080/fhir/\$export-poll-status\?_jobId=YOUR_JOB_ID_GOES_HERE

There is a whole lot of ndjson exported:

{
  "transactionTime" : "2021-08-02T21:17:27.493+00:00",
  "request" : "/$export?_outputFormat=application%2Ffhir%2Bndjson",
  "output": [
     ... There are a lot of files here! ...
  ]
}

Additional context

We need to $export ndjson files as part of a deidentification pipeline. Currently we use the DSTU3 format, so the broken $export in DSTU3 is a showstopping bug for us.

theGOTOguy commented 3 years ago

I returned to my computer with the docker container still running, ran the exact same curl command as in my repro again, and it worked. Baffled, I repeated the entire repro again under STU3, and it worked a second time.

I am closing this issue for now, until I can get a better handle on why this is happening.

theGOTOguy commented 3 years ago

For anyone who follows and is confused by the behavior of system-level $export: it appears that some system-level resource on our production server causes all export threads to crash, leaving the job hanging forever with no progress made on the server. It is not clear to me which resource this is, and even with a custom build with the log level set to DEBUG, the reason for the failure is not obvious. I tried without success to create a reproduction for this issue, so I'm leaving the bug closed: it would be unrealistic to expect anyone to fix it without a clear cause in the logs or a repro.