medizininformatik-initiative / Projectathon7-VHF

Repository for the 7th MII Projectathon

main.R exited without meaningful error with retrieval #13

Open esalehi92 opened 1 year ago

esalehi92 commented 1 year ago

Hi,

We tried to follow the instructions here to run the dupctl/Docker container, which executes the main.R R script. After a long time (at least 5-6 hours), the script exited without any meaningful error. Below are the logs from running the command dupctl retrieve --dup vhf:

<vm user>@<our vm>:~/VHF-DataSHIELD$ dupctl retrieve --dup vhf
2023/05/24 09:27:00 Using config file: config.toml
latest: Pulling from smith-phep/dup/vhf
Digest: sha<some hash>
Status: Image is up to date for registry.gitlab.com/smith-phep/dup/vhf:latest
Loading required package: fhircrackr
Loading required package: data.table
Run Retrieval with Parameters:
------------------------------
           FHIR_SERVER_ENDPOINT = <our fhir server>
               FHIR_SERVER_USER = not empty username
               FHIR_SERVER_PASS = not empty password
              FHIR_SERVER_TOKEN =
                     SSL_VERIFY = FALSE
             DECENTRAL_ANALYSIS = TRUE
                    MAX_BUNDLES = Inf
         BUNDLE_RESOURCES_COUNT = 10
      MAX_REQUEST_STRING_LENGTH = 2048
                          DEBUG = TRUE
                        VERBOSE = 0
                OUTPUT_DIR_BASE = /mnt
                    PROFILE_ENC =
                    PROFILE_OBS =
                    PROFILE_CON =
FHIR_SEARCH_SUBJECT_LIST_OPTION = COMMA_SEPARATED_PURE_IDS
main.R startet at 2023-05-24 09:27:02.

[1] "Errors in Retrieval from 2023-05-24 09:27:02:"
Downloading Observations:<our fhir server>/fhir/Observation?code=http://loinc.org%7C33763-4,http://loinc.org%7C71425-3,http://loinc.org%7C33762-6,http://loinc.org%7C83107-3,http://loinc.org%7C83108-1,http://loinc.org%7C77622-9,http://loinc.org%7C77621-1&date=ge2019-01-01&date=le2022-12-31&_include=Observation:patient&_count=10

Cracking 16176 Observation Bundles.

Could not resolve Patient ID references from Observations:
   1: psn-12345678
  ## issue-creator comment: here 1077 patient pseudonyms were printed, which I removed due to data protection issues ##
   1077: psn-87654321
Removed Observations with invalid Patient references: 1986 of 161757
Merging Observation and Patient data based on Patient id:
Number of unique Patient ids in Patient data: 33206 in 33206 rows
Number of unique Patient ids in Observation data: 33206 in 159771 rows

Number of unique Patient ids in merged table: 33206 in 159771 rows
Patient ID Chunk Size in request: 124
Downloading Encounters and Conditions.

<vm user>@<our vm>:~/VHF-DataSHIELD$

Contrary to what we expected, no CSV output was produced. There are only a few hundred XML files with names like 12345.xml under the path outputLocal/VHF/Bundles/Observations/

Are we doing something wrong? Or is this a problem with the R script?

Thank you in advance for the support and best regards, Ebrahim Salehi

astruebi commented 5 months ago

@esalehi92 Does exactly this error occur again and again? Could it be due to insufficient memory, network timeouts or something similar? Even if it takes quite a while to reach the error point, I need to know whether the error is reproducible.
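
One way to gather the information asked for above is to wrap the run in a small script that records the exit code and duration, so that repeated runs can be compared. The following is only a rough sketch: the helper names run_and_report and describe_exit are hypothetical, and it assumes that dupctl passes the container's exit status through to the calling shell, which this thread does not confirm. By convention, a process killed by signal N is reported as -N by Python's subprocess module and as 128 + N by shells and Docker, so 137 would point to SIGKILL, which is what the kernel's out-of-memory killer sends.

```python
import subprocess
import sys
import time


def describe_exit(code):
    """Translate an exit status into a short human-readable hint."""
    if code == 0:
        return "exited normally"
    # Convention only: subprocess reports death by signal N as -N,
    # while shells and Docker report it as 128 + N.
    if code < 0:
        sig = -code
    elif code > 128:
        sig = code - 128
    else:
        sig = None
    if sig == 9:
        return "killed by SIGKILL (often the kernel OOM killer)"
    if sig is not None:
        return f"terminated by signal {sig}"
    return f"failed with exit code {code}"


def run_and_report(cmd):
    """Run cmd (a list of arguments) and return (exit_code, seconds, hint)."""
    start = time.monotonic()
    proc = subprocess.run(cmd)  # output goes to the terminal as usual
    elapsed = time.monotonic() - start
    return proc.returncode, elapsed, describe_exit(proc.returncode)


# Example call, using the command from this issue (not executed here):
# code, seconds, hint = run_and_report(["dupctl", "retrieve", "--dup", "vhf"])
# print(f"exit code {code} after {seconds / 3600:.1f} h: {hint}")
```

If two runs stop at the same log line with the same exit code after a similar duration, the error is likely reproducible. A SIGKILL hint would make insufficient memory the first thing to check, while differing stop points would fit intermittent causes such as network timeouts better.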