num-codex / odm2fhir

This tool maps study/patient data in CDISC ODM, based on the GECCO data dictionary, to HL7 FHIR resources that adhere to the GECCO profiles, value sets and code systems.
MIT License

Technical ID (resource.id) changes across exports #16

Closed: makampf closed this issue 3 years ago

makampf commented 3 years ago

Describe the bug: The technical ID in the field resource.id is not stable across multiple exports of the same resource; it is probably a randomly generated hash. As a result, any incremental or iterative loading/updating breaks: all target repositories (e.g. FHIR-Server, FHIR-Gateway, i2b2) would have to be cleared to avoid redundant data with varying IDs and references.

To Reproduce: Steps to reproduce the behavior:

  1. Export a resource with odm2fhir
  2. Export the same resource again
  3. Compare the field resource.id

Expected behavior: The technical ID has to be stable. A recommended (and my personally preferred) way would be to hash resource.identifier.system + resource.identifier.value and use the result as the value of resource.id.

Note: Please also consider issue https://github.com/num-codex/odm2fhir/issues/15 in this regard.

CC: @noemide

holger-stenzhorn commented 3 years ago

@makampf I just adapted the code so that for all resources, resource.id is set to the MD5 hash of the concatenation of system and value of the first resource.identifier if such an identifier exists, or to a random UUID otherwise. The "minor" issue is that an identifier is currently set only for the Patient resource and for no other, so all resource.id values except the Patient one are still unstable. To fix this, the code generating each resource needs to be adapted accordingly... (@cerbelding Can you have a look at this please? Thx! 🤝)
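The rule described above (MD5 of the first identifier's system + value, random UUID as a fallback) can be sketched as follows. This is an illustrative sketch, not the odm2fhir code: it assumes resources are plain JSON-shaped dicts, that system and value are concatenated without a separator and encoded as UTF-8, and the helper name `resource_id` and the example system URL are hypothetical.

```python
import hashlib
import uuid


def resource_id(resource: dict) -> str:
    """Hypothetical helper: derive resource.id from the first identifier.

    Assumes a JSON-shaped FHIR resource and plain concatenation of
    system and value (no separator) before hashing.
    """
    identifiers = resource.get("identifier") or []
    if identifiers:
        first = identifiers[0]
        key = first.get("system", "") + first.get("value", "")
        # MD5 hex digest: 32 lowercase hex characters, a valid FHIR id.
        return hashlib.md5(key.encode("utf-8")).hexdigest()
    # No identifier: fall back to a random UUID, which is NOT stable across exports.
    return str(uuid.uuid4())


patient = {
    "resourceType": "Patient",
    "identifier": [{"system": "https://example.org/patients", "value": "P-001"}],
}
observation = {"resourceType": "Observation"}  # no identifier yet

print(resource_id(patient) == resource_id(patient))          # True: stable
print(resource_id(observation) == resource_id(observation))  # False: fresh UUID per call
```

This also illustrates the "minor" issue: any resource without an identifier still ends up with an unstable id.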

makampf commented 3 years ago

Thanks so far! I don't know much about REDCap, but would it be possible to use some concatenation of system + patient_id + redcap_form_id + redcap_record_id? I assume there are some internal REDCap IDs that are stable?

(PS: Maybe we could also talk about MD5 vs. SHA or other cryptographic hashes. But that's another topic.)
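One practical constraint on the MD5 vs. SHA choice: whatever algorithm is used, its hex digest must still be a valid FHIR `id`, which allows only ASCII letters, digits, `-` and `.` with a maximum length of 64 characters. MD5 (32 hex characters) and SHA-256 (64) fit; SHA-512 (128) does not. The sketch below checks this; the helper `stable_id` and the example key are hypothetical.

```python
import hashlib
import re

# FHIR R4 "id" datatype: ASCII letters, digits, '-' and '.', 1 to 64 characters.
FHIR_ID = re.compile(r"[A-Za-z0-9\-.]{1,64}")


def stable_id(key: str, algorithm: str = "md5") -> str:
    """Hypothetical helper: hex digest of key using the given hashlib algorithm."""
    digest = hashlib.new(algorithm, key.encode("utf-8")).hexdigest()
    if not FHIR_ID.fullmatch(digest):
        raise ValueError(f"{algorithm} digest is not a valid FHIR id")
    return digest


key = "https://example.org/patientsP-001"
print(len(stable_id(key, "md5")))     # 32
print(len(stable_id(key, "sha256")))  # 64
# stable_id(key, "sha512") raises ValueError: 128 characters exceed the limit.
```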

cerbelding commented 3 years ago

We already did it just the way you described and are currently rolling it out to all resource mappers.

For Resource.Identifier we are using the concatenation FormOID-FormRepeatKey-ItemGroupOID-ItemGroupRepeatKey-ItemOID, and for Resource.Id we're using the hash of Identifier.System, Identifier.Value and the patient ID.
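The scheme described above can be sketched like this. It is a hedged illustration rather than the mapper code: the hash function is not named in this comment, so MD5 is assumed (as in the other comments in this thread); the field order follows the comment, while the plain concatenation of system, identifier value and patient ID is an assumption. `ItemRef`, `identifier_value`, `resource_id` and all example OIDs and URLs are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemRef:
    """Hypothetical position of one ODM item within a subject's clinical data."""
    form_oid: str
    form_repeat_key: str
    item_group_oid: str
    item_group_repeat_key: str
    item_oid: str


def identifier_value(ref: ItemRef) -> str:
    # Identifier value: FormOID-FormRepeatKey-ItemGroupOID-ItemGroupRepeatKey-ItemOID
    return "-".join([
        ref.form_oid,
        ref.form_repeat_key,
        ref.item_group_oid,
        ref.item_group_repeat_key,
        ref.item_oid,
    ])


def resource_id(system: str, ref: ItemRef, patient_id: str) -> str:
    # Assumption: system + identifier value + patient ID, concatenated, then MD5.
    key = system + identifier_value(ref) + patient_id
    return hashlib.md5(key.encode("utf-8")).hexdigest()


ref = ItemRef("F.ANAMNESIS", "1", "IG.SYMPTOMS", "2", "I.FEVER")
print(identifier_value(ref))  # F.ANAMNESIS-1-IG.SYMPTOMS-2-I.FEVER

a = resource_id("https://example.org/odm2fhir", ref, "patient-42")
b = resource_id("https://example.org/odm2fhir", ref, "patient-42")
print(a == b)  # True: same inputs on every export give the same id
```

Including the patient ID matters because the identifier value only encodes a position in the form structure, which is the same for every patient; without it, the same item of two different patients would hash to the same resource.id.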

makampf commented 3 years ago

nice

holger-stenzhorn commented 3 years ago

@makampf This might be a stupid question, but why did you reopen the issue? The issue was not closed for fun but because it has actually been fixed... 😎 To generate the id of an element, I concatenate the system, formOID, formRepeatKey, itemGroupOID, itemGroupRepeatKey and itemOID plus the patientId and then hash the result with MD5. The result, i.e. the Docker image, will be available in the next iteration of ODM2FHIR on Monday.

holger-stenzhorn commented 3 years ago

@makampf Did you reopen the issue because the Docker image with that fix is not yet available? I am curious about your rationale for reopening this one... It would be great if you could explain your motivation a bit!

makampf commented 3 years ago

Sorry, it was just because the last update stated:

and hence all resource.id except for the patient one are still unstable.

holger-stenzhorn commented 3 years ago

Ok, understood! But now you know and can rest assured that a closed issue means a fixed issue indeed. At least that is our hope... 😁 But seriously, next time I will note when and in which version the fix becomes available to users, to be clearer. For this issue you will get the fix in version '0.2.3' on Monday (at the latest).

makampf commented 3 years ago

Thanks. Looking forward to