seattleflu / id3c-customizations

Extensions of ID3C for the Seattle Flu Study
MIT License

Add extra column to `receiving.fhir` for "processed-by" #20

Open kairstenfay opened 4 years ago

kairstenfay commented 4 years ago

I'm currently debugging the ETL FHIR command in ID3C using the output of etl redcap-det swab-n-send and etl redcap-det kiosk, both part of the redcap-etl branch of this repo.

Debugging FHIR documents stored in receiving.fhir will be much easier if we know which ETL process generated them.

Using encounter site alone is not sufficient, because the specific bug I'm working through is one where the encounter site in the FHIR document is null.

tsibley commented 4 years ago

Yes, this metadata would be good to track! Two solutions come to mind:

  1. Add a submitter or agent column to receiving.fhir which is set to a User-Agent-like string. In the case of our command-line REDCap processors, it could be strings like id3c etl redcap-det {etl_name}/{revision}. In the case of the web API /v1/receiving/fhir, it could be the actual User-Agent sent by the client. (We'd then tell producers to set a valid and meaningful User-Agent.)

  2. Add nothing to receiving.fhir, but update our command-line REDCap processors to include a Provenance resource in the Bundle and encourage web API submitters to do the same. We wouldn't process this resource in id3c etl fhir, but it'd be there for debugging. As a lighter weight alternative to Provenance resource(s), the meta.source metadata field could be provided on the Bundle.
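As a rough sketch of option 1, the agent string could be built along these lines. The function names `etl_agent` and `api_agent` are hypothetical, not part of ID3C, and treating a blank User-Agent as missing is an assumption rather than anything decided in this thread.

```python
from typing import Optional


def etl_agent(etl_name: str, revision: int) -> str:
    """Build a User-Agent-like string for a command-line REDCap DET processor.

    Follows the pattern suggested above:
    id3c etl redcap-det {etl_name}/{revision}
    """
    return f"id3c etl redcap-det {etl_name}/{revision}"


def api_agent(user_agent: Optional[str]) -> Optional[str]:
    """Normalize the User-Agent header sent by a web API client.

    Assumption: a missing or blank header is stored as None (SQL NULL)
    rather than as an empty string.
    """
    agent = (user_agent or "").strip()
    return agent or None


# Example values; the ETL name and revision number are illustrative only.
print(etl_agent("swab-n-send", 1))
print(api_agent("curl/7.64.1"))
```

The resulting value would be what gets written to the new submitter or agent column alongside each document.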

kairstenfay commented 4 years ago

After reading your proposals, I was originally leaning toward #2 for the sake of consistency across all our receiving tables, but I see now that some tables, like receiving.sequence_read_set_sequence_read_set_id_seq, have extra columns besides id, document, received, and processing_log.

The benefit of #2 is also a simpler uploading process (by requiring fewer columns), with the trade-off that it would be more difficult to enforce that a user uploads a document with a Provenance or Meta resource.

Re: Meta -- the meta.source field takes a URI. Are we okay with constructing one, like http://seattleflu.org/etl/redcap-det-swab-n-send?
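For illustration, setting meta.source on a Bundle with a constructed URI might look like the sketch below. It treats the Bundle as a plain dict; the helper name `with_source` and the URI prefix (taken from the example URI above) are assumptions, not an agreed convention.

```python
import json

# Assumed prefix, based on the example URI in this comment.
SOURCE_PREFIX = "http://seattleflu.org/etl/redcap-det-"


def with_source(bundle: dict, etl_name: str) -> dict:
    """Return a copy of the Bundle with meta.source set to a constructed URI.

    Any existing keys under "meta" are preserved; the input is not mutated.
    """
    meta = {**bundle.get("meta", {}), "source": SOURCE_PREFIX + etl_name}
    return {**bundle, "meta": meta}


# A minimal, empty Bundle used only to show where the field ends up.
bundle = {"resourceType": "Bundle", "type": "collection", "entry": []}
tagged = with_source(bundle, "swab-n-send")
print(json.dumps(tagged["meta"], indent=2))
```

Since `id3c etl fhir` would not process this field, it would only serve as a debugging aid when inspecting documents in receiving.fhir.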

Provenance does feel like a bit too heavyweight a solution for the time being. Thanks for looking into these.

tsibley commented 4 years ago

Nod. I don't think all receiving tables need to have the exact same schema, although broad consistency among them is useful.

Following up on our in-person conversation, it's worth pointing out that although I presented the two options as exclusive, they're not. We can (and maybe should) record submitter/agent in receiving.fhir as well as provide more details on the data provenance itself in the Bundle.