lantanagroup / XmlHarvester

Converts multiple XML documents into an MDB (Microsoft Access) database whose structure is defined by config.
Apache License 2.0

Need config file for C-CDAs #4

Open TronActive opened 1 year ago

TronActive commented 1 year ago

Can we add a config file for C-CDAs please? Since you already have config files for both the eICR and QRDA standards, it would be nice to have one for C-CDAs as well.

seanmcilvenna commented 1 year ago

I agree it would be great to have a configuration for C-CDA. However, as you can guess, it takes a fair bit of work to create a configuration for such a large standard. At this time, we don't have a need to build one ourselves. If you end up creating (or starting) one, we would welcome a PR.

TronActive commented 1 year ago

Thank you, and I understand. Since you seem very knowledgeable about this topic, do you have any resources I can go to for help building a config file?

seanmcilvenna commented 1 year ago

Do you have a good understanding of XML and XPATH? The config file is just a matter of creating XPATH that selects the information you want from the C-CDA files, and putting that in a config file that says which table/column the extracted data should be stored in. The config assumes that there's always a top-level table (the example in the README calls it "document"), and you can define child tables in relation to the top-level table. All <group> (aka: table) elements can contain other <group> elements. If you have a solid understanding of XML and XPATH, you should be able to build the config file using the already-provided example configs. My process would be something along the lines of:

  1. Identify which entry templates I want to extract data from in C-CDA
  2. Find an example file that has data for those templates
  3. Create one <group> per entry template that selects (for example) //cda:observation[cda:templateId/@root='XXX']
  4. Create columns based on the variable data in the template (for example cda:value/@value and another column for cda:effectiveTime/@value). The template definition should primarily drive the creation of these columns
  5. Test the config using the example file (easiest is with MS Access, in my opinion)
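As a rough illustration of steps 3 and 4, here is a minimal Python sketch of what one group and two columns would select. It is not XmlHarvester itself and does not use its config format. The sample XML is a heavily trimmed, hypothetical fragment, and the OIDs `1.2.3.4` and `9.9.9.9` are placeholders rather than real C-CDA template IDs. The standard library's ElementTree supports only a subset of XPath, so the `[cda:templateId/@root='XXX']` predicate is emulated with a Python filter.

```python
import xml.etree.ElementTree as ET

# Prefix used in the XPath examples above; urn:hl7-org:v3 is the CDA namespace.
NS = {"cda": "urn:hl7-org:v3"}

# Hypothetical, heavily trimmed sample; a real C-CDA file has far more structure.
SAMPLE = """\
<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody><component><section>
    <entry>
      <observation classCode="OBS" moodCode="EVN">
        <templateId root="1.2.3.4"/>
        <effectiveTime value="20200101"/>
        <value value="72"/>
      </observation>
    </entry>
    <entry>
      <observation classCode="OBS" moodCode="EVN">
        <templateId root="9.9.9.9"/>
        <effectiveTime value="20200102"/>
        <value value="98.6"/>
      </observation>
    </entry>
  </section></component></structuredBody></component>
</ClinicalDocument>
"""


def extract_rows(xml_text, template_root):
    """Return one dict (one 'table row') per matching observation."""
    doc = ET.fromstring(xml_text)
    rows = []
    # Step 3: the "group" selects every observation in the document...
    for obs in doc.iterfind(".//cda:observation", NS):
        # ...and keeps only those carrying the wanted templateId/@root.
        roots = [t.get("root") for t in obs.findall("cda:templateId", NS)]
        if template_root not in roots:
            continue
        # Step 4: each "column" is a path relative to the observation.
        value = obs.find("cda:value", NS)
        time = obs.find("cda:effectiveTime", NS)
        rows.append({
            "value": value.get("value") if value is not None else None,
            "effective_time": time.get("value") if time is not None else None,
        })
    return rows


rows = extract_rows(SAMPLE, "1.2.3.4")
for row in rows:
    print(row)
```

Running something like this against a real example file is a quick way to check that the XPath picks up the entries you expect before putting it into a config.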

Hope this helps!

TronActive commented 1 year ago

Thank you for the information. Yes, I am very familiar with XML and XPATH. I just need to learn more about C-CDAs as I am new to them. I was trying to find the differences between eICR and C-CDA so I could figure out which sections cross over. Then I can either remove or fill in what needs to be added. I was hoping maybe you knew where I could find that information. I have been using the HL7.org website, but there are no comparisons between the two. That is where I am hitting a roadblock.

seanmcilvenna commented 1 year ago

I'm not sure. Perhaps @dadmbc or @minigrrl have an idea...

dadmbc commented 1 year ago

Can you clarify which versions of eICR and C-CDA you’re seeking to compare?

TronActive commented 1 year ago

@dadmbc I am not sure to be honest. I know nothing about eICR. As for C-CDA I want to have as many sections as possible. I want to try and cover everything in as many C-CDAs as possible.

dadmbc commented 1 year ago

On HL7.org > Standards > Standards-based Product Grid, you can find both the eICR and C-CDA Implementation Guides by searching for 'C-CDA' and 'eICR'. Note the different versions and releases that are returned. Also note that case reporting was specified via two complementary standards/IGs: 'eICR' was the submission from data source to jurisdictional recipient, and the 'RR' (Reportability Response) IG defined the response back to the data source; together these make up the bi-directional 'eCR' (or 'e-Case-Reporting').

In any CDA IG (both C-CDA and eICR), volume 2 is where you'll see the detailed template rules/constraints, at the level of detail I believe you need for a meaningful comparison. IG packages also come with a sample .xml file which shows a basic/common instance of data conforming to the IG, but note that many variations are allowable (i.e. conformant) under the constraints specified in the IG (vol 2).

So, the IG will specify the 'minimally-stringent' rules for the data, but in reality, you should expect to create a Harvester config (XPath) for each unique data source/implementation you want to parse into a db. If you have 3 C-CDA sources and 3 eICR sources, you will need 6 distinct configs if no two data sources are identical in structure, each one specifically mapping out the XML elements and attributes you want to parse. Many of the lines in the config can be re-used across configs.
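To picture the re-use across per-source configs, here is a small sketch using plain Python dictionaries. This is not the XmlHarvester config format: the source names, column names, and template OIDs are all hypothetical placeholders, and the only point is that shared column XPaths can be defined once while each source overrides what differs.

```python
# Column XPaths assumed to be identical across sources (hypothetical names),
# written relative to the selected observation.
COMMON_COLUMNS = {
    "value": "cda:value/@value",
    "effective_time": "cda:effectiveTime/@value",
}

# Per-source differences; source names and OIDs are placeholders.
SOURCE_OVERRIDES = {
    "ccda_source_a": {
        "group_xpath": "//cda:observation[cda:templateId/@root='1.2.3.4']",
    },
    "ccda_source_b": {
        "group_xpath": "//cda:observation[cda:templateId/@root='1.2.3.5']",
        # This source also populates a status code we want to keep.
        "columns": {"status": "cda:statusCode/@code"},
    },
}


def build_config(source):
    """Merge the shared columns with one source's overrides."""
    override = SOURCE_OVERRIDES[source]
    return {
        "group_xpath": override["group_xpath"],
        "columns": {**COMMON_COLUMNS, **override.get("columns", {})},
    }


configs = {name: build_config(name) for name in SOURCE_OVERRIDES}
for name, config in configs.items():
    print(name, config)
```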

seanmcilvenna commented 1 year ago

One thing you could consider/explore is using Trifolia Work Bench to export all the templates/constraints in the "Native XML" format, and then using a transform to convert the templates/constraints into an XmlHarvester config file. I may take an initial stab at that if I have some time today or this week... It wouldn't be a perfect/ready-to-use config file, but it may do a lot of the grunt work for you.

seanmcilvenna commented 1 year ago

Ok. I took a stab at starting this. I made a commit that includes an export of C-CDA from Trifolia, an XSLT that creates a basic layout of the XML Harvester config with tables for each section, and a first attempt at figuring out what each table's columns should be. It's far from finished, but maybe it will inspire someone to take it further. :)

Here is the commit: https://github.com/lantanagroup/XmlHarvester/commit/63aea21ae1244392a2570641997231f6f188e00c

The files are: