openemr / openemr

The most popular open source electronic health records and medical practice management solution.
https://open-emr.org/
GNU General Public License v3.0

feat: Implement ONC 2015 Certification (b)(10) Electronic Health Information Export #6945

Closed adunsulag closed 1 year ago

adunsulag commented 1 year ago

We need to implement the following ONC 2015 Certification (b)(10) Electronic Health Information Export. https://www.healthit.gov/test-method/electronic-health-information-export

The requirements for this are as follows:

(b)(10)(i)(A) - Enable a user to timely create an export file(s) with all of a single patient's electronic health information stored at the time of certification by the product, of which the Health IT Module is a part.
(b)(10)(i)(B) - A user must be able to execute this capability at any time the user chooses and without subsequent developer assistance to operate.
(b)(10)(i)(C) - Limit the ability of users who can create export file(s) in at least one of these two ways: (1) to a specific set of identified users, or (2) as a system administrative function. [This feature should implement option 2.]
(b)(10)(i)(D) - The export file(s) created must be electronic and in a computable format.
(b)(10)(i)(E) - The publicly accessible hyperlink of the export's format must be included with the exported file(s).
(b)(10)(ii) - Create an export of all the electronic health information that can be stored at the time of certification by the product of which the Health IT Module is a part.

For purposes of this feature, the exported Electronic Health Information should follow the ONC definition, which is: EHI means "electronic protected health information" (ePHI) as defined in 45 CFR 160.103 to the extent that it would be included in a designated record set as defined in 45 CFR 164.501, regardless of whether the group of records are used or maintained by or for a covered entity. But EHI does not include psychotherapy notes as defined in 45 CFR 164.501 or information compiled in reasonable anticipation of, or for use in, a civil, criminal, or administrative action or proceeding.

The regulation text for 45 CFR 160.103 defines "health information" as the following: Health information means any information, including genetic information, whether oral or recorded in any form or medium, that: (1) Is created or received by a health care provider, health plan, public health authority, employer, life insurer, school or university, or health care clearinghouse; and (2) Relates to the past, present, or future physical or mental health or condition of an individual; the provision of health care to an individual; or the past, present, or future payment for the provision of health care to an individual.

The regulation text for 45 CFR 160.103 defines "protected health information" as the following: Protected health information means individually identifiable health information: (1) Except as provided in paragraph (2) of this definition, that is: (i) Transmitted by electronic media; (ii) Maintained in electronic media; or (iii) Transmitted or maintained in any other form or medium. (2) Protected health information excludes individually identifiable health information: (i) In education records covered by the Family Educational Rights and Privacy Act, as amended, 20 U.S.C. 1232g; (ii) In records described at 20 U.S.C. 1232g(a)(4)(B)(iv); (iii) In employment records held by a covered entity in its role as employer; and (iv) Regarding a person who has been deceased for more than 50 years.

As far as I know, OpenEMR does not currently have a way of distinguishing notes that are specifically psychotherapy notes and which are to be excluded from the export. Users who want to keep psychotherapy notes stored inside of OpenEMR confidential will need to use a different mechanism than this export.

OpenEMR does not have a way of marking records as education records.

We will need to make sure that employee records and other confidential information do not inadvertently get sent out as part of this feature. Only administrators should be allowed to execute the export.

adunsulag commented 1 year ago

Relevant PR is here: https://github.com/openemr/openemr/pull/6939

@bradymiller @sjpadgett @stephenwaite I think I have the main components of the data exporter for b10 built, and the main traversal algorithm appears to be working well. You can export a single patient, or export the entire patient population. I've got about 85 of the 120-ish tables exporting now with their foreign keys identified by the schemaspy tool. I'll be doing a bunch of testing to make sure that everything is exporting properly.
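Roughly, the traversal is a breadth-first walk over the foreign-key graph that schemaspy gives us: start at the patient table and follow every relationship outward, collecting the related rows as we go. A simplified sketch of the idea (not the actual PR code; the table and column names are just for illustration):

```php
<?php
// Conceptual sketch only -- not the exporter code from the PR.
// $fkGraph maps a table to the tables that reference it and the joining columns, e.g.
// 'patient_data' => [['table' => 'form_encounter', 'local' => 'pid', 'foreign' => 'pid']].
function collectRelatedTables(array $fkGraph, string $startTable): array
{
    $visited = [];
    $queue = [$startTable];
    while ($queue) {
        $table = array_shift($queue);
        if (isset($visited[$table])) {
            continue; // already walked this table
        }
        $visited[$table] = true;
        foreach ($fkGraph[$table] ?? [] as $edge) {
            // Each edge tells us which child table to pull rows from and which
            // columns join it back to the parent record set.
            $queue[] = $edge['table'];
        }
    }
    return array_keys($visited);
}
```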

I do need to add in a mechanism to hold back some data on certain fields for a given table. The rudiments are there, but I need to go in and add the specific column selection for things like the user table, x12 partners table, etc., so that we aren't giving out confidential data / employee data as part of the export.
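For the column selection, I'm thinking of something along the lines of a per-table allowlist. This is just a sketch with example table and column names, not what's in the PR:

```php
<?php
// Sketch of per-table column filtering (illustrative names, not the PR code).
// Tables that are not listed here export all of their columns.
$allowedColumns = [
    // Only keep the columns needed to resolve references into the users table;
    // password hashes, personal contact info, etc. stay out of the export.
    'users' => ['id', 'username', 'fname', 'lname'],
    'x12_partners' => ['id', 'name'],
];

function filterRow(string $table, array $row, array $allowedColumns): array
{
    if (!isset($allowedColumns[$table])) {
        return $row; // no restrictions defined for this table
    }
    return array_intersect_key($row, array_flip($allowedColumns[$table]));
}
```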

I also need to change up the save location for the exported zip. Right now the CSV and zip files get written out to the module directory. We need to store them similar to how we handle our bulk FHIR exports, using the Document class, so that the files get written out to CouchDB or to the filesystem. My only concern is how to handle a situation where we could potentially have GBs of data in the zip file as part of the patient documents being exported.

That leads to my next piece. I'm looking at adding a configurable limit that allows chunks of patients to be exported. One chunk could default to 250/500 patients, but the administrator could choose a different number. The system would then create multiple zip files for the export.

I do wonder, though, how to handle large files stored for a patient. If, for example, a patient has GBs of documents stored, the zip files could be huge. Generating those zip files (and encrypting them if file storage encryption is enabled) could be massive and exceed a zip file's limitations. One idea would be to export the documents one at a time, keeping track of how many patients have been successfully exported. If we reach the maximum size (1-2GB) for the patient documents, we would then break up the zip file at that point, even if the administrator chose a higher number of patients to export. At a minimum, I think we won't break a single patient up into multiple zip files; rather, we will export an entire patient's documents into one zip and move on to the next one. I believe the documents table actually stores the file sizes, so we should be able to do this analysis ahead of time instead of as we export the files.
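To make that concrete, the pre-analysis could look something like the sketch below. It assumes we've already summed each patient's document sizes out of the documents table; the 2GB cap, function name, and structure are illustrative, not the PR code:

```php
<?php
// Sketch: group patients into zip "chunks" ahead of time using the stored file
// sizes so that no chunk exceeds the size cap and no patient is split across zips.
const MAX_ZIP_BYTES = 2 * 1024 * 1024 * 1024; // illustrative 2GB cap per zip

/**
 * @param array<int,int> $patientDocBytes map of patient id => total document bytes
 * @return array<int,array<int,int>> list of chunks, each a list of patient ids
 */
function chunkPatientsBySize(array $patientDocBytes, int $maxPatientsPerZip): array
{
    $chunks = [];
    $current = [];
    $currentBytes = 0;
    foreach ($patientDocBytes as $pid => $bytes) {
        $wouldOverflow = ($currentBytes + $bytes) > MAX_ZIP_BYTES;
        $countReached = count($current) >= $maxPatientsPerZip;
        if ($current && ($wouldOverflow || $countReached)) {
            $chunks[] = $current; // close out the current zip
            $current = [];
            $currentBytes = 0;
        }
        // A patient always lands in exactly one chunk, even if their documents
        // alone exceed the cap -- a single patient is never split across zips.
        $current[] = $pid;
        $currentBytes += $bytes;
    }
    if ($current) {
        $chunks[] = $current;
    }
    return $chunks;
}
```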

I'll also do a synthea import of a few hundred/thousand patients ccdas and do some testing that way to make sure things are working.

At that point the rest of the work is just adding documentation comments on all of the main tables and, if we have enough time, on the individual columns. I'll also need to write an overall breakdown of the zip file on the main EHI export page. Fortunately the schemaspy templates allow us to customize everything.

mdsupport commented 1 year ago

> I do need to add in a mechanism to hold back some data on certain fields for a given table.

This, as well as several other functions, needs a data dictionary, preferably with a versioning mechanism. For the patient archive requirement we used a set of generic developer object/component tables and ADODB meta functions to define function-specific mappings. While many requirements appear to be pure data exports, we have found that the ability to invoke custom, column-specific logic helps a lot.

Re. the size limits, is the requirement for a patient to use the portal to take all of their data?

Re. documentation, since the requirement is purely for a public link, would a Read the Docs or GitHub link with versioning support be an option?

adunsulag commented 1 year ago

@mdsupport This is more a requirement that a clinic be able to take all of their patient data to another EHR. It can be used to satisfy a dedicated record-set dump for a patient by choosing just that single patient, but I believe the intent is to allow interoperability between EMRs and mitigate vendor lock-in with patient data.

For the documentation requirement, I'm keeping it versioned via source control in GitHub. The public repo can be used as the link, but if someone runs a customized install they can also regenerate the documentation and have it include their custom fields.

adunsulag commented 1 year ago

@mdsupport I'm curious about your table structure: were you using those tables to track a historical record of every change to a database table/column, or were you using them to track database schema changes over time?

mdsupport commented 1 year ago

Not every change, but a prep run is needed to digest the current structures into the obj_json column, as provided by ADODB or by parsing database.sql. The importer compares the column values to figure out if anything changed at the table level. We then review and set mapping actions for new records in the table that result from additions or changes to the table structure. The hardest part is the initial configuration, but maintenance is minimal effort.

The dev_component table holds the set of column-specific actions in the comp_json column of the extraction script record and the database table record.

Extraction is done only when the prep run results are clear.

adunsulag commented 1 year ago

So the hard limit for a single table that is being exported looks like it is going to be 4GB, as that is the most I can store in a LONGBLOB in the database. That works out, since the largest amount of data we can stuff into a PHP zip archive is also 4GB.

I just did some data exports and couldn't figure out why my data was capping out at 64KB; it turns out that storing the export results as a BLOB isn't large enough, since a BLOB column maxes out at 64KB. For everyone's information, 64KB is about 300 encounter forms in a CSV export.

sjpadgett commented 1 year ago

Group the export into some categories that make sense. I wouldn't think the dump has to be in one zip or a single entity. But I haven't been following too closely, so....

adunsulag commented 1 year ago

@sjpadgett I break it up into multiple dumps by patient, based upon the zip file size the user specifies at the time of export. If a patient doesn't have any files, I count that patient as roughly 100KB when estimating the zip file sizes.

sjpadgett commented 1 year ago

How about using gzip or another format? Gzip, I believe, is only limited by filesystem requirements.

adunsulag commented 1 year ago

@sjpadgett I'm trying to avoid hitting the filesystem, so right now the current approach stays in the database or in memory, as I don't want to have to deal with the data sitting unencrypted on the drive. 4GB of data in a single table for 5000 patient records (the max number of patients per zip file) should, I think, be fine.

sjpadgett commented 1 year ago

Very nice job @adunsulag, I really like the interface. Also:

$file = file_get_contents($file); // or a string already in memory
$gzfile = "ehi.gz";
$fp = gzopen($gzfile, 'w9'); // 'w9' = write with maximum compression
gzwrite($fp, $file);
gzclose($fp);
// or am I missing something obvious?

(screenshot of the EHI export interface)

adunsulag commented 1 year ago

@sjpadgett As far as gzip goes, I haven't used the PHP mechanism before; we could give it a try. We'd still need to read the file into memory and encrypt it, but if someone has the server memory for it, it would allow for archives larger than the 4GB limit, as it looks like the gzip algorithm won't die at 4GB from what I read in the PHP docs.
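For the compression step itself, zlib can be fed in chunks, so the whole file wouldn't necessarily have to sit in memory at once. A rough sketch of that (the encryption question is separate, and the function name is just illustrative):

```php
<?php
// Sketch: stream a large file into a gzip archive in 1MB chunks so the full
// contents never have to be held in memory at once (encryption handled elsewhere).
function gzipFile(string $sourcePath, string $gzPath): void
{
    $in = fopen($sourcePath, 'rb');
    $out = gzopen($gzPath, 'wb9'); // maximum compression
    while (!feof($in)) {
        $chunk = fread($in, 1024 * 1024);
        if ($chunk === false || $chunk === '') {
            break;
        }
        gzwrite($out, $chunk);
    }
    fclose($in);
    gzclose($out);
}
```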

I think we could try switching to that approach after I fix the parameter binding issue. The way I'm querying for the record sets, I'm dying with parameterized queries at ~200K bound parameters for the record traversals. Synthea apparently will generate thousands of encounter forms for a single patient, so I have 200K encounters for 768 patients.
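If the ceiling is at the MySQL prepared-statement layer, the protocol caps a statement at 65,535 placeholders (the parameter count is a two-byte field), so chunking the IN() lists should sidestep it either way. A sketch of what I mean, not the PR code:

```php
<?php
// Sketch: run a large IN() query in batches so we stay well under any placeholder
// limit. $ids would be e.g. the ~200K encounter ids; names are illustrative.
function fetchInBatches(PDO $pdo, string $sqlTemplate, array $ids, int $batchSize = 1000): array
{
    $rows = [];
    foreach (array_chunk($ids, $batchSize) as $batch) {
        $placeholders = implode(',', array_fill(0, count($batch), '?'));
        $stmt = $pdo->prepare(sprintf($sqlTemplate, $placeholders));
        $stmt->execute($batch);
        $rows = array_merge($rows, $stmt->fetchAll(PDO::FETCH_ASSOC));
    }
    return $rows;
}

// Usage (hypothetical):
// $rows = fetchInBatches($pdo, 'SELECT * FROM form_encounter WHERE encounter IN (%s)', $encounterIds);
```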

sjpadgett commented 1 year ago

Does this relate back to a PHP limit being reached, or an SQL engine limit?

adunsulag commented 1 year ago

@sjpadgett Don't know if it's at the PDO layer or the MySQL layer.

sjpadgett commented 1 year ago

I think I have an idea and will look into it.

adunsulag commented 1 year ago

@bradymiller So I discovered that our SQL database's max_allowed_packet is 16MB, so storing the exported zips in the database isn't going to work without changing server configurations, which is a mess. I guess I will use the Document class and store each table export that way. It will be slower, as the data will end up being encrypted temporarily and then decrypted as we stuff it back into the zip/gzip files, but at least it will handle the larger file sizes. I'll clean up the temporary files at the end of the process. What a pain.
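In plain PHP terms, the per-table flow will be roughly: write the export out, add it to the archive, and remove the intermediate file once the archive is closed. The sketch below uses raw temp files and a hypothetical $tableExports array purely for illustration; the real implementation routes the intermediate storage through the Document class so the data can live in CouchDB or on the encrypted filesystem:

```php
<?php
// Rough shape of the per-table flow (plain PHP for illustration only).
$zip = new ZipArchive();
$zip->open('/tmp/ehi-export.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE);

$tempFiles = [];
foreach ($tableExports as $tableName => $csvContents) {
    $tmp = tempnam(sys_get_temp_dir(), 'ehi_');
    file_put_contents($tmp, $csvContents);
    $zip->addFile($tmp, $tableName . '.csv');
    $tempFiles[] = $tmp; // can't delete yet -- ZipArchive reads the file on close()
}

$zip->close();

// Clean up the temporary files once the archive is finalized.
foreach ($tempFiles as $tmp) {
    unlink($tmp);
}
```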

adunsulag commented 1 year ago

This was completed in the following PRs:

https://github.com/openemr/openemr/pull/6958

https://github.com/openemr/openemr/pull/6969

https://github.com/openemr/openemr/pull/6977

https://github.com/openemr/openemr/pull/6985

https://github.com/openemr/openemr/pull/6987