Open mingward opened 3 days ago
Interesting. Do you have any suggestions on what a good fix would be for this issue?
I suppose the simplest solution would be to forbid _has searches when the authorization interceptor is in use at all. A slightly more refined option might be for it to ban them when any kind of compartment-based rule is in place.
Thinking this through a bit more... I guess what we should really be doing is blocking _has if there isn't a rule which allows "global" access to all resources of the source resource type for the _has expression.
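To make that rule concrete, here is a rough sketch in Python (not HAPI FHIR code). The function names and the idea of passing in a set of "globally readable" resource types are assumptions for illustration only; a real check would have to be derived from the configured authorization rules.

```python
from urllib.parse import parse_qsl


def has_source_types(query):
    """Collect the resource types named in _has expressions of a query string.

    A reverse chain looks like _has:<Type>:<reference-param>:<search-param>,
    and chains may be nested, so every "_has" segment is inspected.
    """
    types = set()
    for name, _value in parse_qsl(query, keep_blank_values=True):
        parts = name.split(":")
        for i, part in enumerate(parts):
            if part == "_has" and i + 1 < len(parts):
                types.add(parts[i + 1])
    return types


def is_search_allowed(query, globally_readable_types):
    """Allow the search only if every _has source type is globally readable.

    globally_readable_types is a hypothetical input: the resource types for
    which the caller has a rule granting access to all resources of that type.
    """
    return has_source_types(query) <= set(globally_readable_types)


query = "_has:Observation:patient:code-value-quantity=phv00493247.v1.p1$gt3"
print(is_search_allowed(query, {"Patient"}))                 # caller cannot read all Observations
print(is_search_allowed(query, {"Patient", "Observation"}))  # caller can read all Observations
```

With this rule, the query from the report would be rejected for a caller who only has global access to Patient.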
Thanks, but reverse chaining is a great feature. Our tech lead has suggested a couple of fixes.
One is to automatically add search parameter variants that check the security label for every search parameter (e.g., if there is a search parameter code, then there is automatically a security-code parameter, and code-value-quantity implies security-code-value-quantity). A SearchNarrowingInterceptor would then automatically replace normal search parameters with their security- variants, using a specified list of security codes. Where the user already supplied a security- variant, it would be replaced with one whose security target value is the intersection of that security code list and the codes the user specified.
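A rough Python sketch of that narrowing step, to show the intended behaviour rather than an implementation. The SearchParam type, the labels field, and the narrow function are all hypothetical; how a security- variant would actually encode its labels in a URL is not specified here, and this is not the API of the existing HAPI SearchNarrowingInterceptor.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

PREFIX = "security-"


@dataclass(frozen=True)
class SearchParam:
    """Hypothetical parsed search parameter.

    labels is only meaningful on security- variants: the security codes
    that a matching resource must carry.
    """
    name: str
    value: str
    labels: Optional[FrozenSet[str]] = None


def narrow(params, allowed_labels):
    """Rewrite every parameter into its security- variant.

    Plain parameters get the full allowed list; parameters that are already
    security- variants keep only the labels that are also allowed.
    """
    allowed = frozenset(allowed_labels)
    narrowed = []
    for p in params:
        if p.name.startswith(PREFIX):
            requested = p.labels or frozenset()
            narrowed.append(SearchParam(p.name, p.value, requested & allowed))
        else:
            narrowed.append(SearchParam(PREFIX + p.name, p.value, allowed))
    return narrowed


user_search = [
    SearchParam("code-value-quantity", "phv00493247.v1.p1$gt3"),
    SearchParam("security-code", "x", frozenset({"StudyA", "StudyB"})),
]
for p in narrow(user_search, {"StudyA"}):
    print(p.name, p.value, sorted(p.labels))
```

A security- parameter whose intersection comes out empty should match nothing, so a user limited to StudyA cannot probe StudyB values.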
The other fix, which is more flexible but more challenging for the user to debug, is to have a SearchRewritingInterceptor that takes a parsed search and returns a different parsed search. The returned search contains the original search so that it can be displayed to the user (thus, the rewriting remains invisible), but the query actually executed will be the rewritten one. (This might even be a QueryRewritingInterceptor that allows rewriting other FHIR queries like PUT and DELETE.) A user could then use this to write the above SearchNarrowingInterceptor, or something less general that meets their specific security needs.
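In Python terms, the shape of that idea might look like the sketch below. SearchRewritingInterceptor does not exist in HAPI FHIR; RewrittenSearch, rewrite_search, and the example rewriter are hypothetical names, and the search is simplified to a flat dict of parameter name to value.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Search = Dict[str, str]  # simplified stand-in for a parsed search


@dataclass(frozen=True)
class RewrittenSearch:
    original: Search  # what the user sent; this is what gets echoed back
    executed: Search  # what the server actually runs


def rewrite_search(search: Search, rewriter: Callable[[Search], Search]) -> RewrittenSearch:
    """Apply a user-supplied rewriter while keeping the original for display."""
    return RewrittenSearch(original=dict(search), executed=rewriter(dict(search)))


def restrict_to_study_a(search: Search) -> Search:
    """Example rewriter: add a security label restriction (assumed label name)."""
    restricted = dict(search)
    restricted["_security"] = "StudyA"
    return restricted


result = rewrite_search({"code": "phv00493247.v1.p1"}, restrict_to_study_a)
print(result.original)
print(result.executed)
```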
Of the two fixes mentioned above, we favor the security search narrowing interceptor. It would solve most of the problems faced by most users. It would also share code that can be polished along with the rest of the server and improve over time.
Using SearchNarrowingInterceptor would certainly be an avenue. I haven't thought it through completely, but at first blush I suspect this would mitigate the attack you're describing. I'm not sure I follow the QueryRewritingInterceptor part. You should be able to achieve what you're describing using the stock SearchNarrowingInterceptor today, assuming your security rules restrict the user to a single patient compartment.
Describe the bug
The HAPI FHIR server AuthorizationInterceptor only limits the query results, not the resources that can be searched. Consequently, a user can craft queries to deduce the values in resources they should not have access to. This exposes protected data to leaks.
To Reproduce
Here is an example demonstrating the "data leak". Assume that on a FHIR server, we have two resources:
a) Resource "Patient" has the patient ID twice de-identified. For demonstration purposes, we make this resource public so you can try it. The example URL is: https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-small/x1/Patient?&_count=1
b) Resource "Observation" requires authorized access. So, https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-small/x1/Observation shows a "denied access" message.
c) Next, we will show that even without authorization to the "Observation" resource, you can still use a set of queries to find out a patient's values in the "Observation" resource for a measurement called "phv00493247.v1.p1".
Step 1: Run the following query: https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-small/x1/Patient?_has:Observation:patient?_has:Observation:patient:code-value-quantity=phv00493247.v1.p1$gt3
You will get the 4 patients listed below. A binary search over the threshold, using the series of queries below, eventually reveals a patient's measured value.
| Query | Patients | What can be deduced |
| phv00493247.v1.p1$gt3 | 4 patients: ["3778609", "3778569", "3778606", "3778571"] | |
| phv00493247.v1.p1$gt4 | 0 | All 4 patients' values are in (3, 4] |
| phv00493247.v1.p1$gt3.5 | ["3778571"] | Patient 3778571 has a value in (3.5, 4]. The other 3 patients have values in (3, 3.5]. Continue with binary search. |
| phv00493247.v1.p1$gt3.75 | 0 | Patient 3778571 value in (3.5, 3.75] |
| phv00493247.v1.p1$gt3.625 | 0 | Patient 3778571 value in (3.5, 3.625] |
| phv00493247.v1.p1$gt3.5625 | ["3778571"] | Patient 3778571 value in (3.5625, 3.625] |
| phv00493247.v1.p1$gt3.59375 | 0 | Patient 3778571 value in (3.5625, 3.59375] |
| phv00493247.v1.p1$gt3.578125 | ["3778571"] | Patient 3778571 value in (3.578125, 3.59375] |
| phv00493247.v1.p1$gt3.5859375 | 0 | Patient 3778571 value in (3.578125, 3.5859375] |
| phv00493247.v1.p1$eq3.58 | ["3778571"] | Patient 3778571 value is 3.58 |
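The table is a manual bisection. As an illustration of how easily it can be automated, here is a small Python sketch. It does not call the server: matches_gt is a hypothetical stand-in for "does the patient appear in the result of the $gt query with this threshold", simulated here with a local secret value.

```python
def deduce_value(matches_gt, low, high, tolerance=0.01):
    """Narrow down a hidden value using only yes/no answers.

    Precondition: matches_gt(low) is True and matches_gt(high) is False,
    so the value lies in (low, high]. Each query halves that interval.
    """
    queries = 0
    while high - low > tolerance:
        mid = (low + high) / 2
        queries += 1
        if matches_gt(mid):
            low = mid   # value > mid
        else:
            high = mid  # value <= mid
    return low, high, queries


# Simulated oracle: in the real attack this would be one search request
# per threshold, checking whether the patient ID is in the returned Bundle.
secret = 3.58
low, high, queries = deduce_value(lambda threshold: secret > threshold, 3.0, 4.0)
print(low, high, queries)  # 3.578125 3.5859375 7
```

Starting from the (3, 4] interval, the loop issues the same seven thresholds as the table (3.5 through 3.5859375), after which a single eq query confirms the value.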
Expected behavior
For users with no authorization to Observation, the endpoint should give an error message.
Environment (please complete the following information):
Additional context
In the demonstration above, we assumed Patient is public. But the same data leak exists if a user is authorized for a set of patients and a set of Observation resources belonging to StudyA, but not for Observation entries belonging to StudyB. With the current AuthorizationInterceptor, that user can still deduce the values belonging to StudyB.
To use FHIR for research data, including "controlled access data", it is of the utmost importance to prevent such data leaks. Thank you in advance for your attention to this.