microsoft / fhir-server

A service that implements the FHIR standard
MIT License
1.2k stars 515 forks source link

Concurrent conditional creates results in duplicate data #1382

Open CaitlinV39 opened 4 years ago

CaitlinV39 commented 4 years ago

Describe the bug When doing two conditional creates simultaneously (or near simultaneously) for the same resource, both are created, causing duplicate data to be stored in the system

FHIR Version? Stu3/R4

Data provider? CosmosDB/SQL Server

To Reproduce Steps to reproduce the behavior: Following are the steps the customer is following 1 Delete all existing patient data from their instance.

  1. Make concurrent ‘Create Patient’ requests with ‘If-None-Exist:’ with a patient Identifier as a header parameter.

Expected behavior Receive the 412 response for the 2nd create patient request that hits the Azure API for FHIR instance.

Actual behavior 2nd patient is created as a duplicate

CaitlinV39 commented 4 years ago

AB#76902 and AB#76903 to address in SQL and Cosmos

valeneiko commented 3 years ago

I have just ran into the same issue. I can see there was a PR to fix it: #1386. @brendankowitz, what was the reason it has been closed and not merged?

CraigP68 commented 2 years ago

AB#87262

moloneyd-draeger commented 1 year ago

We have noticed the same issue with conditional creates in our environment. A quick python script to recreate the issue is attached - please note this is "flaky" as the server sometimes does only create one resource!

Is there any way of configuring the system or sending a request so that the server will apply a "stronger" uniqueness condition ?

We have also noticed that the issue occurs much more frequently if the server is under more load, and especially when the CosmosDB is reaching the throttled/configured RU limit. Does anyone have any recommended mitigations for this issue?

FHIR Version? R4

Data provider? CosmosDB

Recreate script

# requirements
# pip install requests fhir.resources==6.1.0

# Recreate conditional create issue with Azure API for FHIR.
# tested with
#   python 3.9
#   Azure API for FHIR: fhir-R4, CosmosDB (1000 RUs)

import time
import uuid

from datetime import datetime, timezone
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode

import requests

from fhir.resources.bundle import Bundle
from fhir.resources.identifier import Identifier
from fhir.resources.patient import Patient

# configure environment and auth
FHIR_SERVER_ADDRESS = ''
AZURE_TENANT_ID = ''
OAUTH_CLIENT_ID = ''
OAUTH_CLIENT_SECRET = ''

# configure request flooding
NUM_SIMULTANEOUS_CREATE_ATTEMPTS = 5
NUM_TOTAL_CREATE_ATTEMPTS = 5

# simple OAuth2 client-credentials flow to get a bearer token for the test lifetime
# replace this with your auth of choice
oauth_response = requests.post(
    url=f'https://login.microsoftonline.com/{AZURE_TENANT_ID}/oauth2/v2.0/token',
    headers={
        'Accept': 'application/json',
        'Content-Type': 'application/x-www-form-urlencoded'
    },
    data={
        'client_id': OAUTH_CLIENT_ID,
        'client_secret': OAUTH_CLIENT_SECRET,
        'grant_type': 'client_credentials',
        'scope':  f'{FHIR_SERVER_ADDRESS}/.default'
    }
)
oauth_response.raise_for_status()
authorization_header = f"Bearer {oauth_response.json()['access_token']}"

resource_to_spam = Patient(
    identifier=[
        Identifier(system="conditional_create_test",
                   value=str(uuid.uuid4()))
    ],
    # some other random properties, not important
    active=True,
    deceasedBoolean=False,
)

def attempt_resource_create(resource):
    request_started_at = datetime.now(timezone.utc)
    response = requests.post(
        url=f"{FHIR_SERVER_ADDRESS}/{resource.resource_type}",
        data=resource.json(),
        headers={
            'Authorization': authorization_header,
            'If-None-Exist': urlencode({'identifier': f'{resource.identifier[0].system}|{resource.identifier[0].value}'}),
            'Content-Type': 'application/fhir+json',
        }
    )
    request_finished_at = datetime.now(timezone.utc)
    return response, request_started_at, request_finished_at

def warm_up_threadpool(threadpool, num_worker_threads):
    # use a sleep task to force the threadpool implementation to pre-spawn all the worker threads
    # this removes the thread creation overhead when checking for race conditions over HTTP
    def _dummy_task():
        time.sleep(1)

    for task in [threadpool.submit(_dummy_task) for _ in range(2 * num_worker_threads)]:
        _ = task.result()

with ThreadPoolExecutor(max_workers=NUM_SIMULTANEOUS_CREATE_ATTEMPTS) as threadpool:
    warm_up_threadpool(threadpool, NUM_SIMULTANEOUS_CREATE_ATTEMPTS)
    attempt_create_tasks = [
        threadpool.submit(attempt_resource_create, resource_to_spam)
        for _ in range(NUM_TOTAL_CREATE_ATTEMPTS)
    ]

# give the server a little break to sync before checking result
# this should not be required, but given this script is recreating a race condition ... let's be safe
time.sleep(2)

search_response = requests.get(
    url=f"{FHIR_SERVER_ADDRESS}/{resource_to_spam.resource_type}",
    params={'identifier': f'{resource_to_spam.identifier[0].system}|{resource_to_spam.identifier[0].value}'},
    headers={
        'Authorization': authorization_header,
        'Accept': 'application/json',
    }
)
search_response_bundle = Bundle.parse_raw(search_response.text)
print(f"Attempted create {NUM_TOTAL_CREATE_ATTEMPTS} times, created {len(search_response_bundle.entry or [])} resources.")
print(f"Resource ids:")
for entry in search_response_bundle.entry or []:
    print(f"{entry.resource.resource_type}/{entry.resource.id} created at {entry.resource.meta.lastUpdated}")

print("Request Summary")
for task in attempt_create_tasks:
    response, request_started_at, request_finished_at = task.result()
    print(f"Response {response.status_code} between {request_started_at} and {request_finished_at}")

Example output:

Attempted create 5 times, created 4 resources.
Resource ids:
Patient/b3dad015-cffc-449e-aaed-3359e9f389ee created at 2023-06-23 06:53:23.859000+00:00
Patient/2b174187-b446-44db-8a8c-01181214b2cb created at 2023-06-23 06:53:23.860000+00:00
Patient/410a5cc7-d16b-4ee9-ba59-7f20399df623 created at 2023-06-23 06:53:23.861000+00:00
Patient/b3e7ea02-b536-434b-a6c2-f172e33253c5 created at 2023-06-23 06:53:23.861000+00:00
Request Summary
Response 201 between 2023-06-23 06:53:30.451094+00:00 and 2023-06-23 06:53:30.621145+00:00
Response 201 between 2023-06-23 06:53:30.451094+00:00 and 2023-06-23 06:53:30.633846+00:00
Response 200 between 2023-06-23 06:53:30.452072+00:00 and 2023-06-23 06:53:30.622115+00:00
Response 201 between 2023-06-23 06:53:30.452072+00:00 and 2023-06-23 06:53:30.626023+00:00
Response 201 between 2023-06-23 06:53:30.452072+00:00 and 2023-06-23 06:53:30.616253+00:00
radupurdea commented 1 month ago

This is still an issue for us, could this be looked at, please?

EXPEkesheth commented 1 month ago

Thanks for raising the issue and we will look into the same. 549359380