hapifhir / hapi-fhir

🔥 HAPI FHIR - Java API for HL7 FHIR Clients and Servers
http://hapifhir.io
Apache License 2.0

Discrepancies in FHIR Bundle Entries Under Heavy Concurrent Load #6308

Open AleandroDs opened 2 months ago

AleandroDs commented 2 months ago

Description

When several identical FHIR search requests are issued in parallel at nearly the same time, the response bundles do not all contain the same number of entries, even though every request should return the same result. The discrepancy appears mainly under heavy parallel load.

To Reproduce
Steps to reproduce the behavior:

  1. Execute multiple identical FHIR queries in parallel.
  2. Ensure the queries are initiated at almost the same time (e.g., using multiple threads or processes).
  3. Check the number of entries returned in the resulting bundles.
  4. Observe that the number of entries is inconsistent across the responses. This does not happen on every run, so the issue is hard to reproduce.
  5. In my case, I requested all Patient resources in pages of 100 records each, using a script that keeps following the next URL until the response no longer provides one. The script ran in multiple threads in parallel, each thread issuing the identical sequence of requests.
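
The procedure in step 5 can be sketched roughly as follows. This is a minimal sketch and not the script that was actually used: the base URL `http://localhost:8080/fhir`, the worker count, and the function names `fetch_bundle`, `count_entries`, and `run_parallel` are assumptions made for illustration. It uses only the Python standard library.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

# Assumption: adjust to the base URL of the server under test.
BASE_URL = "http://localhost:8080/fhir"


def fetch_bundle(url):
    """GET one page of search results and parse it as a FHIR Bundle (JSON)."""
    req = Request(url, headers={"Accept": "application/fhir+json"})
    with urlopen(req, timeout=60) as resp:
        return json.load(resp)


def count_entries(start_url, fetch=fetch_bundle):
    """Follow the 'next' links until none is provided, counting all entries."""
    total = 0
    url = start_url
    while url:
        bundle = fetch(url)
        total += len(bundle.get("entry", []))
        # The next page, if any, is the Bundle link with relation "next".
        url = next(
            (link["url"] for link in bundle.get("link", [])
             if link.get("relation") == "next"),
            None,
        )
    return total


def run_parallel(start_url, workers=7, fetch=fetch_bundle):
    """Run the same paged search in several threads; return one count per thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(count_entries, start_url, fetch)
                   for _ in range(workers)]
        return [f.result() for f in futures]
```

Calling `run_parallel(f"{BASE_URL}/Patient?_count=100")` returns one entry count per thread. If `len(set(counts)) > 1` for the returned list, the run reproduced the inconsistency. The `fetch` parameter exists only so the paging logic can be exercised without a live server.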

Expected behavior
I expect that all identical FHIR queries submitted simultaneously return the same number of records in their response bundles, regardless of parallelization or concurrency.

Screenshots
Below is the result of the following query, executed on the database:

SELECT * FROM public.hfj_search ORDER BY pid;
| pid | created | search_deleted | expiry_or_null | failure_code | failure_message | last_updated_high | last_updated_low | num_blocked | num_found | preferred_page_size | resource_id | resource_type | search_param_map | search_query_string | search_query_string_hash | search_type | search_status | total_count | search_uuid | optlock_version | search_query_string_vc | search_param_map_bin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12642 | 2024-09-24 11:33:11.082 | False | NULL | NULL | NULL | NULL | NULL | 0 | 2772 | 100 | NULL | Patient | NULL | NULL | -1676697053 | 1 | FINISHED | 2772 | 1f9d8280-a643-4652-8b9d-3c45994524f0 | 7 | ?_count=100 | binary data |
| 12643 | 2024-09-24 11:33:11.083 | False | NULL | NULL | NULL | NULL | NULL | 0 | 2358 | 100 | NULL | Patient | NULL | NULL | -1676697053 | 1 | FINISHED | 2358 | 66176b7f-6e1b-4270-bc55-1949db287367 | 7 | ?_count=100 | binary data |
| 12644 | 2024-09-24 11:33:11.084 | False | NULL | NULL | NULL | NULL | NULL | 0 | 2358 | 100 | NULL | Patient | NULL | NULL | -1676697053 | 1 | FINISHED | 2358 | 8c1d5ed5-5d0d-4ced-9027-c2124559b745 | 7 | ?_count=100 | binary data |
| 12645 | 2024-09-24 11:33:11.085 | False | NULL | NULL | NULL | NULL | NULL | 0 | 2403 | 100 | NULL | Patient | NULL | NULL | -1676697053 | 1 | FINISHED | 2403 | aaa55dfb-a9f4-4f63-9ee3-259f072ad6ea | 7 | ?_count=100 | binary data |
| 12646 | 2024-09-24 11:33:11.082 | False | NULL | NULL | NULL | NULL | NULL | 0 | 2772 | 100 | NULL | Patient | NULL | NULL | -1676697053 | 1 | FINISHED | 2772 | 7f769ab5-d115-4e5b-9f22-5649deb20dab | 7 | ?_count=100 | binary data |
| 12647 | 2024-09-24 11:33:11.085 | False | NULL | NULL | NULL | NULL | NULL | 0 | 2772 | 100 | NULL | Patient | NULL | NULL | -1676697053 | 1 | FINISHED | 2772 | b1aa728d-e88d-4353-a0e7-f1abc790fbb1 | 7 | ?_count=100 | binary data |
| 12648 | 2024-09-24 11:33:11.082 | False | NULL | NULL | NULL | NULL | NULL | 0 | 2772 | 100 | NULL | Patient | NULL | NULL | -1676697053 | 1 | FINISHED | 2772 | cd4207f2-b31b-42fe-8b4f-37d4a8a8e482 | 7 | ?_count=100 | binary data |

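The search_query_string_vc column shows that every row belongs to the same query (?_count=100), and all seven searches were created within 3 ms of each other. Even so, num_found is not always 2772, which is the actual number of Patient resources: three of the seven searches finished with 2358 or 2403 instead.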
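
To make the size of the discrepancy explicit, the num_found values from the rows above can be tallied. This is only an illustrative sketch: the `(pid, num_found)` pairs are copied by hand from the query result, and the names `rows`, `EXPECTED`, `tally`, and `shortfall` are made up for the example.

```python
from collections import Counter

# (pid, num_found) pairs copied from the hfj_search rows shown above.
rows = [
    (12642, 2772), (12643, 2358), (12644, 2358), (12645, 2403),
    (12646, 2772), (12647, 2772), (12648, 2772),
]
EXPECTED = 2772  # the actual number of Patient resources, per the report

# How many searches finished with each num_found value.
tally = Counter(num_found for _, num_found in rows)

# For each search that came up short, how many resources are missing.
shortfall = {pid: EXPECTED - n for pid, n in rows if n != EXPECTED}

print(tally)
print(shortfall)
```

Four searches found all 2772 resources, two are missing 414 each, and one is missing 369.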

Environment

HAPI FHIR Version: 7.2.1
Database: Postgres 16.3
OS: Debian

Additional context
This issue might be related to internal concurrency handling within HAPI FHIR under heavy load. It occurs primarily when the requests are executed in parallel.