bashir2 opened this issue 3 months ago
@tadgh if you know how to route this ticket please let us know. Many thanks!
Huh interesting one. I'll read over the other thread and get back to you. Thanks Jing!
Thanks @tadgh for looking into this; I was wondering if there is any update. Please let me know if you need more inputs for reproducing this problem.
Sorry, we have been slammed with the release for the last couple of weeks. Once that settles down I can get some focus on this.
Hey @bashir2 and @jingtang10, is there any chance this can be replicated in a HAPI-FHIR test? If you can submit a PR with a failing test, that would go a long way toward our ability to address this.
@tadgh can you please point me to an example of (or a PR similar to) what you want me to create to reproduce this issue? The reproduction steps require a large number of resources to be uploaded to the HAPI server (as mentioned in my OP of this issue). Can I reproduce such an environment in the HAPI-FHIR test that you are suggesting?
Most certainly. Have a look at StressTestR4Test, which generates ~1k resources.
Alternatively, and maybe better for your use case, is FhirResourceDaoR4SearchOptimizedTest, which seems to be doing roughly what you are. You'll note that that second test creates 200 patients and queries them in parallel via a thread pool. However, that test interacts directly with the DAOs, so it may hide the source of the failure if it's upstream in the client. It wouldn't be much of a lift to use the IGenericClient in a thread pool, though.
Let me know if you need further guidance.
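For reference, a minimal sketch of what such a reproduction with IGenericClient in a thread pool might look like; the base URL, resource type, page size, and thread count are placeholders, and this is not the code from the linked pipeline or tests:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import ca.uhn.fhir.context.FhirContext;
import ca.uhn.fhir.rest.api.SearchTotalModeEnum;
import ca.uhn.fhir.rest.client.api.IGenericClient;
import org.hl7.fhir.r4.model.Bundle;
import org.hl7.fhir.r4.model.Observation;

public class ParallelPageFetch {

  public static void main(String[] args) throws Exception {
    FhirContext ctx = FhirContext.forR4();
    IGenericClient client = ctx.newRestfulGenericClient("http://localhost:8080/fhir");

    // Run the initial search; ask for an accurate total so we know how many pages exist.
    Bundle first = client.search()
        .forResource(Observation.class)
        .count(30)
        .totalMode(SearchTotalModeEnum.ACCURATE)
        .returnBundle(Bundle.class)
        .execute();
    String nextUrl = first.getLink(Bundle.LINK_NEXT).getUrl();
    int total = first.getTotal();

    // Fetch the remaining pages in parallel by varying _getpagesoffset on the paging URL.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<Bundle>> pages = new ArrayList<>();
    for (int offset = 30; offset < total; offset += 30) {
      String pageUrl = nextUrl.replaceAll("_getpagesoffset=\\d+", "_getpagesoffset=" + offset);
      pages.add(pool.submit(() ->
          client.loadPage().byUrl(pageUrl).andReturnBundle(Bundle.class).execute()));
    }
    for (Future<Bundle> page : pages) {
      page.get(); // a slow prefetch on the server can surface here as an HAPI-1163 timeout
    }
    pool.shutdown();
  }
}
```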
Thanks @tadgh for the pointer; it took me a while to get to this and then to set up my environment but I think I have demonstrated the issue in this commit. Looking a little bit into the details, I think I understand the root cause of this now as well.
TL;DR:
I think this is due to SearchPreFetchThresholds, which by default is set to (13, 503, 2003, -1) here. Once we hit around 2000 resources, we have a slow fetch, which is consistent with my initial findings.
Details:
The test I have added sets SearchPreFetchThresholds to (70, 300, -1). It creates 10K Observation resources and then fetches them in pages of size 30. When we are fetching resources 300 to 329 (and, to a lesser extent, around 60) we get a slow request: on my machine it takes over 10 seconds, while the other page fetches are ~10-25 ms (except the one at 60).
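As a minimal sketch, the threshold override can look something like this; JpaStorageSettings is the newer name for DaoConfig, which exposes the same setter in older HAPI versions, and the actual test in the linked commit may wire this up differently:

```java
import java.util.Arrays;
import ca.uhn.fhir.jpa.api.config.JpaStorageSettings;

// Sketch of the configuration the test uses.
static void configurePrefetchThresholds(JpaStorageSettings storageSettings) {
  // (70, 300, -1): prefetch 70 result IDs, then up to 300, then the full result set.
  storageSettings.setSearchPreFetchThresholds(Arrays.asList(70, 300, -1));
}
```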
I have demonstrated this with the WARN messages I have added to the end of the test. Here are some relevant log messages for 3 consecutive page fetches; note the WARN message for _getpagesoffset=300:
2024-09-25 05:36:06.501 [main] INFO c.u.f.j.stresstest.StressTestR4Test [StressTestR4Test.java:259] Loading page 9: http://localhost:41149/fhir/context?_getpages=89d1e925-5774-49e3-8124-9fe456b71fd6&_getpagesoffset=270&_count=30&_bundletype=searchset
2024-09-25 05:36:06.501 [main] INFO c.u.f.r.c.i.LoggingInterceptor [LoggingInterceptor.java:82] Client request: GET http://localhost:41149/fhir/context?_getpages=89d1e925-5774-49e3-8124-9fe456b71fd6&_getpagesoffset=270&_count=30&_bundletype=searchset HTTP/1.1
2024-09-25 05:36:06.522 [main] INFO c.u.f.r.c.i.LoggingInterceptor [LoggingInterceptor.java:127] Client response: HTTP 200 OK (Bundle/89d1e925-5774-49e3-8124-9fe456b71fd6) in 21ms
2024-09-25 05:36:06.524 [main] INFO c.u.f.j.stresstest.StressTestR4Test [StressTestR4Test.java:259] Loading page 10: http://localhost:41149/fhir/context?_getpages=89d1e925-5774-49e3-8124-9fe456b71fd6&_getpagesoffset=300&_count=30&_bundletype=searchset
2024-09-25 05:36:06.525 [main] INFO c.u.f.r.c.i.LoggingInterceptor [LoggingInterceptor.java:82] Client request: GET http://localhost:41149/fhir/context?_getpages=89d1e925-5774-49e3-8124-9fe456b71fd6&_getpagesoffset=300&_count=30&_bundletype=searchset HTTP/1.1
2024-09-25 05:36:17.765 [main] INFO c.u.f.r.c.i.LoggingInterceptor [LoggingInterceptor.java:127] Client response: HTTP 200 OK (Bundle/89d1e925-5774-49e3-8124-9fe456b71fd6) in 00:00:11.239
2024-09-25 05:36:17.767 [main] WARN c.u.f.j.stresstest.StressTestR4Test [StressTestR4Test.java:263] Loading page 10 at index 300 took too long: 00:00:11.243
2024-09-25 05:36:17.768 [main] INFO c.u.f.j.stresstest.StressTestR4Test [StressTestR4Test.java:259] Loading page 11: http://localhost:41149/fhir/context?_getpages=89d1e925-5774-49e3-8124-9fe456b71fd6&_getpagesoffset=330&_count=30&_bundletype=searchset
2024-09-25 05:36:17.768 [main] INFO c.u.f.r.c.i.LoggingInterceptor [LoggingInterceptor.java:82] Client request: GET http://localhost:41149/fhir/context?_getpages=89d1e925-5774-49e3-8124-9fe456b71fd6&_getpagesoffset=330&_count=30&_bundletype=searchset HTTP/1.1
2024-09-25 05:36:17.788 [main] INFO c.u.f.r.c.i.LoggingInterceptor [LoggingInterceptor.java:127] Client response: HTTP 200 OK (Bundle/89d1e925-5774-49e3-8124-9fe456b71fd6) in 20ms
So I am not sure if there is actually anything to be fixed here, is there? I was looking at the hapi-fhir-jpaserver-starter code to see if there is a config parameter for changing the default SearchPreFetchThresholds, so that at least we can document this behavior on our side; but from here it seems to me that this parameter is not exposed among the configs. Is my understanding correct?
Yes, adding that config to the jpaserver-starter helps; I am also trying to find other ways we can account for this (maybe from our pipeline's end).
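For illustration only, one hypothetical way to expose this in a Spring Boot based starter is a small configuration class that reads a made-up property and applies it to the storage settings; the property name, class name, and wiring below are assumptions, not the starter's actual configuration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import ca.uhn.fhir.jpa.api.config.JpaStorageSettings;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SearchPrefetchConfig {

  // Hypothetical property; defaults to HAPI's built-in thresholds.
  @Value("${hapi.fhir.search_prefetch_thresholds:13,503,2003,-1}")
  private String searchPreFetchThresholds;

  @Autowired
  public void applyThresholds(JpaStorageSettings storageSettings) {
    // Parse the comma-separated list and hand it to the JPA storage settings.
    List<Integer> thresholds = Arrays.stream(searchPreFetchThresholds.split(","))
        .map(String::trim)
        .map(Integer::valueOf)
        .collect(Collectors.toList());
    storageSettings.setSearchPreFetchThresholds(thresholds);
  }
}
```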
What does the pre-fetching exactly mean? I remember that when doing a search, HAPI was storing the list of IDs for that particular search in the DB. Does pre-fetching mean creating that full list? Is it possible to make the creation of that list more gradual? I mean something similar to the idea of the SearchPreFetchThresholds list, but instead of having one last very large chunk, make it gradual, i.e., pre-fetch in batches of a configurable size. So for example, the last number of that list, instead of -1, would be that batch size (e.g., 10000).
If this is not easy to do, I think from our side we should initially fetch a page that is beyond the last index (e.g., 3000) to make sure everything is pre-fetched, and then flood the server with parallel page queries (see the sketch below).
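A rough sketch of that client-side workaround; the offset 3000 and the URL rewriting are illustrative, and client and firstPage are assumed to come from the initial search as in the earlier sketch:

```java
import ca.uhn.fhir.rest.client.api.IGenericClient;
import org.hl7.fhir.r4.model.Bundle;

// Touch a page beyond the last finite prefetch threshold so the server materializes
// the full result list before we start hitting it with parallel page requests.
static void warmUpPrefetch(IGenericClient client, Bundle firstPage) {
  String nextUrl = firstPage.getLink(Bundle.LINK_NEXT).getUrl();
  // With the default thresholds (13, 503, 2003, -1), any offset past 2003 should do.
  String warmupUrl = nextUrl.replaceAll("_getpagesoffset=\\d+", "_getpagesoffset=3000");
  client.loadPage().byUrl(warmupUrl).andReturnBundle(Bundle.class).execute();
}
```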
I'd rather defer to @michaelabuckley's thoughts on this, but I can certainly help in exposing the setting.
Could you not just use -1 as the first and only element in the list? That would force a full result list immediately, but may blow the memory on the server, depending on your use case.
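If the thresholds are set programmatically (as in the test sketch above, with storageSettings being the JpaStorageSettings instance), that would look something like this:

```java
// Sketch: a single -1 threshold makes HAPI prefetch the entire result set up front.
storageSettings.setSearchPreFetchThresholds(java.util.Collections.singletonList(-1));
```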
> Could you not just use -1 as the first and only element in the list? That would force a full result list immediately, but may blow the memory on the server, depending on your use case.
The problem is that we do not control the source FHIR server. Our pipeline is supposed to read from any FHIR server. We have been facing this issue when our pipeline uses the FHIR Search API and the FHIR server is HAPI. We can recommend that setting to our partners once your PR is released, but we should also be clear about the other performance implications of it.
BTW, what are the memory implications of pre-fetching everything, i.e., just using -1 as you suggested?
On a large enough dataset? Not good! I didn't write this code and am not intimately familiar with it, but my read on it is that it would prefetch all possible results, which could exceed the memory limitations of the DB or the application server. For your use case, it may be better to fetch a known static amount per fetch, then have your pipeline adjust so that it runs like this:
Just spitballing here on the options, no clue if this would be suitable for your particular use case.
Another option would be to somehow make this prefetch configurable on a per-original-request basis, but that obviously opens up the server to DoS vulnerabilities if it's used haphazardly.
NOTE: Before filing a ticket, please see the following URL: https://github.com/hapifhir/hapi-fhir/wiki/Getting-Help
Describe the bug
TL;DR: With recent versions of the HAPI JPA server (not sure exactly since when) we cannot fetch pages of a search result in parallel if the number of resources to be fetched is too large.
Details: In fhir-data-pipes we have different methods for fetching resources from a FHIR server in parallel. One of these is through the FHIR Search API: one search is done and then different pages are downloaded in parallel, i.e., many queries like this:
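The original example query was not captured here; based on the log lines quoted earlier in this thread, the parallel page requests have this general shape (search ID, offset, and count are illustrative):

```
GET [base]/fhir?_getpages=<search-uuid>&_getpagesoffset=300&_count=30&_bundletype=searchset
```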
This used to work fine with HAPI FHIR JPA servers too, at least until two years ago, as described in this issue. But we have recently discovered that if the number of resources to be fetched is large, then accessing the pages in parallel will fail with HAPI-1163: Request timed out after 60492ms (the time is around 60 seconds and I think comes from this line). Doing some investigation, it seems that after ~2K resources are fetched, one page request all of a sudden takes tens of seconds (other requests are instant). I have investigated this on our side and have reproduced the problem with a simple script (outside our pipeline code), as described in the same issue. If I run the pipeline with a single worker, it eventually succeeds; but with 3 or more workers it usually fails (in my experiments/setup, it failed when the number of resources was more than ~5 million). I am guessing that the very slow page request blocks the other parallel requests (but I am not sure).
Another input: although I have set the number of DB connections of the HAPI JPA server to 40, there is usually only one or maybe two postgres processes doing any work. When we did extensive performance experiments two years ago, we could easily flood 40+ CPU cores with postgres processes.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The parallel fetch can flood all resources on the machine and succeed.
Screenshots
See this issue for when we profiled the HAPI JPA server two years ago with a similar setup.
Environment (please complete the following information):
Additional context
Here is a sample stack trace from the HAPI JPA docker image logs: