bashir2 opened this issue 3 months ago
@tadgh if you know how to route this ticket please let us know. Many thanks!
Huh interesting one. I'll read over the other thread and get back to you. Thanks Jing!
Thanks @tadgh for looking into this; I was wondering if there is any update. Please let me know if you need more inputs for reproducing this problem.
Sorry, we have been slammed with the release for the last couple of weeks. Once that settles down I can get some focus on this.
Hey @bashir2 and @jingtang10, is there any chance this can be replicated in a HAPI-FHIR test? If you can submit a PR with a failing test, that would go a long way toward our ability to address this.
@tadgh can you please point me to an example of (or a PR similar to) what you want me to create to reproduce this issue? The reproduction steps require a large number of resources to be uploaded to the HAPI server (as mentioned in my OP of this issue). Can I reproduce such an environment in the HAPI-FHIR test that you are suggesting?
Most certainly. Have a look at StressTestR4Test, which generates ~1k resources.
Alternatively, and maybe better for your use case, is FhirResourceDaoR4SearchOptimizedTest, which seems to be doing roughly what you are. You'll note that that second test creates 200 patients and queries them in parallel via a thread pool. However, that test interacts directly with the DAOs, so it may hide the source of the failure if it's upstream in the client. It wouldn't be much of a lift to use the IGenericClient in a thread pool, though.
Let me know if you need further guidance.
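For reference, a minimal sketch of what such a reproduction with IGenericClient in a thread pool might look like; the base URL, resource type, page size, and thread count are placeholders, and this is not the code from the linked pipeline or tests:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import ca.uhn.fhir.context.FhirContext;
import ca.uhn.fhir.rest.api.SearchTotalModeEnum;
import ca.uhn.fhir.rest.client.api.IGenericClient;
import org.hl7.fhir.r4.model.Bundle;
import org.hl7.fhir.r4.model.Observation;

public class ParallelPageFetch {

  public static void main(String[] args) throws Exception {
    FhirContext ctx = FhirContext.forR4();
    IGenericClient client = ctx.newRestfulGenericClient("http://localhost:8080/fhir");

    // Run the initial search; ask for an accurate total so we know how many pages exist.
    Bundle first = client.search()
        .forResource(Observation.class)
        .count(30)
        .totalMode(SearchTotalModeEnum.ACCURATE)
        .returnBundle(Bundle.class)
        .execute();
    String nextUrl = first.getLink(Bundle.LINK_NEXT).getUrl();
    int total = first.getTotal();

    // Fetch the remaining pages in parallel by varying _getpagesoffset on the paging URL.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<Bundle>> pages = new ArrayList<>();
    for (int offset = 30; offset < total; offset += 30) {
      String pageUrl = nextUrl.replaceAll("_getpagesoffset=\\d+", "_getpagesoffset=" + offset);
      pages.add(pool.submit(() ->
          client.loadPage().byUrl(pageUrl).andReturnBundle(Bundle.class).execute()));
    }
    for (Future<Bundle> page : pages) {
      page.get(); // a slow prefetch on the server can surface here as an HAPI-1163 timeout
    }
    pool.shutdown();
  }
}
```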
Thanks @tadgh for the pointer; it took me a while to get to this and then to set up my environment but I think I have demonstrated the issue in this commit. Looking a little bit into the details, I think I understand the root cause of this now as well.
TL;DR:
I think this is due to SearchPreFetchThresholds, which by default is set to (13, 503, 2003, -1) here. Once we hit around 2000 resources, we have a slow fetch, which is consistent with my initial findings.
Details:
The test I have added sets SearchPreFetchThresholds to (70, 300, -1). It creates 10K Observation resources and then fetches them in pages of size 30. When we are fetching resources 300 to 329 (and, to a lesser extent, around 60) we get a slow request: on my machine it takes over 10 seconds, while the other page fetches are ~10-25 ms (except the one at 60).
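As a minimal sketch, the threshold override can look something like this; JpaStorageSettings is the newer name for DaoConfig, which exposes the same setter in older HAPI versions, and the actual test in the linked commit may wire this up differently:

```java
import java.util.Arrays;
import ca.uhn.fhir.jpa.api.config.JpaStorageSettings;

// Sketch of the configuration the test uses.
static void configurePrefetchThresholds(JpaStorageSettings storageSettings) {
  // (70, 300, -1): prefetch 70 result IDs, then up to 300, then the full result set.
  storageSettings.setSearchPreFetchThresholds(Arrays.asList(70, 300, -1));
}
```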
I have demonstrated this with the WARN messages I have added to the end of the test. Here are some relevant log messages for 3 consecutive page fetches; note the WARN message for _getpagesoffset=300:
2024-09-25 05:36:06.501 [main] INFO c.u.f.j.stresstest.StressTestR4Test [StressTestR4Test.java:259] Loading page 9: http://localhost:41149/fhir/context?_getpages=89d1e925-5774-49e3-8124-9fe456b71fd6&_getpagesoffset=270&_count=30&_bundletype=searchset
2024-09-25 05:36:06.501 [main] INFO c.u.f.r.c.i.LoggingInterceptor [LoggingInterceptor.java:82] Client request: GET http://localhost:41149/fhir/context?_getpages=89d1e925-5774-49e3-8124-9fe456b71fd6&_getpagesoffset=270&_count=30&_bundletype=searchset HTTP/1.1
2024-09-25 05:36:06.522 [main] INFO c.u.f.r.c.i.LoggingInterceptor [LoggingInterceptor.java:127] Client response: HTTP 200 OK (Bundle/89d1e925-5774-49e3-8124-9fe456b71fd6) in 21ms
2024-09-25 05:36:06.524 [main] INFO c.u.f.j.stresstest.StressTestR4Test [StressTestR4Test.java:259] Loading page 10: http://localhost:41149/fhir/context?_getpages=89d1e925-5774-49e3-8124-9fe456b71fd6&_getpagesoffset=300&_count=30&_bundletype=searchset
2024-09-25 05:36:06.525 [main] INFO c.u.f.r.c.i.LoggingInterceptor [LoggingInterceptor.java:82] Client request: GET http://localhost:41149/fhir/context?_getpages=89d1e925-5774-49e3-8124-9fe456b71fd6&_getpagesoffset=300&_count=30&_bundletype=searchset HTTP/1.1
2024-09-25 05:36:17.765 [main] INFO c.u.f.r.c.i.LoggingInterceptor [LoggingInterceptor.java:127] Client response: HTTP 200 OK (Bundle/89d1e925-5774-49e3-8124-9fe456b71fd6) in 00:00:11.239
2024-09-25 05:36:17.767 [main] WARN c.u.f.j.stresstest.StressTestR4Test [StressTestR4Test.java:263] Loading page 10 at index 300 took too long: 00:00:11.243
2024-09-25 05:36:17.768 [main] INFO c.u.f.j.stresstest.StressTestR4Test [StressTestR4Test.java:259] Loading page 11: http://localhost:41149/fhir/context?_getpages=89d1e925-5774-49e3-8124-9fe456b71fd6&_getpagesoffset=330&_count=30&_bundletype=searchset
2024-09-25 05:36:17.768 [main] INFO c.u.f.r.c.i.LoggingInterceptor [LoggingInterceptor.java:82] Client request: GET http://localhost:41149/fhir/context?_getpages=89d1e925-5774-49e3-8124-9fe456b71fd6&_getpagesoffset=330&_count=30&_bundletype=searchset HTTP/1.1
2024-09-25 05:36:17.788 [main] INFO c.u.f.r.c.i.LoggingInterceptor [LoggingInterceptor.java:127] Client response: HTTP 200 OK (Bundle/89d1e925-5774-49e3-8124-9fe456b71fd6) in 20ms
So I am not sure if there is actually anything to be fixed here, is there? I was looking at the hapi-fhir-jpaserver-starter code to see if there is a config parameter for changing the default SearchPreFetchThresholds, so that at least we can document this behavior on our side; but from here it seems to me that this parameter is not exposed among the configs. Is my understanding correct?
Yes, adding that config to the jpaserver-starter helps; I am also trying to find other ways we can account for this (maybe from our pipeline's end).
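For illustration only, one hypothetical way to expose this in a Spring Boot based starter is a small configuration class that reads a made-up property and applies it to the storage settings; the property name, class name, and wiring below are assumptions, not the starter's actual configuration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import ca.uhn.fhir.jpa.api.config.JpaStorageSettings;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SearchPrefetchConfig {

  // Hypothetical property; defaults to HAPI's built-in thresholds.
  @Value("${hapi.fhir.search_prefetch_thresholds:13,503,2003,-1}")
  private String searchPreFetchThresholds;

  @Autowired
  public void applyThresholds(JpaStorageSettings storageSettings) {
    // Parse the comma-separated list and hand it to the JPA storage settings.
    List<Integer> thresholds = Arrays.stream(searchPreFetchThresholds.split(","))
        .map(String::trim)
        .map(Integer::valueOf)
        .collect(Collectors.toList());
    storageSettings.setSearchPreFetchThresholds(thresholds);
  }
}
```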
What does the pre-fetching exactly mean? I remember that when doing a search, HAPI was storing the list of IDs for that particular search in the DB. Does pre-fetching mean creating that full list? Is it possible to make the creation of that list more gradual? I mean something similar to the idea of the SearchPreFetchThresholds list, but instead of having one last very large chunk, make it gradual, i.e., pre-fetch in batches of a configurable size. So for example, the last number of that list, instead of -1, would be that batch size (e.g., 10000).
If this is not easy to do, I think from our side we should initially fetch a page that is beyond the last index (e.g., 3000) to make sure everything is pre-fetched, and then flood the server with parallel page queries (see the sketch below).
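A rough sketch of that client-side workaround; the offset 3000 and the URL rewriting are illustrative, and client and firstPage are assumed to come from the initial search as in the earlier sketch:

```java
import ca.uhn.fhir.rest.client.api.IGenericClient;
import org.hl7.fhir.r4.model.Bundle;

// Touch a page beyond the last finite prefetch threshold so the server materializes
// the full result list before we start hitting it with parallel page requests.
static void warmUpPrefetch(IGenericClient client, Bundle firstPage) {
  String nextUrl = firstPage.getLink(Bundle.LINK_NEXT).getUrl();
  // With the default thresholds (13, 503, 2003, -1), any offset past 2003 should do.
  String warmupUrl = nextUrl.replaceAll("_getpagesoffset=\\d+", "_getpagesoffset=3000");
  client.loadPage().byUrl(warmupUrl).andReturnBundle(Bundle.class).execute();
}
```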
I'd rather defer to @michaelabuckley's thoughts on this, but I can certainly help in exposing the setting.
Could you not just use -1 as the first and only element in the list? That would force a full result list immediately, but may blow the memory on the server, depending on your use case.
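If the thresholds are set programmatically (as in the test sketch above, with storageSettings being the JpaStorageSettings instance), that would look something like this:

```java
// Sketch: a single -1 threshold makes HAPI prefetch the entire result set up front.
storageSettings.setSearchPreFetchThresholds(java.util.Collections.singletonList(-1));
```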
> Could you not just use -1 as the first and only element in the list? That would force a full result list immediately, but may blow the memory on the server, depending on your use case.
The problem is that we do not control the source FHIR server. Our pipeline is supposed to read from any FHIR server. We have been facing this issue when our pipeline uses the FHIR Search API and the FHIR server is HAPI. We can recommend that setting to our partners once your PR is released, but we should also be clear about the other performance implications of it.
BTW, what are the memory implications of pre-fetching everything, i.e., just using -1 as you suggested?
On a large enough dataset? Not good! I didn't write this code and am not intimately familiar with it, but my read on it is that it would prefetch all possible results, which could exceed the memory limitations of the DB or the application server. For your use case, it may be better to fetch a known static amount per fetch, then have your pipeline adjust so that it runs like this:
Just spitballing here on the options, no clue if this would be suitable for your particular use case.
Another option would be to somehow make this prefetch configurable on a per-original-request basis, but that obviously opens up the server to DoS vulnerabilities if it's used haphazardly.
NOTE: Before filing a ticket, please see the following URL: https://github.com/hapifhir/hapi-fhir/wiki/Getting-Help
Describe the bug
TL;DR: With recent versions of the HAPI JPA server (not sure exactly since when) we cannot fetch pages of a search result in parallel if the number of resources to be fetched is too large.
Details: In fhir-data-pipes we have different methods for fetching resources from a FHIR server in parallel. One of these is through the FHIR Search API: one search is done and then different pages are downloaded in parallel, i.e., many queries like this:
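The original example query was not captured here; based on the log lines quoted earlier in this thread, the parallel page requests have this general shape (search ID, offset, and count are illustrative):

```
GET [base]/fhir?_getpages=<search-uuid>&_getpagesoffset=300&_count=30&_bundletype=searchset
```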
This used to work fine with HAPI FHIR JPA servers too, at least until two years ago, as described in this issue. But we have recently discovered that if the number of resources to be fetched is large, then accessing the pages in parallel will fail with HAPI-1163: Request timed out after 60492ms (the time is around 60 seconds and I think comes from this line). Doing some investigation, it seems that after ~2K resources are fetched, one page request all of a sudden takes tens of seconds (other requests are instant). I have investigated this on our side and have reproduced the problem with a simple script (outside our pipeline code), as described in the same issue. If I run the pipeline with a single worker, it eventually succeeds; but with 3 or more workers it usually fails (in my experiments/setup, it failed when the number of resources was more than ~5 million). I am guessing that the very slow page request blocks the other parallel requests (but I am not sure).
Another input: although I have set the number of DB connections of the HAPI JPA server to 40, there is usually only one or maybe two postgres processes doing any work. When we did extensive performance experiments two years ago, we could easily flood 40+ CPU cores with postgres processes.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The parallel fetch can flood all resources on the machine and succeed.
Screenshots
See this issue for when we profiled the HAPI JPA server two years ago with a similar setup.
Environment (please complete the following information):
Additional context
Here is a sample stack trace from the HAPI JPA docker image logs: