microsoft / fhir-server

A service that implements the FHIR standard
MIT License
1.14k stars 485 forks source link

All queries failing suddenly #3845

Open Jaco-Minnaar opened 3 weeks ago

Jaco-Minnaar commented 3 weeks ago

Describe the bug We are running an instance of the FHIR Server from the Docker file and running into some issues. Since at least 12 hours ago, all queries are failing with the following exception and trace:

2024-05-08T05:41:26.650255158Z fail: Microsoft.Health.Fhir.CosmosDb.Features.Storage.CosmosFhirDataStore[0]
2024-05-08T05:41:26.650270158Z     CosmosFhirDataStore. 'Search Parameters' failed. Elapsed: 00:00:00.0006074. Memory: 186966472.
2024-05-08T05:41:26.650273858Z     Microsoft.Health.Fhir.Core.Exceptions.InitializationException: Failed to initialize search parameters
2024-05-08T05:41:26.650277358Z         at Microsoft.Health.Fhir.Core.Features.Definition.SearchParameterDefinitionManager.EnsureInitialized() in /repo/src/Microsoft.Health.Fhir.Core/Features/Definition/SearchParameterDefinitionManager.cs:line 106
2024-05-08T05:41:26.650280958Z         at Microsoft.Health.Fhir.Core.Features.Definition.SearchParameterDefinitionManager.GetSearchParameters(String resourceType) in /repo/src/Microsoft.Health.Fhir.Core/Features/Definition/SearchParameterDefinitionManager.cs:line 112
2024-05-08T05:41:26.650284658Z         at Microsoft.Health.Fhir.Core.Features.Definition.SearchableSearchParameterDefinitionManager.GetSearchParameters(String resourceType) in /repo/src/Microsoft.Health.Fhir.Core/Features/Definition/SearchableSearchParameterDefinitionManager.cs:line 49
2024-05-08T05:41:26.650310959Z         at Microsoft.Health.Fhir.Core.Features.Conformance.CapabilityStatementBuilder.SyncSearchParamsAsync(String resourceType) in /repo/src/Microsoft.Health.Fhir.Core/Features/Conformance/CapabilityStatementBuilder.cs:line 185
2024-05-08T05:41:26.650314759Z         at Microsoft.Health.Fhir.Core.Features.Conformance.CapabilityStatementBuilder.SyncSearchParametersAsync() in /repo/src/Microsoft.Health.Fhir.Core/Features/Conformance/CapabilityStatementBuilder.cs:line 348
2024-05-08T05:41:26.650318059Z         at Microsoft.Health.Fhir.CosmosDb.Features.Storage.CosmosFhirDataStore.Build(ICapabilityStatementBuilder builder) in /repo/src/Microsoft.Health.Fhir.CosmosDb/Features/Storage/CosmosFhirDataStore.cs:line 806
2024-05-08T05:41:26.650655764Z fail: Microsoft.Health.Fhir.Core.Features.Conformance.SystemConformanceProvider[0]
2024-05-08T05:41:26.650665064Z     SystemConformanceProvider: Failed running 'Microsoft.Health.Fhir.CosmosDb.Features.Storage.CosmosFhirDataStore' when building a new CapabilityStatement.
2024-05-08T05:41:26.650668964Z     Microsoft.Health.Fhir.Core.Exceptions.InitializationException: Failed to initialize search parameters
2024-05-08T05:41:26.650672264Z         at Microsoft.Health.Fhir.Core.Features.Definition.SearchParameterDefinitionManager.EnsureInitialized() in /repo/src/Microsoft.Health.Fhir.Core/Features/Definition/SearchParameterDefinitionManager.cs:line 106
2024-05-08T05:41:26.650675665Z         at Microsoft.Health.Fhir.Core.Features.Definition.SearchParameterDefinitionManager.GetSearchParameters(String resourceType) in /repo/src/Microsoft.Health.Fhir.Core/Features/Definition/SearchParameterDefinitionManager.cs:line 112
2024-05-08T05:41:26.650678965Z         at Microsoft.Health.Fhir.Core.Features.Definition.SearchableSearchParameterDefinitionManager.GetSearchParameters(String resourceType) in /repo/src/Microsoft.Health.Fhir.Core/Features/Definition/SearchableSearchParameterDefinitionManager.cs:line 49
2024-05-08T05:41:26.650682665Z         at Microsoft.Health.Fhir.Core.Features.Conformance.CapabilityStatementBuilder.SyncSearchParamsAsync(String resourceType) in /repo/src/Microsoft.Health.Fhir.Core/Features/Conformance/CapabilityStatementBuilder.cs:line 185
2024-05-08T05:41:26.650686065Z         at Microsoft.Health.Fhir.Core.Features.Conformance.CapabilityStatementBuilder.SyncSearchParametersAsync() in /repo/src/Microsoft.Health.Fhir.Core/Features/Conformance/CapabilityStatementBuilder.cs:line 348
2024-05-08T05:41:26.650689565Z         at Microsoft.Health.Fhir.CosmosDb.Features.Storage.CosmosFhirDataStore.Build(ICapabilityStatementBuilder builder) in /repo/src/Microsoft.Health.Fhir.CosmosDb/Features/Storage/CosmosFhirDataStore.cs:line 806
2024-05-08T05:41:26.650692965Z         at Microsoft.Health.Fhir.Core.Features.Conformance.SystemConformanceProvider.GetCapabilityStatementOnStartup(CancellationToken cancellationToken) in /repo/src/Microsoft.Health.Fhir.Core/Features/Conformance/SystemConformanceProvider.cs:line 146

The response content is the following:

{
    "resourceType": "OperationOutcome",
    "id": "ee3a3476-4e9c-4f28-a960-699f23b1cab6",
    "meta": {
        "lastUpdated": "2024-05-08T05:51:35.2237241+00:00"
    },
    "issue": [
        {
            "severity": "error",
            "code": "unknown",
            "diagnostics": ""
        }
    ]
}

We are using CosmosDB and not sure if the issue lies with the FHIR server instance or the connection with the CosmosDB instance.

We have 3 environments with the same setup and all of them have the same issue.

FHIR Version? R4

Data provider? CosmosDB

To Reproduce Steps to reproduce the behavior:

  1. Start FHIR server (successful)
  2. Do any query.
  3. See query failing.

Expected behavior Successful queries.

Actual behavior No access to any data on FHIR Server

brendankowitz commented 3 weeks ago

If you're using containers, you can try going into the webapp/deployment center in portal and pick an earlier tag: https://github.com/microsoft/fhir-server/releases If it is because of the last release then try rolling back to image tag 4.0.116

Jaco-Minnaar commented 3 weeks ago

If you're using containers, you can try going into the webapp/deployment center in portal and pick an earlier tag: https://github.com/microsoft/fhir-server/releases If it is because of the last release then try rolling back to image tag 4.0.116

We have tried rolling back to the previous version and seem to be having a working system now. Thanks!

Jaco-Minnaar commented 3 weeks ago

What could have caused this issue?

LTA-Thinking commented 3 weeks ago

In the last release we added a check on startup to make sure search parameters were fully loaded before allowing requests to the service. The error you are seeing implies the search parameters were never loaded. Are there any other errors farther up in the logs? I'm curious if any exceptions were logged during startup.

Jaco-Minnaar commented 3 weeks ago

The only other exception I see is this, but it looks like it's the same exception, just further down in the stack trace.

2024-05-09T07:48:56.667551201Z fail: Microsoft.Health.Fhir.Api.Features.Filters.OperationOutcomeExceptionFilterAttribute[0]
2024-05-09T07:48:56.667576402Z       InternalServerError error returned
2024-05-09T07:48:56.667584002Z       Microsoft.Health.Fhir.Core.Exceptions.InitializationException: Failed to initialize search parameters
2024-05-09T07:48:56.667587402Z          at Microsoft.Health.Fhir.Core.Features.Definition.SearchParameterDefinitionManager.EnsureInitialized() in /repo/src/Microsoft.Health.Fhir.Core/Features/Definition/SearchParameterDefinitionManager.cs:line 106
2024-05-09T07:48:56.667591302Z          at Microsoft.Health.Fhir.Core.Features.Definition.SearchParameterDefinitionManager.GetSearchParameters(String resourceType) in /repo/src/Microsoft.Health.Fhir.Core/Features/Definition/SearchParameterDefinitionManager.cs:line 112
2024-05-09T07:48:56.667595102Z          at Microsoft.Health.Fhir.Core.Features.Definition.SearchableSearchParameterDefinitionManager.GetSearchParameters(String resourceType) in /repo/src/Microsoft.Health.Fhir.Core/Features/Definition/SearchableSearchParameterDefinitionManager.cs:line 49
2024-05-09T07:48:56.667599203Z          at Microsoft.Health.Fhir.Core.Features.Conformance.CapabilityStatementBuilder.SyncSearchParamsAsync(String resourceType) in /repo/src/Microsoft.Health.Fhir.Core/Features/Conformance/CapabilityStatementBuilder.cs:line 185
2024-05-09T07:48:56.667602703Z          at Microsoft.Health.Fhir.Core.Features.Conformance.CapabilityStatementBuilder.SyncSearchParametersAsync() in /repo/src/Microsoft.Health.Fhir.Core/Features/Conformance/CapabilityStatementBuilder.cs:line 348
2024-05-09T07:48:56.667606103Z          at Microsoft.Health.Fhir.CosmosDb.Features.Storage.CosmosFhirDataStore.Build(ICapabilityStatementBuilder builder) in /repo/src/Microsoft.Health.Fhir.CosmosDb/Features/Storage/CosmosFhirDataStore.cs:line 806
2024-05-09T07:48:56.667609403Z          at Microsoft.Health.Fhir.Core.Features.Conformance.SystemConformanceProvider.GetCapabilityStatementOnStartup(CancellationToken cancellationToken) in /repo/src/Microsoft.Health.Fhir.Core/Features/Conformance/SystemConformanceProvider.cs:line 146
2024-05-09T07:48:56.667613603Z          at Microsoft.Health.Fhir.Core.Features.Conformance.SystemConformanceProvider.GetMetadata(CancellationToken cancellationToken) in /repo/src/Microsoft.Health.Fhir.Core/Features/Conformance/SystemConformanceProvider.cs:line 349
2024-05-09T07:48:56.667629203Z          at Microsoft.Health.Fhir.Core.Features.Conformance.GetCapabilitiesHandler.Handle(GetCapabilitiesRequest request, CancellationToken cancellationToken) in /repo/src/Microsoft.Health.Fhir.Core/Features/Conformance/GetCapabilitiesHandler.cs:line 29
2024-05-09T07:48:56.667633204Z          at MediatR.Pipeline.RequestExceptionProcessorBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken)
2024-05-09T07:48:56.667636704Z          at MediatR.Pipeline.RequestExceptionProcessorBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken)
2024-05-09T07:48:56.667640104Z          at MediatR.Pipeline.RequestExceptionActionProcessorBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken)
2024-05-09T07:48:56.667643404Z          at MediatR.Pipeline.RequestExceptionActionProcessorBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken)
2024-05-09T07:48:56.667646704Z          at MediatR.Pipeline.RequestPreProcessorBehavior`2.Handle(TRequest request, RequestHandlerDelegate`1 next, CancellationToken cancellationToken)
2024-05-09T07:48:56.667649804Z          at Microsoft.Health.Fhir.Core.Extensions.FhirMediatorExtensions.GetCapabilitiesAsync(IMediator mediator, CancellationToken cancellationToken) in /repo/src/Microsoft.Health.Fhir.Shared.Core/Extensions/FhirMediatorExtensions.cs:line 157
2024-05-09T07:48:56.667653204Z          at Microsoft.Health.Fhir.Api.Controllers.FhirController.Metadata() in /repo/src/Microsoft.Health.Fhir.Shared.Api/Controllers/FhirController.cs:line 636
2024-05-09T07:48:56.667656404Z          at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.TaskOfIActionResultExecutor.Execute(ActionContext actionContext, IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
2024-05-09T07:48:56.667659704Z          at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Awaited|12_0(ControllerActionInvoker invoker, ValueTask`1 actionResultValueTask)
2024-05-09T07:48:56.667663804Z          at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
LTA-Thinking commented 3 weeks ago

Thanks @Jaco-Minnaar The root cause of this is likely a search parameter in a bad state. I've identified a bug that is preventing a more useful exception from being surfaced, but I'm afraid I can't repro the failure you are seeing without knowing the state of the search parameter. I'm working on a change that will let the better error message surface, which may help us find the root of the issue.

LTA-Thinking commented 2 weeks ago

@Jaco-Minnaar This issue should be resolved with #3849 Could you try release 4.0.139 and see if the crash is still present? We have also added in more error logging, so you may see a more helpful error on startup pointing to what the root of the issue is. This log shouldn't impact your ability to use your service.