permitio / opal

Policy and data administration, distribution, and real-time updates on top of Policy Agents (OPA, Cedar, ...)
https://opal.ac
Apache License 2.0

Inline updates are out-of-order #697

Open tyndra opened 2 weeks ago

tyndra commented 2 weeks ago

OPAL Version: 0.7.12

I am observing some inline data updates being lost. Specifically, the last expected data update is not persisted, even though the logs show it being processed. I believe this happens when the client is also running a periodic update (for a different topic) that takes a significant amount of time (the HTTP call takes several seconds) and returns about 5 MB of data.

The logs below show the sequence of events as they were logged, newest first.

data_xyz is a topic configured for periodic updates that also receives inline updates. data_abc is a topic that is updated inline only, and its last value is the one that is not persisted.

Timestamp Event Additional Column
08:15:48.995 Saving fetched data to policy-store: source url='https://my_url for periodic update for data_xyz', destination path='/xyz' Periodic update finished
08:15:48.764 Data provided inline for url: data_abc
08:15:48.764 Fetching policy data
08:15:48.763 Updating policy data, reason: None
08:15:48.763 Triggering data update with id: 3a4462b7-2673-47ef-909e-4b4171e67fc0 The last update for data_abc topic that is being lost
08:15:48.763 Received notification of event: data_xyz
08:15:48.208 Triggering data update with id: 5b93367d-ef2f-48a9-a618-3730309a4293
08:15:48.208 Fetching policy data
08:15:48.208 Data provided inline for url: data_abc
08:15:48.207 Updating policy data, reason: None
08:15:48.207 Received notification of event: data_xyz
08:15:47.994 Data provided inline for url: data_abc
08:15:47.993 Updating policy data, reason: None
08:15:47.993 Triggering data update with id: ba222464-cd14-4b24-8177-79d62909e843
08:15:47.993 Fetching policy data
08:15:47.992 Received notification of event: data_xyz
08:15:47.767 Triggering data update with id: 46538233-c37f-4cd4-a701-36ba05787bac
08:15:47.767 Fetching policy data
08:15:47.767 Data provided inline for url: data_abc
08:15:47.766 Received notification of event: data_xyz
08:15:47.766 Updating policy data, reason: None
08:15:47.544 Fetching policy data
08:15:47.544 Triggering data update with id: af9e98f8-ca65-426c-8c7d-937a8f901a96
08:15:47.544 Data provided inline for url: data_abc
08:15:47.544 Updating policy data, reason: None
08:15:47.543 Received notification of event: data_xyz
08:15:47.326 Data provided inline for url: data_abc
08:15:47.326 Fetching policy data
08:15:47.325 Triggering data update with id: 67417cea-27f9-4a7a-b2cf-2e68566ce005
08:15:47.325 Updating policy data, reason: None
08:15:47.324 Received notification of event: data_xyz
08:15:47.108 Data provided inline for url: data_abc
08:15:47.107 Triggering data update with id: c25bfbcf-8426-40da-9b7c-fdc15878cdf0
08:15:47.107 Updating policy data, reason: None
08:15:47.107 Fetching policy data
08:15:47.106 Received notification of event: data_xyz
08:15:46.884 Data provided inline for url: data_abc
08:15:46.884 Fetching policy data
08:15:46.883 Triggering data update with id: 663e79ba-9ec0-4d02-97af-c8f919fa81ff
08:15:46.883 Received notification of event: data_xyz
08:15:46.883 Updating policy data, reason: None
08:15:46.661 Triggering data update with id: 3ae29e94-458f-467d-9897-893a20e268c9
08:15:46.661 Data provided inline for url: data_abc
08:15:46.661 Fetching policy data
08:15:46.660 Received notification of event: data_xyz
08:15:46.660 Updating policy data, reason: None
08:15:46.657 Fetching data from url: https://my_url for periodic update for data_xyz Start of periodic load for "data_xyz" topic
08:15:46.656 Triggering data update with id: c2a9d59b5c374eea80cc2cdedabd633c
08:15:46.656 Fetching policy data
danyi1212 commented 2 weeks ago

Hey @tyndra, thank you for reporting that issue 🌟

Data updates in OPAL are not guaranteed to be applied in a specific order on all OPAL Clients. The data update itself is a push triggering the OPAL Client to fetch the most up-to-date data from the Data Source at the time.

This way, the data source itself is what guarantees the integrity of the data: whenever the OPAL Client accesses it, it returns the most current data, regardless of the order in which the updates arrived.
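To illustrate why ordering can still matter when the payload is carried inline rather than fetched from the source, here is a minimal simulation. It is not OPAL's code: `store`, `apply_update`, and the delays are hypothetical stand-ins, and the only assumption is that two updates for the same path are handled as independent concurrent tasks.

```python
import asyncio

# Hypothetical stand-in for a policy store; this is not OPAL's implementation.
store: dict = {}


async def apply_update(path: str, value: str, delay: float) -> None:
    """Simulate one update: wait for fetch/processing, then write the value."""
    await asyncio.sleep(delay)  # stands in for fetch and processing time
    store[path] = value


async def main() -> str:
    store.clear()
    # The update triggered first is slow; the one triggered second is fast.
    first = asyncio.create_task(apply_update("/test_path", "older", 0.05))
    second = asyncio.create_task(apply_update("/test_path", "newer", 0.01))
    await asyncio.gather(first, second)
    # Whichever write completes last wins, regardless of trigger order.
    return store["/test_path"]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this sketch the slower task writes last, so the store ends up holding the value from the update that was triggered first. If each task re-read the current value from a data source at write time, the final state would not depend on completion order.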

To further understand and investigate the case you've explained, I'll need to know your setup better. Can you please provide:

Ideally, create a minimal reproducible example of the issue: a docker-compose file for the OPAL setup and a script that mimics the behavior.

Waiting for your info, thanks! 💎

tyndra commented 2 weeks ago

OPAL Deployment:

Data sources:

The errors are intermittent and are best reproduced with an automated test case that sends inline data updates for data source B:

{
  "id": "....",
  "entries": [
    {
      "url": "some text",
      "topics": [ "data_abc" ],
      "dst_path": "/test_path",
      "save_method": "PATCH",
      "data": [
        {
          "op": "replace",
          "path": "myobj",
          "value": {
            "a": "101...",
            "b": "201..."
          }
        }
      ]
    }
  ]
}

GET: [Client Instance IP-Address]/data/test_path/myobj

When inline updates for B overlap with the periodic update for data source A, the instance does not persist the latest inline data in OPA’s memory, even though the logs (above) show that the OPAL Client received the last data update request.
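A rough sketch of such an automated test is below. Several things in it are assumptions rather than details from my setup: the OPAL Server address and its `/data/config` publish endpoint, the OPA address and its `/v1/data/...` query path, the counter-based values for `a` and `b`, and the absence of an auth token. The 0.22 s interval approximates the spacing between updates in the logs above.

```python
import json
import time
import urllib.request
import uuid

# Assumed addresses for a local setup; adjust to your deployment.
OPAL_SERVER_URL = "http://localhost:7002/data/config"
OPA_QUERY_URL = "http://localhost:8181/v1/data/test_path/myobj"


def build_inline_update(seq: int) -> dict:
    """Build an update shaped like the one above, tagged with a counter."""
    return {
        "id": uuid.uuid4().hex,
        "entries": [
            {
                "url": "some text",
                "topics": ["data_abc"],
                "dst_path": "/test_path",
                "save_method": "PATCH",
                "data": [
                    {
                        "op": "replace",
                        "path": "myobj",  # path as written in this issue
                        # Hypothetical values; the counter marks the last update.
                        "value": {"a": f"a-{seq}", "b": f"b-{seq}"},
                    }
                ],
            }
        ],
    }


def post_json(url: str, payload: dict) -> int:
    """POST a JSON body and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def run(count: int = 10, interval: float = 0.22) -> bool:
    """Send `count` inline updates, then check that the last one was stored."""
    for seq in range(count):
        post_json(OPAL_SERVER_URL, build_inline_update(seq))
        time.sleep(interval)
    time.sleep(2)  # give the client time to apply the final update
    with urllib.request.urlopen(OPA_QUERY_URL, timeout=10) as response:
        stored = json.load(response).get("result")
    last = count - 1
    return stored == {"a": f"a-{last}", "b": f"b-{last}"}


if __name__ == "__main__":
    print("last update persisted:", run())
```

Because the failure is intermittent, the run would need to be repeated and timed to overlap with a periodic update for data source A.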