hashicorp-forge / grove

A Software as a Service (SaaS) log collection framework.
https://hashicorp-forge.github.io/grove/
Mozilla Public License 2.0
131 stars 13 forks source link

Grove missing 1Password Events Due to Eventual Consistency #64

Open brian-avery-angi opened 3 weeks ago

brian-avery-angi commented 3 weeks ago

First, thanks for implementing Grove, it's an awesome tool.

The 1Password connectors (item usage, audit, signin attempts) all use the item timestamp as the POINTER_PATH in Grove. This catches a lot of events, but in our testing, we're finding that it's missing some item usage events.

1Password operates on an eventual consistency model and suggests that you use a reset cursor including the timestamp for the initial request and then the cursor past the initial request instead to avoid missing data (https://developer.1password.com/docs/events-api/reference/#cursor). Looking at the Grove code, the cursor is used within the same request, but when Grove exits and restarts, it uses the timestamp and only requests events past the date, missing events that took longer to come in.

I've been reviewing the code and am seeing three issues so far. :

hcpadkins commented 2 weeks ago

Hey there,

Thank you very much for the detailed report, and suggestions to resolve this issue! We'll do some investigation into this further, and I'll get back to you around a fix soon.

brian-avery-angi commented 2 weeks ago

Thanks! I'm happy to submit a fix (and I've been working on one), just want to know if there's an advised way to address those two concerns.

hcpadkins commented 2 weeks ago

Ah that's even better, thank you for the help!

In terms of the first approach, that shouldn't be an issue. We intentionally didn't want to assume that pointers were time-stamps, as we thought that we'd eventually encounter an API which uses something other than time as a marker. The only side effect would be a migration path / steps required for any users currently using the connector, as any stored pointers in the cache would no longer be valid.

In terms of the second, depending on the structure, usually we attempt to decouple pagination cursors from pointers - as they usually serve two distinct but related purposes.

Previously, we have handles this by using a Grove internal type called AuditLogEntries to return a page of results from an API operation performed by a client, as well as the cursor for the next page. This allows the list of results to be passed off to save, and the cursor - if set - used to perform subsequent requests page by page. It also helps to keep the specific API related concerns out of the connector event loop, and in the API client.

As an example, this is how we're performing this operation in the Github connector.

Please let me know if you have any questions, or concerns, and thank you again for this issue and for the help with the fix!

brian-avery-angi commented 2 weeks ago

Thanks and awesome! I'll see if I can recognize the original value as a timestamp and we can seed on that.

Looking at 1Password (https://developer.1password.com/docs/events-api/reference/#post-apiv1itemusages), I see:

{
    "cursor": "aGVsbG8hIGlzIGl0IG1lIHlvdSBhcmUgbG9va2luZyBmb3IK",
    "has_more": true
    "items": [
        {
            "uuid": "56YE2TYN2VFYRLNSHKPW5NVT5E",
            "timestamp": "2023-03-15T16:33:50-03:00",
            "used_version": 0,
            "vault_uuid": "VZSYVT2LGHTBWBQGUJAIZVRABM",
            "item_uuid": "SDGD3I4AJYO6RMHRK8DYVNFIDZ",
            "user": {
                "uuid": "4HCGRGYCTRQFBMGVEGTABYDU2V",
                "name": "Wendy Appleseed",
                "email": "wendy_appleseed@agilebits.com"
            },
            "client": {
                "app_name": "1Password Browser",
                "app_version": "20240",
                "platform_name": "Chrome",
                "platform_version": "string",
                "os_name": "MacOSX",
                "os_version": "13.2",
                "ip_address": "192.0.2.254"
            },
            "location": {
                "country": "Canada",
                "region": "Ontario",
                "city": "Toronto",
                "latitude": 43.5991,
                "longitude": -79.4988
            },
            "action": "secure-copy"
        }
    ]
}

I think the save function gets the pointer to save here (https://github.com/hashicorp-forge/grove/blob/471a3c64fd4970062fa6e83c8a35b3846fbf003d/grove/connectors/__init__.py#L375). This looks at a JMEPath newest = jmespath.search(self.POINTER_PATH, entries[-1]) and the value for entries is returned from 1Password here -- https://github.com/hashicorp-forge/grove/blob/471a3c64fd4970062fa6e83c8a35b3846fbf003d/grove/connectors/onepassword/api.py#L119

So the question I was trying to ask is what's the best way to get the cursor (outside of the items array) accessible from inside of the items array when the save function tries to access it. My current approach is to copy it into each item.

hcpadkins commented 2 weeks ago

Ah sorry for the misunderstanding! I've just managed to grok what the API is saying this morning, and I think I see what you mean.

As I understand it, when using a reset cursor a datestamp is only ever provided on the first interaction with the API - in the Reset Cursor request. After this request, all subsequent collections would use the cursor returned on the root of the response from the last page of the last collection, which will not be present in the log records submitted to save as its outside of the element currently passed.

I'm having a look through the code now to see if there's any escape hatches we can use rather than having to copy the cursor into the data returned by the API. I'll get back to you shortly!

brian-avery-angi commented 2 weeks ago

Awesome. Thank you! Yeah, that's what I was seeing with this issue. Currently this is dependent on time for all requests whereas it should only be dependent on time for the first interaction.

hcpadkins commented 2 weeks ago

I've been thinking on this one, and unfortunately, for now copying the cursor to each log record entry is likely the most appropriate way forward to unblock this implementation.

A longer term fix would likely be to introduce an additional and optional DATA_PATH to be used in conjunction with POINTER_PATH. This would allow handling these rare, but rather annoying, edge cases where the pagination cursor is also used as a pointer.

However, implementing is likely to require quite a few changes to existing connectors and clients, as well as a potentially breaking API change.

One outstanding question is how long the cursors are valid for in the 1Password API. It seems unlikely that they would be valid forever, but I'm unable to find any documentation around whether they are only valid for N minutes, hours, or otherwise. It might require some experimentation, but in the mean time, I'll see about sending a message across to our account contact at 1Password to confirm the expected behavior.

I'll think on whether there is a nicer way to achieve this functionality within the core without such a wide change, but for now, copying the pointer into each log record is likely the best way to resolve this issue and get this shipped.

Thank you again for your help, and your patience while we've been digging further into this one.