SamMayWork closed this PR 9 months ago
All modified and coverable lines are covered by tests :white_check_mark: Comparison is base (fd542c0) 99.99% compared to head (45fc854) 99.99%.
WRT O(n) performance:
To my knowledge the only way around these problems would be:
Spotted a problem during testing that might be a bit tricky to solve. Given there are likely to be a large number of events on the chain, being restricted to skipping 1000 records (the default API behaviour) is super restrictive. There's an entirely valid use case for "skip 90,000 records and then return me 200 records matching my filter", but this would not work as of right now.
I've pushed changes in this commit https://github.com/hyperledger/firefly/pull/1452/commits/52c191e9981473dfa73248386a1e033fd23d6399 which allow for a custom (and :sparkles: configurable :sparkles:) number of events to be skipped, but this doesn't play nicely with the swagger generation, so the generated API documentation isn't updated to reflect the new value. This means there's currently a disconnect between what our backend supports and what we say it supports in the swagger.
Needs more thinking for a workaround...
EDIT: Commit https://github.com/hyperledger/firefly/pull/1452/commits/80ddde2d4e0528cda9c4566eba72f6723cdb5dd5 has a nifty but imperfect workaround...
Nifty: After the swagger is generated by the core libraries, we grab the params and overwrite the description on the skip field with our own description containing the actually correct value for the maximum supported skip.
Imperfect: If another API has the same issue as this API, they'll need to duplicate the code I've added to change the values on their API. What would be great would be if the values for skip/limit were configurable per route and then got read through into the swagger generation, but this is likely a larger item of work.
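As a minimal sketch of that overwrite-after-generation idea, with stand-in types rather than the real generated swagger structures (the types, names, and message format here are all illustrative):

```go
package main

import "fmt"

// Param and Operation are hypothetical stand-ins for the generated
// OpenAPI parameter/operation types from the core libraries.
type Param struct {
	Name        string
	Description string
}

type Operation struct {
	Params []*Param
}

// overrideSkipDescription patches the generated doc after the fact so the
// documented maximum matches the configured one. maxSkip is the
// configurable limit discussed above.
func overrideSkipDescription(op *Operation, maxSkip int) {
	for _, p := range op.Params {
		if p.Name == "skip" {
			p.Description = fmt.Sprintf(
				"Number of records to skip (maximum supported: %d)", maxSkip)
		}
	}
}

func main() {
	op := &Operation{Params: []*Param{
		{Name: "limit", Description: "Max records to return"},
		{Name: "skip", Description: "Number of records to skip (max 1000)"},
	}}
	overrideSkipDescription(op, 90000)
	fmt.Println(op.Params[1].Description)
	// Number of records to skip (maximum supported: 90000)
}
```

The "imperfect" part is visible here too: nothing stops a second route with its own skip param from needing the same post-processing call.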
In discussion w/ @nguyer, we've landed on some further changes for this PR given the fact that we're concerned about the usefulness of this API if we don't solve the N+1 events problem. We've agreed on the following changes:
This means that, assuming you hit the limit of matching events before you hit the end of the sequence range, you can get the next event in the sequence by calling the API again, providing the sequence ID of the last record as the starting sequence ID. The hope is that the DB should be much faster indexing by sequence ID, so performance should become significantly less of an issue.
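That cursor-style paging could be sketched like this (hypothetical types and helper, not the actual FireFly code; against a real DB the loop would be an indexed range scan on the sequence column):

```go
package main

import "fmt"

// Event is a minimal stand-in for a FireFly event row; Sequence is the
// DB-assigned, monotonically increasing sequence ID.
type Event struct {
	Sequence int64
	Type     string
}

// fetchPage returns up to limit events with Sequence > after that match
// the subscription filter.
func fetchPage(events []Event, after int64, limit int, match func(Event) bool) []Event {
	var page []Event
	for _, ev := range events {
		if ev.Sequence <= after || !match(ev) {
			continue
		}
		page = append(page, ev)
		if len(page) == limit {
			break
		}
	}
	return page
}

func main() {
	all := []Event{
		{1, "blockchain"}, {2, "message"}, {3, "blockchain"},
		{4, "blockchain"}, {5, "message"}, {6, "blockchain"},
	}
	isChain := func(e Event) bool { return e.Type == "blockchain" }

	// First page, then resume from the last sequence ID we saw.
	p1 := fetchPage(all, 0, 2, isChain)
	p2 := fetchPage(all, p1[len(p1)-1].Sequence, 2, isChain)
	fmt.Println(p1) // [{1 blockchain} {3 blockchain}]
	fmt.Println(p2) // [{4 blockchain} {6 blockchain}]
}
```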
These changes semantically change how we're using some of the values, so in the new world this is what we have:

- `startsequence`: Start ID to index from
- `endsequence`: End ID to index from
- `skip`: Unused (we'd expect changes to be made using `startsequence`)
- `limit`: Limit for total count of FILTERED records

EDIT: Will also need to make the new configuration options make sense with the changes listed above; probably going to remove both the new options and have a configurable limit on the difference between the start and end sequence IDs.
Changes on the way :rocket:
I've left a comment above about the SQL performance of this query syntax, but I think we're in a good place for a re-review now :rocket:
Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review. Comparison is base (fd542c0) 99.99% compared to head (a08f80b) 99.94%.

| Files | Patch % | Lines |
|---|---|---|
| internal/events/event_manager.go | 50.00% | 10 Missing ⚠️ |
Looks like codecov is just having trouble talking to GitHub. I confirmed by checking out this branch myself that it does not lower test coverage.
I get how to use it once we have a sequence number as a reference and can go higher or lower from there. I think most of the time, though, users are going to be interested in the most recent n events and won't know where to start from.
Some initial thoughts on this here:
I think option 2 here is just bad, since we'd need to make another query to the DB to find out the current highest index and then tag that onto a request the user might not want to make. I.e. if I were a user who wanted the most recent event, I'd first need to make a request I don't care about to get the index, so that I could then make the request I do care about; that doesn't make sense.
Option 1 is the way forward, and it's consistent with the /events API, which this API is more or less the same as. With that in mind, I think this becomes the behaviour of the API with respect to defaults for start/end sequence IDs.
| Start Sequence ID provided (SS) | End Sequence ID provided (ES) | Calc(startsequence) | Calc(endsequence) |
|---|---|---|---|
| Y | Y | SS | ES |
| N | Y | ES-1000 (*1) | ES |
| Y | N | SS | SS+1000 (*1) |
| N | N | Most recent record sequence ID - 1000 | Most recent record sequence ID |
One little wrinkle, though, is making sure that the Swagger reflects this behaviour and that the user is informed that if they don't provide a value, they'll get the most recent events. The /events API doesn't have this problem since start/end sequence aren't explicitly defined.
(*1) - Default max range is 1000 records, through configuration this value could be changed
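The defaulting table above could be implemented along these lines (a sketch; the function name and the latest-sequence input are assumptions, and 1000 stands in for the configurable default range):

```go
package main

import "fmt"

// defaultRange is the (*1) configurable maximum window; 1000 is the default.
const defaultRange = 1000

// resolveWindow implements the defaulting table: hasStart/hasEnd report
// whether the caller supplied each parameter, and latest is the most
// recent record's sequence ID (looked up only when neither is supplied).
func resolveWindow(start, end int64, hasStart, hasEnd bool, latest int64) (int64, int64) {
	switch {
	case hasStart && hasEnd:
		return start, end // Y/Y: use both as given
	case hasEnd:
		return end - defaultRange, end // N/Y: window back from ES
	case hasStart:
		return start, start + defaultRange // Y/N: window forward from SS
	default:
		return latest - defaultRange, latest // N/N: most recent window
	}
}

func main() {
	s, e := resolveWindow(0, 0, false, false, 90000)
	fmt.Println(s, e) // 89000 90000
}
```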
EDIT: I think this should be addressed now in https://github.com/hyperledger/firefly/pull/1452/commits/0395957853f0dd939c797a22042d542c6de99cba
Right, so it looks like there's a flaky E2E at the moment, which I've seen pass and fail on different commits on this branch. I'll dig into it here for a little bit, but I'd be surprised if the changes I've made were causing this test to fail, given it fails during multi-party setup; an initial look suggests that one of the nodes is failing to register its org.
EDIT: Spent some time looking at this, and it looks like the tests are a bit flaky on main too, I think for now we should re-run the failing E2E on this PR, and it should work. Probably worth a separate item to look into the E2E failures.
Note: Could not get the E2Es running on my M1 Mac; looking in the Dockerfile there's a lot of AMD64 stuff. Could spend some cycles looking into this if required, but not sure it's worth the time given we know this is a flaky test.
ref: https://github.com/hyperledger/firefly/issues/1443
Adds a new fancy API: `subscriptions/{subid}/events`
My current concern is that this query is an expensive operation: by necessity it is O(n), as we have to go through all of the events to find those that match the subscription type. Additionally, the skip provided by the user does not help us here, as we don't know how many of the subscribed events actually match our filter. If the provided filter does not match any events, we will iterate through every event in the DB.
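To illustrate why the user-supplied skip can't cut the scan short, here's a toy sketch (not the real FireFly query path): skip counts *matched* records, so it can only be applied after the filter, and a filter that matches nothing forces a walk of every row regardless of skip/limit.

```go
package main

import "fmt"

// Event is a minimal stand-in for a FireFly event row.
type Event struct {
	Sequence int64
	Type     string
}

// filteredSkip applies skip/limit to the *filtered* stream of events.
// scanned reports how many rows were touched, showing the O(n) cost.
func filteredSkip(events []Event, skip, limit int, match func(Event) bool) (out []Event, scanned int) {
	matched := 0
	for _, ev := range events {
		scanned++
		if !match(ev) {
			continue
		}
		matched++
		if matched <= skip {
			continue // skip applies only to matched records
		}
		out = append(out, ev)
		if len(out) == limit {
			return out, scanned
		}
	}
	return out, scanned
}

func main() {
	var events []Event
	for i := int64(1); i <= 100000; i++ {
		events = append(events, Event{i, "message"})
	}
	// A filter matching nothing forces a full scan regardless of skip/limit.
	none := func(Event) bool { return false }
	out, scanned := filteredSkip(events, 90000, 200, none)
	fmt.Println(len(out), scanned) // 0 100000
}
```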
Local benchmark: processing ~93,000 events in FireFly took ~30 seconds on this API with a subscription that matches no events (i.e. going through all of the events and running checks on them). An internal limit of <100 takes roughly 30-45 seconds to complete; higher figures tend more towards ~30 seconds. Increasing the internal limit past 200 does not substantially decrease the time required for the query.