Closed jamessimone closed 2 weeks ago
@jongpie just pushed up a few other small changes that I noticed while re-reviewing, but otherwise I think this is good to review!
@jamessimone I made it through all of the changes, and don't see anything major to change, just some little things for cleanup, and some questions. If I have some more time this weekend, I'll try to review it again just to be safe, but it's looking really good 🥳
Amazing. Thank you. You caught a lot of great stuff that I am going to clean up today. I'll let you know when I've pushed up my code review changes!
@jongpie okay, I just pushed up another commit which I think addresses all of the code review items!
Attention: Patch coverage is 92.76190%
with 38 lines
in your changes missing coverage. Please review.
Project coverage is 96.29%. Comparing base (
c67a403
) to head (0a7781d
).
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
This is the first (potentially) non-breaking change medium version bump, but in reflecting on the scope of the changes being made, it feels appropriate to make the leap to version 1.7.0. Here's a breakdown of the changes:
Overview
RollupFullBatchRecalculator
is no longer an actual batch class, per se, though it still does batch processing. Instead, it takes advantage of the Apex Cursor beta to iterate through children records when doing full recalculations where the number of records exceeds the value forRollupControl__mdt.MaxLookupRowsBeforeBatching__c
. Like its batch class predecessor, it will continue to default to calculating rollup values for 500 children at a time - still configurable usingRollupControl__mdt.BatchChunkSize__c
. UsingDatabase.Cursor
has several big advantages over using a batch class, and while the feature is still in beta, I expect that it will GA soonDatabase.Stateful
, which is a marker interface that's part of theDatabase.Batchable
framework to calculate what a parent record's prior value was when that parent record had children split between batches. This happens more often than you'd expect, even with very fine-tuned ordering of the children records. With large data volume Salesforce orgs, keeping track of which parents have been updated while iterating through all of the children records represents a serious problem, and at a certain point when relying onDatabase.Stateful
to calculate millions of records and dozens of rollups for a single child type, the rollup calculation will inevitably fail as memory runs out.If you saw the phrase "(potentially) non-breaking change" above and went 🤔,
v1.7.0
adds a DataWeave script to the package. While I view it as unlikely that there are orgs out there running up against that limit, it's worth calling out here. Contact your AE if you truly cannot accommodate additional active DataWeave scripts and want to update. See the section titled "Easily Parsing Data" below for more info on why this change is being madeRollupState__c
records will be orphaned and won't be cleaned up. You will have to delete those records manually. More info is available in the "Easily Deleting Data" section.Challenges
There's one natural conclusion when an in-memory data management strategy is no longer sufficient, and that's a record-based memory management solution. As a package author, however, it would be irresponsible (not to mention untidy) to implement a feature that contributes significantly to the data storage on a subscriber org. This means that the "easy" approach - a data record (henceforth referred to as "state", or "Rollup State", or the API name for the object) for each parent - was immediately disqualified. The solution would have to compress data such that the previously calculated values for a parent record could be:
Easily Retrieving Data
In order to facilitate the easy retrieval of data,
v1.7.0
uses a new custom object,RollupState__c
, which has on it 11 different custom text fields ranging fromRollupState__c.RelatedRecordKeys0__c
all the way throughRollupState__c.RelatedRecordKeys10__c
. Because you can't filter on long text area fields within SOQL, and because the maximum length of a standard text field is 255 characters, creating many fields like this is a crucial piece of the data compression strategy that makes it possible to store state for many parents within a single record. Assuming that most rollups will be keyed to standard Salesforce ids, it follows then that each related record key field can hold(255/(18 + 1) = 13
parent ids per field (the+ 1
accounts for the comma separating each key).Despite using 11 custom fields for this (which might sound like a lot), that ends up allowing for a maximum of
13 x 11 = 143
parent records' data being compressed into a singleRollupState__c
. Now that the kinks have been ironed out with the logic that keeps track of which related record key field is being used at any given time, it will be possible to further expand the number of custom fields being used if there's any concern with the data storage use of Rollup State records for these calculations (which would allow for the data to be even further compressed). That being said, since relevant records are deleted within each cursor "chunk" that's processed, I am hopeful that the data storage concerns being discussed here are purely theoretical.Easily Parsing Data
This proved to be an unexpected hurdle - initial benchmarking on deserializing the long text area on each Rollup State record was poor. As in, for some chunks that had only a dozen state records, the deserialization of those records back into Apex-defined types was causing the 60 second Apex CPU time limit to be exceeded 🤯.
This necessitated the next big change to Apex Rollup - the introduction of a DataWeave script to perform the deserialization. Using DataWeave, the deserialization time was cut from a process that at times was taking 30+ seconds to an operation that has always finished within a single second. That's enough of a justification for me to proceed with this piece of tech, though I do still have plans to circle back on the poor performance of the native deserializer.
Easily Deleting Data
At the end of each recalculation job, any remaining
RollupState__c
records are deleted. This is something to be mindful of. If you were to abort a currently running recalculation job, this would effectively "orphan" those records, and they'll need to be deleted separately (by you, presumably). The framework will not (at this time) do any periodic maintenance on removing olderRollupState__c
records that could be orphaned in this manner, though I am open to the idea of giving users a way to conveniently do so via the Apex Rollup app should the need for that arise.Special Considerations
One of the... downstream effects... of moving away from using Batch Apex was that the majority of the Apex Rollup framework now relies on Queueables to perform async processing. In order to accommodate this,
v1.7.0
takes (greater) advantage of Queueable Finalizers to coordinate work between different threads, as finalizers are the only place where it's possible to fully chain different jobs together without exceeding limits.As I had many failing tests when replacing the Batchable version of
RollupFullBatchRecalculator
with a Queueable, and I've been performing integration testing to sanity check huge swathes of this work, I don't expect that this change should be visible to people in anything other than a possible increase in Debug Logs being generated (as Finalizers create their own, separate, log file). I've mentioned it here, though, on the off-chance that this change does somehow lead to deleterious effects.