Update caching so that as little work as possible is repeated, improving the performance of the tool.
The cache now stores all relevant information in a dictionary. With this update, performance is 1:1 with Legacy on the same ruleset.
Started by caching only the int[], then added caching for the String8 and byte[] buffer as well after noticing non-negligible time spent in the String8 conversion function.
Tests then started failing because some files output both raw and base64-decoded text to be scanned, and the cached String8/byte[] weren't being updated accordingly: it was different input text, but from the same file. Updated the caching to map each input text (raw, base64-decoded, or anything else) to a String8 and byte[] tuple.
Next, noticed String8-to-string comparison failures and a lot of time spent comparing incoming text against the cache to see whether it was already cached. The incoming text was a plain string and the dictionary key was a String8, so comparing the two was very slow. Switched to using the plain string as the dictionary key to speed up the comparison, and began storing the String8, int[], and byte[] together as a tuple.
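A minimal sketch of the final cache shape described above, written in Python for brevity rather than in the project's own language. Every name here (`CachedText`, `TextCache`, `conversions`) is hypothetical and not taken from the PR. `bytes` and `bytearray` stand in for the String8 and byte[] buffer, and the int[] is assumed, purely for illustration, to map each UTF-8 byte offset back to a character index; the description above does not say what it actually holds.

```python
import base64
from typing import Dict, List, NamedTuple


class CachedText(NamedTuple):
    """Everything derived from one input text, stored together as a tuple."""
    string8: bytes       # stand-in for the String8 (UTF-8 encoded text)
    buffer: bytearray    # stand-in for the byte[] buffer
    int_map: List[int]   # stand-in for the int[]; ASSUMED: byte offset -> char index


class TextCache:
    """Hypothetical cache keyed by the plain input string."""

    def __init__(self) -> None:
        self._entries: Dict[str, CachedText] = {}
        self.conversions = 0  # counts how often the expensive conversion ran

    def get(self, text: str) -> CachedText:
        # The key is the plain string, so the lookup is a normal
        # string hash/equality check with no String8-to-string comparison.
        entry = self._entries.get(text)
        if entry is None:
            entry = self._convert(text)
            self._entries[text] = entry
        return entry

    def _convert(self, text: str) -> CachedText:
        # Stand-in for the expensive conversion step; runs once per distinct text.
        self.conversions += 1
        buffer = bytearray()
        int_map: List[int] = []
        for char_index, ch in enumerate(text):
            encoded = ch.encode("utf-8")
            buffer.extend(encoded)
            int_map.extend([char_index] * len(encoded))
        return CachedText(bytes(buffer), buffer, int_map)


# One file can yield two different texts: its raw contents and a decoded form.
cache = TextCache()
raw = "c2VjcmV0"
decoded = base64.b64decode(raw).decode("utf-8")
raw_entry = cache.get(raw)
decoded_entry = cache.get(decoded)   # separate entry, so no stale data from `raw`
raw_again = cache.get(raw)           # cache hit, no second conversion
```

Keying on the text rather than on the file is what keeps the raw and decoded forms from overwriting each other, and keeping the three derived values in one tuple means a single lookup returns all of them.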
Changes
This PR was created off the same branch as https://github.com/microsoft/sarif-pattern-matcher/pull/665