networknt / json-schema-validator

A fast Java JSON schema validator that supports draft V4, V6, V7, V2019-09 and V2020-12
Apache License 2.0
836 stars 324 forks source link

Possible performance improvement: JsonNodePath.hashCode() #1082

Closed txshtkckr closed 3 months ago

txshtkckr commented 3 months ago

While profiling the validator locally, I found that a non-trivial amount of time was spent calculating hash codes for JsonNodePath objects. This is comparatively expensive for longer paths because it has to recurse the linked list of parents back to the root, incorporating all of their values at each step. One way to improve this is to adopt the same space/time trade-off that the String class uses, which is to cache non-0 hash values in a non-volatile field to optimistically skip recalculating what should be a stable value. In the String class, we see that pattern done like this:

    private int hash; // Default to 0

...

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            hash = h = isLatin1() ? StringLatin1.hashCode(value)
                                  : StringUTF16.hashCode(value);
        }
        return h;
    }

This is thread-safe in spite of the variable not being marked volatile because the value is calculated deterministically from immutable state. If we are unlucky and the 0 value is seen due to the unsafe publishing of the calculated value, we just pay the price of recalculating it, a penalty which will typically be seen at most once per thread and worthwhile to skip volatile access. For JsonNodePath, the equivalent code looks like this:

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = Objects.hash(parent, pathSegment, pathSegmentIndex, type);
            hash = h;
        }
        return h;
    }

Note: Including the parent != null check is possible and probably okay. I didn't do it because it would change the current hash value of a root path, which currently depends on type, to 0 for all root paths regardless of type.

txshtkckr commented 3 months ago
Screenshot 2024-07-03 at 11 09 00

JProfiler "Hot Spots" analysis showing it taking 50% of the CPU time in the test I ran.

txshtkckr commented 3 months ago

Another snapshot taken with that change in place. Hashing cost drops dramatically:

Screenshot 2024-07-03 at 11 24 12
justin-tay commented 3 months ago

Thanks for reporting the issue, analysis and fix.