Closed: ben-manes closed this issue 2 years ago
JsonContext.pathFromCache(path) was observed by an Apache Pinot user (see https://github.com/apache/pinot/issues/7403); it's beneficial to use the CacheProvider SPI to replace the implementation. My PR here addresses some of the other issues (but I didn't bother to replace the LinkedList wrapping): https://github.com/json-path/JsonPath/pull/750
Good point @richardstartin, I did that from the get-go using Caffeine. Unfortunately the inefficient API above the cache is problematic. Thanks for the PR; I hope we see some of these fixes merged in.
Something else I found in this vein is that JSON paths which don't match the document throw PathNotFoundException, which really adds up. If PathNotFoundException overrode fillInStackTrace to return this, the performance consequences could be remedied easily; but with a little more effort it would be better to return whether or not the path matched the document, returning false when it doesn't.
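The override suggested above can be sketched as follows. This is an illustrative class, not JsonPath's actual PathNotFoundException:

```java
// Illustrative sketch (not JsonPath's real class): overriding
// fillInStackTrace to return this skips the expensive native stack
// walk, so constructing and throwing the exception becomes cheap.
class FastPathNotFoundException extends RuntimeException {
    FastPathNotFoundException(String message) {
        super(message);
    }

    @Override
    public synchronized Throwable fillInStackTrace() {
        return this; // do not capture a stack trace
    }
}
```

Because the trace is never captured, getStackTrace() on such an exception returns an empty array, and the cost of a miss drops to roughly that of an ordinary object allocation.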
Yes, I think you are finding the same problem as my 4th item. The exception for control flow is pretty bad, as you show.
I think these issues are mostly addressed in the latest release.
Thanks @richardstartin. I'll close this then. 🙂
Hi @ben-manes, I see the code on the master branch still uses LRUCache, which is somewhat inefficient. Do we need to replace the default implementation with a Guava/Caffeine cache?
@Aitozi I replaced it in my usage, so I would think others would want to as well. LRUCache locks on read, so it is not highly concurrent. A simple wrapper like the one below is all that is needed.
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.jayway.jsonpath.JsonPath;
import com.jayway.jsonpath.spi.cache.CacheProvider;

Cache<String, JsonPath> cache = Caffeine.newBuilder() /* configuration elided */ .build();
CacheProvider.setCache(new com.jayway.jsonpath.spi.cache.Cache() {
  @Override public JsonPath get(String key) {
    return cache.getIfPresent(key);
  }
  @Override public void put(String key, JsonPath value) {
    cache.put(key, value);
  }
});
@ben-manes Thanks for your quick response. I try the same way as you mentioned, and it works. I also run a benchmark for it, I found it will have about 10 times performance delta.
Benchmark Mode Cnt Score Error Units
GuavaCacheBenchmark.get thrpt 30 4480.563 ± 203.311 ops/ms
GuavaCacheBenchmark.put thrpt 30 1774.769 ± 119.198 ops/ms
LRUCacheBenchmark.get thrpt 30 441.239 ± 2.812 ops/ms
LRUCacheBenchmark.put thrpt 30 350.549 ± 12.285 ops/ms
And I think, as a common library, it would be nice if this could be improved internally (as this is a critical code path).
You'll of course find Caffeine is faster than Guava, too =)
If the project wants a concurrent LRU without dependencies, then there are two quick and good options to replace LRUCache.
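The thread does not say which two options were meant. As one hedged illustration only, a dependency-free cache with lock-free reads can be built from ConcurrentHashMap with simple FIFO eviction (FifoCache is a hypothetical name, not part of JsonPath):

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative only: a bounded map with no locking on the read path.
// Eviction is FIFO rather than true LRU, which is often good enough
// for a compiled-path cache with a stable working set.
final class FifoCache<K, V> {
    private final int maximumSize;
    private final Map<K, V> map = new ConcurrentHashMap<>();
    private final Queue<K> insertionOrder = new ConcurrentLinkedQueue<>();

    FifoCache(int maximumSize) {
        this.maximumSize = maximumSize;
    }

    V get(K key) {
        return map.get(key); // lock-free read
    }

    void put(K key, V value) {
        if (map.put(key, value) == null) {
            insertionOrder.add(key);
            while (map.size() > maximumSize) {
                K victim = insertionOrder.poll(); // oldest insertion
                if (victim == null) {
                    break;
                }
                map.remove(victim);
            }
        }
    }
}
```

The trade-off is approximate eviction order in exchange for zero dependencies and uncontended reads.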
I am observing high CPU usage due to extensive usage of JsonPath. Using Java Flight Recorder to profile a production machine, these areas stand out in JProfiler as CPU hotspots.

1. JsonPath.getPath(): This method recursively evaluates the PathToken chain to build the string representation. In one case we have this in a TreeSet to sort the order of evaluations, and the comparator is surprisingly expensive. Since the JsonPath is compiled and cached, it would make sense for the string representation to be lazily cached (a la String's hashCode). It may also be faster to replace recursive descent with a loop.
2. JsonContext.pathFromCache(path): This method very inefficiently computes the cache key, resulting in excessive CPU and allocations. If the path has no filters, then the path can be used directly as the key; instead a string concatenation is performed where the concat slow path is used. The filters do not need to be wrapped in a LinkedList, as they have the same toString implementation; Arrays.toString would be a more efficient conversion. The lookup should use computeIfAbsent to avoid redundant work on a cache miss due to races.
3. JsonPath.compile(path): This user-facing method creates a new JsonPath instead of using the cache. Therefore if one tries to optimize to bypass the slow pathFromCache call, there is high overhead. The compilation should return the cached copy, allowing the application user to benefit from the configured cache.
4. JsonPath.read(Object, Configuration): This method uses exceptions as ordinary control flow. That is a bad practice because it is slow when many exceptions are created, as filling in the stack trace is a costly operation. The internals should return a result object which could be evaluated to the suppressed value.