Open omarshibli opened 3 years ago
Merging #90 (45524fb) into master (012ad80) will decrease coverage by
25.34%
. The diff coverage is3.55%
.
@@ Coverage Diff @@
## master #90 +/- ##
=============================================
- Coverage 89.56% 64.22% -25.35%
Complexity 209 209
=============================================
Files 26 28 +2
Lines 479 668 +189
Branches 66 89 +23
=============================================
Hits 429 429
- Misses 34 223 +189
Partials 16 16
Impacted Files | Coverage Δ | Complexity Δ | |
---|---|---|---|
src/main/java/com/intendia/jmh/GcProfiler.java | 0.00% <0.00%> (ø) |
0.00 <0.00> (?) |
|
...rg/ahocorasick/benchmark/BenchmarkPayloadTrie.java | 0.00% <0.00%> (ø) |
0.00 <0.00> (?) |
|
src/main/java/org/ahocorasick/trie/State.java | 86.48% <66.66%> (ø) |
20.00 <3.00> (ø) |
|
...c/main/java/org/ahocorasick/trie/PayloadState.java | 96.96% <100.00%> (ø) |
21.00 <4.00> (ø) |
|
...rc/main/java/org/ahocorasick/trie/PayloadTrie.java | 95.83% <100.00%> (ø) |
60.00 <0.00> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 012ad80...45524fb. Read the comment docs.
The increase is size may have unforeseen consequences with applications depending on this library, such as: embedded systems, increased download times, and so forth. Fortunately, we can employ a similar strategy to how JDBC works, with the downside that it'll be a fair amount more effort.
For example:
import static CollectionFactory.createHashMap;
public class C {
final Map<String, String> map = createHashMap( ... );
}
The default implementation would use Java's standard libraries. If an implementation exists for a drop-in replacement---fastutil, HPPC, Koloboke, or HFTC---then that library should be used through dynamic detection.
In this way, if developers decide to add the dependency, then it will be used. If they don't import the dependency, then the standard JDK is used and the resulting JAR file won't explode by an extra 20MB. (ProGuard can reduce the number of bundled classes to a bare minimum, but its usage not something we can impose upon developers.)
For cases where there isn't a direct 1:1 correlation (e.g., char
primitive instead of Character
class), then perhaps the strategy pattern or a functional interface may be appropriate to abstract away the implementation detail.
Thanks for the input.
I'm not sure if JDBC-style
is the optimal approach here, the usage pattern of this library is very well defined.
Giving that the direction that I've in mind is to find the optimal implementation and add it the repo.
What other ways can you see to get these performance gains that don't require developers who import this software to get a surprise 20MB increase to their own applications?
Note: A PR that incurs a 10-fold increase to the final JAR size will not be accepted.
I'm totally with you. As I mentioned giving that the usage pattern is well defined, I want to find optimal implementation and copy it into the repo, so we will just have an increase of few classes.
I'm also checking other libraries: https://github.com/thomasmueller/minperf https://github.com/nicholassm/perfect-hashmaps
Description
Suggestion for performance improvement by using FastUtil Char2ObjectMap, specialized char map implementation.
Changes
Observed results
Benchmark result raw files bench-post.2.txt bench-post.csv.txt bench-post.txt bench-pre.2.txt bench-pre.csv.txt bench-pre.txt
Down sides
fastutil dependecy is pretty big, benchmark fat jar is 21M after adding it, prior to that it was about 2.4M
Reference
https://github.com/robert-bor/aho-corasick/issues/31