Closed sesuncedu closed 9 years ago
Switching code in MapPointer constructor from
map = LinkedHashMultimap.create();
to
map = MultimapBuilder.hashKeys().hashSetValues().build();
Reduces internals retained size from 1,778,394,616 bytes to 1,597,578,096 bytes.
With MapPointer constructor
map = MultimapBuilder.hashKeys().linkedHashSetValues().build();
internals retained size is 1,753,989,672 bytes
With MapPointer constructor
map = MultimapBuilder.hashKeys().arrayListValues().build();
internals retained size is 888,169,544 bytes
Summary in table form:
Keys | Values | Size | Scaled Size |
---|---|---|---|
Linked -> | <- Linked Set | 1,778,394,616 bytes | 1.000 |
Unlinked | Linked HashSet | 1,753,989,672 bytes | 0.986 |
Unlinked | Unlinked HashSet | 1,597,578,096 bytes | 0.898 |
Unlinked | Array List | 888,169,544 bytes | 0.499 |
Nice work Simon. Array List looks great at first sight. I'm wondering how all of this plays out with write performance. Obviously, array list will be slower for writes, but I'm wondering if things perform o.k. in practice. Perhaps some kind of config option that would hint at optimising memory vs read vs write performance would be good.
The array list approach does have the disadvantage that it doesn't remove duplicates until ten contains checks have been made (for a given fetch of values).
public synchronized Set<V> getValues(K key) {
    init();
    return CollectionFactory.getCopyOnRequestSetFromMutableCollection(map.get(key));
}
After the threshold has been exceeded, the collection gets converted to a LinkedHashSet, if it wasn't a set already.
This conversion takes place in the ConditionalCopySet, so it doesn't affect the value in the map. Any modifications to the returned set also trigger the COW.
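A minimal sketch of the conversion behaviour described above (class name, field names, and eager copying are all illustrative here, not the actual OWL API `ConditionalCopySet` implementation): the delegate stays a list, duplicates and all, until enough contains checks have been made, then it is upgraded to a LinkedHashSet without touching the collection stored in the backing map.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;

// Hypothetical sketch of the threshold-based conversion. The real class
// copies lazily on first write (COW); this sketch copies eagerly for brevity.
class ConditionalCopySetSketch<T> {
    private static final int CONVERSION_THRESHOLD = 10;
    private Collection<T> delegate;
    private int containsCounter = 0;

    ConditionalCopySetSketch(Collection<T> source) {
        this.delegate = new ArrayList<>(source);
    }

    public boolean contains(Object o) {
        containsCounter++;
        if (containsCounter >= CONVERSION_THRESHOLD && !(delegate instanceof LinkedHashSet)) {
            // Conversion removes duplicates and makes later checks O(1),
            // without affecting the value collection held by the map.
            delegate = new LinkedHashSet<>(delegate);
        }
        return delegate.contains(o);
    }

    public int size() {
        return delegate.size();
    }
}
```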
Performance tradeoffs (time):
ArrayList: fast add; fast iteration; containment check O(n) with a small constant.
HashSet: slower add; slow iteration; containment check O(1) with a medium constant.
LinkedHashSet: slower add; fast iteration; containment check O(1) with a medium constant.
For maps whose entries will only contain a few values, ArrayList may perform ok for contains checks.
I'm looking at crafting a Trove TObjectHashSet (basically a TObjectHashMap without any values) to see if it gives any useful wins.
Using lists also keeps the order of insertion, although this might not be an important requirement here.
An alternative restructuring that came to mind when Simon mentioned int based indexing is to reuse FaCT++ tricks with a master list of axioms and all indexes being sets of ints, which point at the main list records. At that point, we could use memory efficient collections of ints to hold the lot.
(or even arrays of ints, for that matter. For low variance collections, that would work quite well)
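A rough illustration of the master-list idea above (not FaCT++ or OWL API code; names and the use of String as a stand-in for OWLAxiom are made up): axioms live once in a master list, and each index holds only ints pointing into it, so a low-variance index can be a plain sorted int array.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: one master list of axioms, indexes as ints into it.
class IntIndexSketch {
    // Stand-in for the master axiom list; real entries would be OWLAxiom objects.
    final List<String> axioms = new ArrayList<>();

    int intern(String axiom) {
        axioms.add(axiom);
        return axioms.size() - 1; // the axiom's id is its position in the list
    }

    // A sorted int[] index costs 4 bytes per entry; membership is a binary
    // search instead of an equals()-heavy hash lookup.
    boolean indexContains(int[] sortedIndex, int axiomId) {
        return java.util.Arrays.binarySearch(sortedIndex, axiomId) >= 0;
    }
}
```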
That is part of the master plan :-)
Incidentally, JFact could benefit from Trove TIntInt collections (I was comparing them the other day).
Also, there are some great ways of encoding sets of ints using sparse bitmaps
I tried the bitmap encoding a while ago but could not manage any effective sparse representation. Maybe time to give it another go.
I think there may be ways to improve density for some ontologies with some usage patterns (could I be any modaler?).
The mapping from entity to int has a major effect.
See the analysis in
Daniel Lemire, Owen Kaser, Kamel Aouiche. Sorting improves word-aligned bitmap indexes. Data & Knowledge Engineering 69 (1), pages 3-28, 2010. http://arxiv.org/abs/0901.3751
Implementation is here: https://github.com/lemire/javaewah
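To see why the entity-to-int mapping matters so much for bitmap encodings, here is a toy demonstration with `java.util.BitSet`, which is an uncompressed bitmap (javaewah provides the word-aligned compressed form from the paper above; this is only a stand-in). Ids clustered near zero keep the bitmap short, while scattered ids blow it up even with fewer members:

```java
import java.util.BitSet;

// Sketch only: BitSet is uncompressed, so length() (highest set bit + 1)
// is a proxy for how much the id assignment costs in raw bitmap space.
class BitmapDensityDemo {
    static BitSet setOf(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet dense = setOf(0, 1, 2, 3, 4);          // clustered ids
        BitSet sparse = setOf(0, 10_000, 1_000_000);  // scattered ids
        System.out.println(dense.length());   // small footprint
        System.out.println(sparse.length());  // huge, despite fewer members
        // Set operations stay cheap either way:
        BitSet union = (BitSet) dense.clone();
        union.or(sparse);
        System.out.println(union.cardinality());
    }
}
```

A compressed representation such as EWAH shrinks the long runs of zero words in the sparse case, which is why the sortedness of the id assignment has such a large effect.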
I believe that using Apriori-style frequent-item-set mining on sampled data to find groupings could give very good results, especially for ABoxes where there are patterns of property usage or values that are implicit in the data but not expressed in the ontology. That approach definitely wins for converting sets of triples into relational database tables.
> I think there may be ways to improve density for some ontologies with some usage patterns (could I be any modaler?).
On some virtual machines, for some users.
Sampling existing ontologies for patterns reminds me of a couple of PhD theses or more I've read in recent years, including Matthew's. Atomic decomposition might have something in common with this, and the notion of locality for modules might have to do with the notion of locality in memory for the ontology representation. Not an immediate winner in terms of performance, yet still I wonder.
> Implementation is here: https://github.com/lemire/javaewah
Looks quite good.
One idea that might work would be to have an array-backed implementation that remains an unsorted list until a certain size threshold is reached, then switches over to an open-addressed hash table.
I have a strong suspicion that most entries in smaller tables will be unique, and that hash code inequality will be enough to find a mismatch, so the constant factor may be low enough to make this worthwhile for many smaller sets.
Annoyingly this requires a custom implementation of Multimap, since all of the Guava implementations that would work are final, and all that are abstract are package-private, so there is no way to switch out the set implementation stored in the backing map. One could use a forwarding set... except that would cost an extra object, which would equal the size of the data in a four-element array (the overhead for an array is already included in the ArrayList sizes).
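The hybrid idea can be sketched in plain JDK collections (class name and the threshold value are made up; a real implementation would also need the Multimap plumbing discussed above): values live in an unsorted ArrayList while small, and the structure switches to a HashSet once a size threshold is crossed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Rough sketch of the array-then-hash hybrid. While small, contains() is a
// linear scan (cheap for small n); past the threshold, lookups become O(1).
class HybridValueCollection<V> {
    private static final int SWITCH_THRESHOLD = 16; // illustrative value
    private List<V> list = new ArrayList<>(4);
    private Set<V> set; // null until the threshold is crossed

    public boolean add(V value) {
        if (set != null) {
            return set.add(value);
        }
        if (list.contains(value)) { // O(n), acceptable while n is small
            return false;
        }
        list.add(value);
        if (list.size() > SWITCH_THRESHOLD) {
            set = new HashSet<>(list); // one-time conversion
            list = null;
        }
        return true;
    }

    public boolean contains(Object value) {
        return set != null ? set.contains(value) : list.contains(value);
    }

    public int size() {
        return set != null ? set.size() : list.size();
    }
}
```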
Using trove THashSet and THashMap to build multimaps in MapPointer:
Keys | Values | Size | Scaled Size | Notes |
---|---|---|---|---|
Linked -> | <- Linked Set | 1,778,394,616 bytes | 1.000 | |
Unlinked | Linked HashSet | 1,753,989,672 bytes | 0.986 | |
Unlinked | Unlinked HashSet | 1,597,578,096 bytes | 0.898 | |
HashMap | THashSet (load=0.50) | 1,212,330,320 bytes | 0.682 | NEW |
HashMap | THashSet (load=0.85) | 1,000,578,712 bytes | 0.563 | NEW |
THashMap | THashSet (load=0.50) | 1,048,101,240 bytes | 0.589 | NEW |
THashMap | THashSet (load=0.75) | 946,178,464 bytes | 0.532 | NEW |
THashMap | THashSet (load=0.85) | 945,453,552 bytes | 0.532 | NEW |
Unlinked | Array List | 888,169,544 bytes | 0.499 | (Not a Set) |
Load factor 0.50 is pretty close to HashSet speed.
The performance for a load factor of 0.75 is not too bad (~8% slower than HashSet); the biggest performance issue is the lack of optimization in axiom equals methods (e.g. OWLAxiomImpl::equals creates new sets for annotation comparisons).
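One way to avoid that allocation, sketched with a hypothetical helper (this is not the actual OWLAxiomImpl fix): compare the two annotation collections as sets without building new HashSets on every equals() call. A size check rejects most mismatches for free; only then do we pay for containsAll in both directions.

```java
import java.util.Collection;

// Hypothetical helper. The size short-circuit assumes the inputs are
// duplicate-free (as annotation lists in the OWL API are); with duplicates
// it could reject collections that are equal as sets.
final class AnnotationCompare {
    static boolean equalAsSets(Collection<?> a, Collection<?> b) {
        if (a == b) {
            return true;
        }
        if (a.size() != b.size()) {
            return false; // cheapest rejection first, no allocation
        }
        return a.containsAll(b) && b.containsAll(a);
    }
}
```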
These measurements are for the internals size of a freshly loaded copy of NCBI Taxon (455 MB in OWL FSS).
Cheers Simon. The OWLAxiomImpl bit I can improve right away. I'll have a look at using Trove as well.
I've got changes (including not copying the annotation impl) which I just need to stash and apply to a different branch (the branch they're on is outdated, as I had to keep it in sync with all the other changes).
Will do so in a bit.
Great work Simon!
Do you have any idea about, or ideas for performance tests for, how well axiom addition and removal performs with all of this? I remember playing about with array lists before, but adding and removing axioms was a lot slower than with the current implementation. Any thoughts on this?
THashMap + THashSet with load Factor of 0.50 performs similarly to HashSet for axiom addition.
Trove uses open address hashing, so removed entries still require extra probes when doing a lookup until cleared, but the main performance factor is the number of equality checks required.
THashMaps always have a prime number of slots, so there's a mod instead of a mask, but that's overwhelmed by the cost of the calls to equals().
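The tombstone behaviour described above can be shown with a minimal open-addressed set (linear probing, fixed table size; Trove's real implementation uses prime-sized tables and a different probe sequence). A removed slot keeps a REMOVED marker so later lookups probe past it instead of stopping early:

```java
// Minimal open-addressing illustration, not Trove code.
class OpenAddressSketch {
    private static final Object REMOVED = new Object();
    private final Object[] slots = new Object[8]; // fixed size: sketch only

    private int slot(Object key, int probe) {
        return ((key.hashCode() & 0x7fffffff) + probe) % slots.length;
    }

    boolean add(Object key) {
        if (contains(key)) {
            return false;
        }
        for (int probe = 0; probe < slots.length; probe++) {
            int i = slot(key, probe);
            if (slots[i] == null || slots[i] == REMOVED) {
                slots[i] = key; // tombstones are reusable for insertion
                return true;
            }
        }
        throw new IllegalStateException("table full");
    }

    boolean contains(Object key) {
        for (int probe = 0; probe < slots.length; probe++) {
            int i = slot(key, probe);
            if (slots[i] == null) return false;  // empty slot ends the probe
            if (slots[i] == REMOVED) continue;   // tombstone: keep probing
            if (slots[i].equals(key)) return true;
        }
        return false;
    }

    boolean remove(Object key) {
        for (int probe = 0; probe < slots.length; probe++) {
            int i = slot(key, probe);
            if (slots[i] == null) return false;
            if (slots[i] != REMOVED && slots[i].equals(key)) {
                slots[i] = REMOVED; // leave a tombstone, don't null the slot
                return true;
            }
        }
        return false;
    }
}
```

The equals() calls inside the probe loops are exactly where the axiom equality cost mentioned above dominates.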
Addition with duplicates to an ArrayList is obviously much faster than any set, but of course has the slight problem of not having the right semantics :) Contains / add-without-duplicates / removal are O(n). Depending on the mix of operations, the ideal data structure might be a balanced tree, given the low average cost of additions/removals; if contains checks are relatively rare then there is no problem with the log(n) cost of lookups.
NB. I have been assuming that in the typical usage pattern the bulk of all operations on MapPointers will come from an initial load of an ontology, followed by fetches of the sets associated with keys, which will primarily be accessed by iterators. Subsequent addition and removal of axioms would come in smaller batches. The webprotege usage analysis might shed more light on this.
In the shorter term, the THashMap/THashSet combination is a pretty easy win.
THashMap + ArrayList load time is just over half that of using a Set. I expect that the extra time is spent in comparisons. I will check the internals size for this combination, then put it aside for now.
How do we feel about back porting this to 3.5.1? There's an extra Trove dependency to add. I'd vote yes.
No problem with the extra dependency. Is it worth the work?
I don't think it's too much work. Regarding the improvements I've made, I haven't measured much so I don't know exactly. Tests run faster, that much is true.
o.k. sounds good to me.
A bit late, but another way to change the multimap implementation is to use Multimaps::newSetMultimap
protected Multimap<K, V> createNewSetMultimap() {
    return Multimaps.newSetMultimap(new THashMap<>(), new Supplier<Set<V>>() {
        @Override
        public Set<V> get() {
            return new HashSet<V>(2, 0.75f);
        }
    });
}
That's a nice idiom, cheers. Might still use it in the MapPointer, and in the other places where we create multimaps.
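For comparison, the same "backing map plus value-collection supplier" shape can be approximated with only the JDK (class and method names here are illustrative, and this lacks the full Multimap interface that the Guava version provides):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// JDK-only sketch of the set-multimap idiom: computeIfAbsent plays the
// role of the Supplier, creating a fresh small HashSet per key.
class JdkSetMultimapSketch<K, V> {
    private final Map<K, Set<V>> map = new HashMap<>();

    public boolean put(K key, V value) {
        return map.computeIfAbsent(key, k -> new HashSet<>(2, 0.75f)).add(value);
    }

    public Set<V> get(K key) {
        return map.getOrDefault(key, Set.of());
    }
}
```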
I think this is already up to date in version 5. To double check and close.
Note: These dumps were taken before the changes to IRI and String Literals were made; this makes the relative overhead higher.
The use of the Guava LinkedHashMultimap adds considerable overhead to indexes in internals.
For example, for ncbiTaxon, the map at Internals.owlDatatypeReferences, with 3,243,648 entries, has an overhead of 172,472,672 bytes. This is the retained size calculated by MAT, which does not include space used by objects reachable from other places.
For reference: the retained size of the OWLOntologyImpl is 1,778,394,680 bytes.
For a dummy ontology impl, the retained size of an ArrayList of Axioms is 670,587,512
Every item stored in a LinkedHashMultimap is quadruply linked: previous and next entries in the order of adding to the map, and previous/next entries in the set of values for a particular key.
This level of navigability is probably not necessary; nothing internally or in the public interface makes use of the links at the map level.
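Some back-of-envelope arithmetic connects the quadruple linking to the measured figure above (assuming 8-byte references, i.e. no compressed oops):

```java
// Rough arithmetic only: the four link fields alone cost 4 * 8 = 32 bytes
// per entry, before object headers, against a measured ~53 bytes per entry.
public class OverheadEstimate {
    public static void main(String[] args) {
        long mapOverheadBytes = 172_472_672L; // owlDatatypeReferences, from MAT
        long entries = 3_243_648L;
        System.out.println(mapOverheadBytes / entries); // ~53 bytes per entry
        System.out.println(4 * 8); // link fields alone: 32 bytes per entry
    }
}
```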
For a possible small win, I will measure the effects of using a LinkedHashMap of LinkedHashSets (a LinkedHashSet is really a wrapped LinkedHashMap).
I will also measure baseline heap usage for a plain HashMultimap.