owlcs / owlapi

OWL API main repository

Use of LinkedHashMultiMap in Internals is quite memory intensive. #246

Closed sesuncedu closed 9 years ago

sesuncedu commented 10 years ago

Note: These dumps were taken before the changes to IRI and String Literals were made; this makes the relative overhead higher.

The use of the Guava LinkedHashMultimap collection adds considerable overhead to the indexes in Internals.

For example, for ncbiTaxon, the map at Internals.owlDatatypeReferences, with 3,243,648 entries, costs 172,472,672 bytes. This is the retained size calculated by MAT, which does not include space used by objects reachable from other places.

For reference: the retained size of the OWLOntologyImpl is 1,778,394,680.

For a dummy ontology impl, the retained size of an ArrayList of axioms is 670,587,512 bytes.

Every item stored in a LinkedHashMultimap is quadruply linked: previous/next entries in insertion order across the whole map, and previous/next entries within the set of values for a particular key.

This level of navigability is probably not necessary; nothing internally or in the public interface makes use of the links at the map level.

For a possible small win, I will measure the effect of using a LinkedHashMap of LinkedHashSets (a LinkedHashSet is really a wrapped LinkedHashMap).

I will also measure baseline heap usage for a plain HashMultimap.

sesuncedu commented 10 years ago

Switching code in MapPointer constructor from

        map = LinkedHashMultimap.create();

to

        map = MultimapBuilder.hashKeys().hashSetValues().build();

Reduces internals retained size from 1,778,394,616 bytes to 1,597,578,096 bytes.

sesuncedu commented 10 years ago

With MapPointer constructor

        map = MultimapBuilder.hashKeys().linkedHashSetValues().build();

internals retained size is 1,753,989,672 bytes

sesuncedu commented 10 years ago

With MapPointer constructor

        map = MultimapBuilder.hashKeys().arrayListValues().build();

internals retained size is 888,169,544 bytes

sesuncedu commented 10 years ago

Summary in table form:

| Keys | Values | Size | Scaled Size |
| --- | --- | --- | --- |
| Linked | Linked Set | 1,778,394,616 bytes | 1.000 |
| Unlinked | Linked HashSet | 1,753,989,672 bytes | 0.986 |
| Unlinked | Unlinked HashSet | 1,597,578,096 bytes | 0.898 |
| Unlinked | ArrayList | 888,169,544 bytes | 0.499 |

matthewhorridge commented 10 years ago

Nice work Simon. Array List looks great at first sight. I'm wondering how all of this plays out with write performance. Obviously, array list will be slower for writes, but I'm wondering if things perform o.k. in practice. Perhaps some kind of config option that would hint at optimising memory vs read vs write performance would be good.

sesuncedu commented 10 years ago

The array list approach does have the disadvantage that it doesn't remove duplicates until ten contains checks have been made (for a given fetch of values).

    public synchronized Set<V> getValues(K key) {
        init();
        return CollectionFactory.getCopyOnRequestSetFromMutableCollection(map
                .get(key));
    }

After the threshold has been exceeded, the collection gets converted to a LinkedHashSet, if it wasn't a set already.

This conversion takes place in the ConditionalCopySet, so it doesn't affect the value in the map. Any modifications to the returned set also trigger the COW.
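The conversion behaviour could be sketched roughly like this (class name, field names, and the threshold value are illustrative; this is not the actual ConditionalCopySet code, and the copy-on-write side is elided):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch: contains() scans the backing list linearly until a
// threshold of checks is exceeded, then converts to a LinkedHashSet for O(1)
// lookups. The conversion also drops any duplicates the list carried.
class ThresholdSet<E> {
    private static final int CONTAINS_THRESHOLD = 10;
    private Collection<E> delegate;
    private int containsCalls = 0;

    ThresholdSet(List<E> source) {
        // Real code wraps the map's collection copy-on-write; we just copy.
        this.delegate = new ArrayList<>(source);
    }

    boolean contains(E e) {
        if (++containsCalls > CONTAINS_THRESHOLD
                && !(delegate instanceof LinkedHashSet)) {
            delegate = new LinkedHashSet<>(delegate);
        }
        return delegate.contains(e);
    }

    int size() {
        return delegate.size();
    }
}
```

Until the threshold is crossed, a fetch-and-iterate caller never pays for the set conversion at all.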

Performance tradeoffs (time).

- ArrayList: fast add; fast iteration; containment check O(n) with a small constant factor.
- HashSet: slower add; slow iteration; containment check O(1) with a medium constant factor.
- LinkedHashSet: slower add; fast iteration; containment check O(1) with a medium constant factor.

For maps whose entries will only contain a few values, ArrayList may perform ok for contains checks.

I'm looking to see if crafting a Trove TObjectHashSet (basically a TObjectHashMap without any values) gives any useful wins.

https://bitbucket.org/robeden/trove/

ignazio1977 commented 10 years ago

Using lists also keeps the order of insertion, although this might not be an important requirement here.

An alternative restructuring that came to mind when Simon mentioned int based indexing is to reuse FaCT++ tricks with a master list of axioms and all indexes being sets of ints, which point at the main list records. At that point, we could use memory efficient collections of ints to hold the lot.

(or even arrays of ints, for that matter. For low variance collections, that would work quite well)
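A toy sketch of that layout, with plain strings standing in for axioms and java.util.BitSet standing in for a memory-efficient int set (all names here are made up for illustration, not OWL API or FaCT++ code):

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical master-list layout: each axiom is stored exactly once, and
// every index is just a set of ints pointing into that list. A real version
// might use Trove int sets or EWAH compressed bitmaps instead of BitSet.
class AxiomIndex {
    private final List<String> masterList = new ArrayList<>();
    private final BitSet classAxioms = new BitSet();

    int add(String axiom, boolean isClassAxiom) {
        masterList.add(axiom);
        int id = masterList.size() - 1;
        if (isClassAxiom) {
            classAxioms.set(id);  // the index records only the int id
        }
        return id;
    }

    List<String> getClassAxioms() {
        List<String> result = new ArrayList<>();
        // Iterate the set bits to recover axioms from the master list.
        for (int i = classAxioms.nextSetBit(0); i >= 0;
                i = classAxioms.nextSetBit(i + 1)) {
            result.add(masterList.get(i));
        }
        return result;
    }
}
```

Every additional index then costs one int set rather than another map of object references.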

sesuncedu commented 10 years ago

That is part of the master plan :-)

Incidentally, JFact could benefit from Trove TIntInt collections (I was comparing them the other day).

Also, there are some great ways of encoding sets of ints using sparse bitmaps

ignazio1977 commented 10 years ago

I tried the bitmap encoding a while ago but could not manage any effective sparse representation. Maybe time to give it another go.

sesuncedu commented 10 years ago

I think there may be ways to improve density for some ontologies with some usage patterns (could I be any modaler?).

The mapping from entity to int has a major effect.

See the analysis in

Daniel Lemire, Owen Kaser, Kamel Aouiche, Sorting improves word-aligned bitmap indexes. Data & Knowledge Engineering 69 (1), pages 3-28, 2010. http://arxiv.org/abs/0901.3751

Implementation is here: https://github.com/lemire/javaewah
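As a small illustration of why the entity-to-int mapping matters: an uncompressed bitmap (java.util.BitSet here, standing in for EWAH's compressed bitmaps) allocates storage up to the largest id present, so clustering related entities into nearby ids keeps the bitmaps small. The class below is a made-up demo, not part of any library:

```java
import java.util.BitSet;

// Hypothetical demo: measure how many 64-bit words a BitSet allocates for a
// given collection of entity ids. Dense, low ids cost one word; a single
// large id forces allocation all the way up to that id.
class BitmapSizeDemo {
    static long wordsUsed(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        // size() reports the bits actually allocated by the backing long[].
        return bits.size() / Long.SIZE;
    }
}
```

Word-aligned schemes like EWAH compress the long runs of zeros, but a mapping that sorts/clusters ids still shrinks the compressed form, which is the point of the Lemire et al. analysis above.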

I believe that using Apriori-style frequent-item-set mining on sampled data to find groupings could give very good results, especially for a-boxes where there are patterns of property usage or values that are implicit in the data but not expressed in the ontology. That approach definitely wins for converting sets of triples into relational database tables.

ignazio1977 commented 10 years ago

> I think there may be ways to improve density for some ontologies with some usage patterns (could I be any modaler?)

On some virtual machines, for some users.

Sampling existing ontologies for patterns reminds me of a couple of PhD theses or more I've read in recent years, including Matthew's. Atomic decomposition might have something in common with this, and the notion of locality for modules might have to do with the notion of locality in memory for the ontology representation. Not an immediate winner in terms of performance, yet still I wonder.

ignazio1977 commented 10 years ago

> Implementation is here: https://github.com/lemire/javaewah

Looks quite good.

sesuncedu commented 10 years ago

One idea that might work would be to have an array-backed implementation that remains an unsorted list until a certain size threshold is reached, then switches over to an open-addressed hash table.

I have a strong suspicion that most entries in smaller tables will be unique, and that hash code inequality will be enough to find a mismatch, so the constant factor may be low enough to make this worthwhile for many smaller sets.

Annoyingly, this requires a custom implementation of Multimap: all of the implementations in the Guava collections that are public are final, and all that are abstract are package-private, so there is no way to swap out the set implementation stored in the backing map. One could use a forwarding set, except that would cost an extra object, which would equal the size of the data in a four-element array (the array overhead is already present in the ArrayList sizes).
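A minimal sketch of that hybrid idea (class name, threshold, and spill strategy are all made up for illustration, and a plain HashSet stands in for the open-addressed table):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical hybrid set: stay an array-backed list while small (linear
// scans are cheap and the layout is dense), spill to a hash set once a size
// threshold is crossed.
class HybridSet<E> {
    private static final int SPILL_THRESHOLD = 8;
    private List<E> small = new ArrayList<>(4);
    private Set<E> large = null;

    boolean add(E e) {
        if (large != null) {
            return large.add(e);
        }
        // Small mode: linear duplicate check, cheap for a handful of values.
        if (small.contains(e)) {
            return false;
        }
        small.add(e);
        if (small.size() > SPILL_THRESHOLD) {
            // Spill: move everything into a hash set and drop the list.
            large = new HashSet<>(small);
            small = null;
        }
        return true;
    }

    boolean contains(E e) {
        return large != null ? large.contains(e) : small.contains(e);
    }

    int size() {
        return large != null ? large.size() : small.size();
    }
}
```

Since most value collections in the indexes are small, most instances would never pay the hash-table overhead at all.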

sesuncedu commented 10 years ago

Using trove THashSet and THashMap to build multimaps in MapPointer:

| Keys | Values | Size | Scaled Size | Notes |
| --- | --- | --- | --- | --- |
| Linked | Linked Set | 1,778,394,616 bytes | 1.000 | |
| Unlinked | Linked HashSet | 1,753,989,672 bytes | 0.986 | |
| Unlinked | Unlinked HashSet | 1,597,578,096 bytes | 0.898 | |
| HashMap | THashSet (load=0.50) | 1,212,330,320 bytes | 0.682 | NEW |
| HashMap | THashSet (load=0.85) | 1,000,578,712 bytes | 0.563 | NEW |
| THashMap | THashSet (load=0.50) | 1,048,101,240 bytes | 0.589 | NEW |
| THashMap | THashSet (load=0.75) | 946,178,464 bytes | 0.532 | NEW |
| THashMap | THashSet (load=0.85) | 945,453,552 bytes | 0.532 | NEW |
| Unlinked | ArrayList | 888,169,544 bytes | 0.499 | (Not a Set) |

Load factor 0.50 is pretty close to HashSet speed.
Performance at load factor 0.75 is not too bad (~8% slower than HashSet); the biggest performance issue is the lack of optimization in the axiom equals methods (e.g. OWLAxiomImpl::equals creates new sets for annotation comparisons).

These measurements are for the internals size of a freshly loaded copy of NCBI Taxon (455MB in OWL FSS)

ignazio1977 commented 10 years ago

Cheers Simon. The OWLAxiomImpl bit I can improve right away. I'll have a look at using Trove as well.

sesuncedu commented 10 years ago

I've got changes (including not copying the annotations) which I just need to stash and apply to a different branch; the branch they're on is outdated, as I had to keep all the other things equal.

Will do so in a bit.

matthewhorridge commented 10 years ago

Great work Simon!

Do you have any idea about, or ideas for performance tests for, how well axiom addition and removal performs with all of this? I remember playing about with array lists before, but adding and removing axioms was a lot slower than with the current implementation. Any thoughts on this?

sesuncedu commented 10 years ago

THashMap + THashSet with load Factor of 0.50 performs similarly to HashSet for axiom addition.

Trove uses open address hashing, so removed entries still require extra probes when doing a lookup until cleared, but the main performance factor is the number of equality checks required.
THashMaps always have a prime number of slots, so there's a mod instead of a mask, but that's overwhelmed by the cost of the calls to equals().

Addition with duplicates to an ArrayList is obviously much faster than any set, but of course, has the slight problem of not having the right semantics :) Contains / add without duplicates / removal are O(n). Depending on the mix of operations, the ideal data structure might be a heap, given the low average cost of additions/removals. If contains checks are relatively rare then there is no problem with log(n) cost of lookups.

NB. I have been assuming that in the typical usage pattern, the bulk of operations on MapPointers comes from an initial load of an ontology, followed by fetches of the sets associated with keys, which are primarily accessed through iterators. Subsequent additions and removals of axioms would come in smaller batches. The WebProtege usage analysis might shed more light on this.

In the shorter term, the THashMap/THashSet combination is a pretty easy win.

sesuncedu commented 10 years ago

THashMap + ArrayList load time is just over half that of using a Set. I expect that the extra time is spent in comparisons. I will check the internals size for this combination, then put it aside for now.

ignazio1977 commented 10 years ago

How do we feel about back porting this to 3.5.1? There's an extra Trove dependency to add. I'd vote yes.

matthewhorridge commented 10 years ago

No problem with the extra dependency. Is it worth the work?

ignazio1977 commented 10 years ago

I don't think it's too much work. Regarding the improvements I've made, I haven't measured much so I don't know exactly. Tests run faster, that much is true.

matthewhorridge commented 10 years ago

o.k. sounds good to me.

sesuncedu commented 10 years ago

A bit late, but another way to change the multimap implementation is to use Multimaps::newSetMultimap

    protected Multimap<K, V> createNewSetMultimap() {
        return Multimaps.newSetMultimap(new THashMap<K, Collection<V>>(),
                new Supplier<Set<V>>() {
                    @Override
                    public Set<V> get() {
                        return new HashSet<V>(2, 0.75f);
                    }
                });
    }

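For comparison, the same map-plus-set-factory pattern can be sketched with only the JDK (this is a hypothetical helper class, not Guava's implementation): the backing map and the per-key set factory are supplied independently, so either side can be swapped (a Trove THashMap, or a HashSet with a tuned initial capacity and load factor) without writing a custom multimap.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical minimal set-multimap: computeIfAbsent pulls a fresh set from
// the supplied factory the first time a key is seen.
class SimpleSetMultimap<K, V> {
    private final Map<K, Set<V>> map;
    private final Supplier<Set<V>> setFactory;

    SimpleSetMultimap(Map<K, Set<V>> map, Supplier<Set<V>> setFactory) {
        this.map = map;
        this.setFactory = setFactory;
    }

    boolean put(K key, V value) {
        return map.computeIfAbsent(key, k -> setFactory.get()).add(value);
    }

    Set<V> get(K key) {
        return map.getOrDefault(key, Collections.emptySet());
    }
}
```

Usage: `new SimpleSetMultimap<>(new HashMap<>(), () -> new HashSet<>(2, 0.75f))` mirrors the Guava idiom above with the same tuned value sets.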
ignazio1977 commented 10 years ago

That's a nice idiom, cheers. Might still use it in the MapPointer, and in the other places where we create multimaps.

ignazio1977 commented 9 years ago

I think this is already up to date in version 5. To double check and close.