Being someone very concious of data structure and memory issues in my code
(console game programmer for the last 20 years) I was quite happy to run
across this library when I needed a starting point for an AVLTree
implementation. Given that, I have a couple suggestions which are in no
way meant to be critical of what is here, basically just nit picky items
and my dream list. :)
1. Performance items:
a) Remove the "convert to DArray" calls and implement iterators. This
allows the conversion to DArray to be a "utility" function instead of
embedded in each class. (Making them more stand alone of course as a
benefit.) Additionally, sometimes you have a very rare case where you
simply have to find a specific item(s) with data filtering not related to
the key, which copying out to DArray then iterating wastes a lot more
memory bandwidth than an iterator. Having ineficient iteration is
sometimes preferable in the cross cutting cases where these iterations are
required. I.e. use the most efficient solution for your 90% usage and use
a less optimimal solution (iteration) for the remaining 10% unless they
become a problem then consider double indexing etc. (I'm a firm believer
in the 90-10 rule, primary optimization is selecting the correct
structure, the 10% remaining only gets corrected if it becomes a notable
problem.) I've already started this for the AVLTree though I'm munging it
to fit into my game engine so other than logic it is basically a complete
rewrite. :( I can still supply the code if interrested.
b) Add a policy system into the structures. This is definately a personal
opinion but I believe that STL drops the ball by trying to "simplify" too
much and I dislike libraries trying to second guess my intended usage
behind the scenes. In other words, STL assumes it can figure out if the
item you give it to contain is POD or non-POD data and applies
optimizations based on that behind your back, yet ignores some of the more
easy optimizations completely. Instead of this a policy based container
is my prefered solution.
Here is the basic template for my Array implementation which supports
policies:
template<
typename _Type,
typename _Policy = CollectionPolicyDefault<_Type>,
typename _Allocator = CollectionAllocator<_Type,
_Policy, cpl::memory::BlockAllocator> >
The policy is defined as follows:
template< typename _Type >
struct CollectionPolicyDefault
{
typedef _Type tValue;
enum {kConstructDestroy = !Traits<tValue>::kIsPOD};
enum {kBinaryMove = Traits<tValue>::kIsPOD};
enum {kIsStable = true};
};
This involves three very common optimizations to array (std::vector)
which are often useful. I'll go in reverse order in description:
kIsStable: says that when erasing an item out of an array I can't cause a
change in ordering. When kIsStable is false though, I just move the last
item into the newly free'd item and avoid moving all the items in between.
kBinaryMove: says that when moving an item around within the array I can
just "::memcpy" it safely. I.e. POD or non-POD data (and if not familiar,
that basically just means it has no virtuals and contains nothing with
virtuals, pointers are technically part of this but I separate it).
kConstructDestroy: is actually a rather goofy one, I was thinking I should
rename it "kPointerOwnership" or something but I got used to this and
haven't changed it. Basically this just means that if I move things
around in an array it tells the underlying algo's to either perform the
extra code to call copy constructors or not. This is a HUGE optimization
in many cases, think of a simple C string wrapper, every time you move an
item within the array you have to new up a new buffer, copy the data,
delete the old buffer etc. With a simple C string wrapper though, once
the "array" owns the object simply moving the "pointer" to the C string
around is all that is required, only when the item is actually removed
from the array does it need a destructor called.
Simply replacing std::vector with this system (it's pretty much a drop
in replacement with a define unless someone overloads the allocator) I've
managed to easilly get 2 and 3 time speed increases in several open source
projects that rely heavilly on STL by modifying the policies as is
appropriate to the data.
Final argument for using policies. There are probably other
optimizations which could be applied to other structures and even my array
implementation, the addition/removal of flags is transparent in my system
since if I forgot to modify code default "tTraits" return least permissive
results. (i.e. always assume non-POD, always assume fixed order etc etc.)
c) Remove the "assumed" key behavior system. I know it seems to make
sense to automatically copy C strings but to make this system really
general you can't make assumptions for the programmer such as copying C
strings automatically. An example case from one of my prior games is that
I load in a big ass string table, this data is fixed in memory and never
changes. I then keyed each string via address into a table based on
player input. In crisscross currently that means that there would be
copies of each string being made for no reason. The address of the string
was the key for UI reasons, the data was iterated on a separate thread
(iterators again) and used to slowly remove keys as searches became more
specific. It was definately a goofy system but this is an example of
where libraries which do too much screw the user by wasting memory/time
when not required.
Other comments:
a) HashTables are definately something you should complete. In games,
where I have fixed data and sizes for many things, HashTables are among my
favorite structures. My entire resource acquisition system is based on
hash tables for instance. Everything is packed up and put in a big memory
mapped file which contains a "HashTable" in the beginning of the file, but
I still access based on directory/file pathing based on hashed values.
b) Get more hash types implemented eventually. Rehash, linked list hash
and array hash. Rehash is best for fixed size things which are guaranteed
to fit into the hash, usually small things. Linked list is like
RedBlackTree/AVLTree/SplayTree etc in terms of being the "general" hash
solution. Array hash is basically a linked list hash except that it is
always built to place keys in one flat array and indexes into a
flat "array" of data items (just add an "end" index to the hash entry,
insert/delete time sucks but lookups are cache friendly which in multi-
core is a HUGE win).
c) I'd like to see most of the current items with exclusive key
abilities. I.e. fail on insert of duplicates. (I.e. STL calls most of
the current items "multi" sets. Most of my usage assumes single key so
I'd like error returns without having to call exists first. A common case
I have at work is getting a connect from an IP/Port combo I already have,
I have to kick the old Ip/Port and assume the old socket is dead in this
case.)
d) Allocators instead of new/delete. I have 7 different memory subsystems
for various reasons, unspecified new/delete in my code simply asserts.
This is memory allocation taken to the extreme but is required to figure
out which sub-system allocated what. I.e. if the render system is
allocating 128megs of memory I want to know that, including
any "collection" items.
Think that is everything in terms of my "wish list". :) Again, not
complaining but for my usage these are all fairly important items.
Currently I'm using your AVLTree as a reference and reimplementing from
near scratch in order to incorporate things.
Original issue reported on code.google.com by All...@gmail.com on 2 Apr 2008 at 1:23
Original issue reported on code.google.com by
All...@gmail.com
on 2 Apr 2008 at 1:23