Some suggestions. - Githubissues

Being someone very concious of data structure and memory issues in my code 
(console game programmer for the last 20 years) I was quite happy to run 
across this library when I needed a starting point for an AVLTree 
implementation.  Given that, I have a couple suggestions which are in no 
way meant to be critical of what is here, basically just nit picky items 
and my dream list.  :)

1. Performance items:

a) Remove the "convert to DArray" calls and implement iterators.  This 
allows the conversion to DArray to be a "utility" function instead of 
embedded in each class.  (Making them more stand alone of course as a 
benefit.)  Additionally, sometimes you have a very rare case where you 
simply have to find a specific item(s) with data filtering not related to 
the key, which copying out to DArray then iterating wastes a lot more 
memory bandwidth than an iterator.  Having ineficient iteration is 
sometimes preferable in the cross cutting cases where these iterations are 
required.  I.e. use the most efficient solution for your 90% usage and use 
a less optimimal solution (iteration) for the remaining 10% unless they 
become a problem then consider double indexing etc.  (I'm a firm believer 
in the 90-10 rule, primary optimization is selecting the correct 
structure, the 10% remaining only gets corrected if it becomes a notable 
problem.)  I've already started this for the AVLTree though I'm munging it 
to fit into my game engine so other than logic it is basically a complete 
rewrite. :(  I can still supply the code if interrested.

b) Add a policy system into the structures.  This is definately a personal 
opinion but I believe that STL drops the ball by trying to "simplify" too 
much and I dislike libraries trying to second guess my intended usage 
behind the scenes.  In other words, STL assumes it can figure out if the 
item you give it to contain is POD or non-POD data and applies 
optimizations based on that behind your back, yet ignores some of the more 
easy optimizations completely.  Instead of this a policy based container 
is my prefered solution.

   Here is the basic template for my Array implementation which supports 
policies:

template<
    typename _Type,
    typename _Policy    =   CollectionPolicyDefault<_Type>,
    typename _Allocator =   CollectionAllocator<_Type, 
_Policy, cpl::memory::BlockAllocator> >

   The policy is defined as follows:

template< typename _Type >
struct CollectionPolicyDefault
{
    typedef _Type               tValue;

    enum    {kConstructDestroy  =   !Traits<tValue>::kIsPOD};
    enum    {kBinaryMove        =   Traits<tValue>::kIsPOD};
    enum    {kIsStable          =   true};
};

   This involves three very common optimizations to array (std::vector) 
which are often useful.  I'll go in reverse order in description:

kIsStable:  says that when erasing an item out of an array I can't cause a 
change in ordering.  When kIsStable is false though, I just move the last 
item into the newly free'd item and avoid moving all the items in between.

kBinaryMove: says that when moving an item around within the array I can 
just "::memcpy" it safely.  I.e. POD or non-POD data (and if not familiar, 
that basically just means it has no virtuals and contains nothing with 
virtuals, pointers are technically part of this but I separate it).

kConstructDestroy: is actually a rather goofy one, I was thinking I should 
rename it "kPointerOwnership" or something but I got used to this and 
haven't changed it.  Basically this just means that if I move things 
around in an array it tells the underlying algo's to either perform the 
extra code to call copy constructors or not.  This is a HUGE optimization 
in many cases, think of a simple C string wrapper, every time you move an 
item within the array you have to new up a new buffer, copy the data, 
delete the old buffer etc.  With a simple C string wrapper though, once 
the "array" owns the object simply moving the "pointer" to the C string 
around is all that is required, only when the item is actually removed 
from the array does it need a destructor called.

   Simply replacing std::vector with this system (it's pretty much a drop 
in replacement with a define unless someone overloads the allocator) I've 
managed to easilly get 2 and 3 time speed increases in several open source 
projects that rely heavilly on STL by modifying the policies as is 
appropriate to the data.

   Final argument for using policies.  There are probably other 
optimizations which could be applied to other structures and even my array 
implementation, the addition/removal of flags is transparent in my system 
since if I forgot to modify code default "tTraits" return least permissive 
results.  (i.e. always assume non-POD, always assume fixed order etc etc.)

c) Remove the "assumed" key behavior system.  I know it seems to make 
sense to automatically copy C strings but to make this system really 
general you can't make assumptions for the programmer such as copying C 
strings automatically.  An example case from one of my prior games is that 
I load in a big ass string table, this data is fixed in memory and never 
changes.  I then keyed each string via address into a table based on 
player input.  In crisscross currently that means that there would be 
copies of each string being made for no reason.  The address of the string 
was the key for UI reasons, the data was iterated on a separate thread 
(iterators again) and used to slowly remove keys as searches became more 
specific.  It was definately a goofy system but this is an example of 
where libraries which do too much screw the user by wasting memory/time 
when not required.

Other comments:

a) HashTables are definately something you should complete.  In games, 
where I have fixed data and sizes for many things, HashTables are among my 
favorite structures.  My entire resource acquisition system is based on 
hash tables for instance.  Everything is packed up and put in a big memory 
mapped file which contains a "HashTable" in the beginning of the file, but 
I still access based on directory/file pathing based on hashed values.

b) Get more hash types implemented eventually.  Rehash, linked list hash 
and array hash.  Rehash is best for fixed size things which are guaranteed 
to fit into the hash, usually small things.  Linked list is like 
RedBlackTree/AVLTree/SplayTree etc in terms of being the "general" hash 
solution.  Array hash is basically a linked list hash except that it is 
always built to place keys in one flat array and indexes into a 
flat "array" of data items (just add an "end" index to the hash entry, 
insert/delete time sucks but lookups are cache friendly which in multi-
core is a HUGE win).

c) I'd like to see most of the current items with exclusive key 
abilities.  I.e. fail on insert of duplicates.  (I.e. STL calls most of 
the current items "multi" sets.  Most of my usage assumes single key so 
I'd like error returns without having to call exists first.  A common case 
I have at work is getting a connect from an IP/Port combo I already have, 
I have to kick the old Ip/Port and assume the old socket is dead in this 
case.)

d) Allocators instead of new/delete.  I have 7 different memory subsystems 
for various reasons, unspecified new/delete in my code simply asserts.  
This is memory allocation taken to the extreme but is required to figure 
out which sub-system allocated what.  I.e. if the render system is 
allocating 128megs of memory I want to know that, including 
any "collection" items.

   Think that is everything in terms of my "wish list". :)  Again, not 
complaining but for my usage these are all fairly important items.  
Currently I'm using your AVLTree as a reference and reimplementing from 
near scratch in order to incorporate things.
Original issue reported on code.google.com by All...@gmail.com on 2 Apr 2008 at 1:23
skydevgit / crisscross

Some suggestions. #29