keithpage / betfairdataclient

Automatically exported from code.google.com/p/betfairdataclient

Performance tests for the internal id type #1

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
We should set up some tests to decide which type to use as the internal id.
I would suggest we test:
1. Integer
2. BigInteger
3. Guid
4. String (would probably be a combination of values)

The test should be set up in the following manner:
1 and 2 and 3:
Create a threadsafe mechanism to retrieve a new Id.

1 and 2:
Make sure that we don't run out of ids at some point. Ids that are no longer
used have to be put back into the pool.

4:
Doesn't need to be threadsafe since a proper combination of values would
make the id unique.

In the end we should have a comparison of the performance of these types.
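
As a rough illustration of the setup described for types 1 and 2, here is a minimal sketch of a thread-safe id source that puts released ids back into a pool. It is written in Python purely for illustration; the class name `RecyclingIdPool` and its methods are hypothetical and not part of this project.

```python
import threading


class RecyclingIdPool:
    """Hands out integer ids under a lock and reuses released ones."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 0   # next id that has never been handed out
        self._free = []     # released ids waiting to be reused

    def acquire(self):
        with self._lock:
            if self._free:
                return self._free.pop()
            new_id = self._next_id
            self._next_id += 1
            return new_id

    def release(self, id_value):
        with self._lock:
            self._free.append(id_value)


def hammer(pool, count, out):
    # Each worker draws `count` ids and records them in its own list.
    out.extend(pool.acquire() for _ in range(count))


pool = RecyclingIdPool()
results = [[] for _ in range(4)]
threads = [threading.Thread(target=hammer, args=(pool, 1000, r)) for r in results]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_ids = [i for r in results for i in r]
# Equals len(all_ids) only if no id was handed out twice.
unique_count = len(set(all_ids))
```

Wrapping the `acquire()` calls in a timer, and doing the same for a Guid or composite-string generator, would give the comparison asked for above.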

Original issue reported on code.google.com by ntzioli...@googlemail.com on 8 Mar 2009 at 4:51

GoogleCodeExporter commented 9 years ago
My personal preference would be a long value type for the internal ids. The ideal
id would require the absolute minimum number of cycles to create and read.

Original comment by dubdub1...@gmail.com on 8 Mar 2009 at 3:47

GoogleCodeExporter commented 9 years ago
Agreed. One thing an id should do is perform well.

Of course the id has to be unique as well, so the actions needed to ensure uniqueness
have to be included when doing performance tests.

When using int/long you need to ensure that an id cannot be used twice, even if the
application crashes. Otherwise the system can no longer work as designed.
So what you need to do is actually save the last id somewhere. Since you intend
not to use a db for the core, this would mean writing to the filesystem in some
format. Remember this must be done at every id creation, to persistently store the
last id so it can be retrieved after an application/system crash. Now what if the
crash is caused by writing the id to the filesystem, or there is some error with the
filesystem in general and the last-id file is corrupted? How would one determine the
next id if the last used id is not known?
To get rid of that issue a mini db could be used to store the last id, since dbs
handle filesystem issues themselves (most of them do anyway ;)). But the performance
penalty still applies.
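
To make the cost concrete, here is a minimal sketch of what "save the last id at every creation" could look like with a plain file. It is written in Python purely for illustration; `PersistentCounter` and the file name `last_id.txt` are hypothetical. The write goes to a temporary file that is then swapped in, which narrows the window for a half-written file but does not remove the filesystem problem raised above.

```python
import os
import tempfile
import threading


class PersistentCounter:
    """Integer id source that stores the last id on disk before handing it out."""

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()
        self._last = self._load()

    def _load(self):
        try:
            with open(self._path, "r", encoding="ascii") as f:
                # A corrupted file raises ValueError here; this sketch has no
                # answer to "what is the next id?" in that case.
                return int(f.read().strip())
        except FileNotFoundError:
            return 0  # first run: no id has been issued yet

    def _store(self, value):
        tmp = self._path + ".tmp"
        with open(tmp, "w", encoding="ascii") as f:
            f.write(str(value))
            f.flush()
            os.fsync(f.fileno())      # force the bytes to disk
        os.replace(tmp, self._path)   # swap the new file in one step

    def next_id(self):
        with self._lock:
            candidate = self._last + 1
            self._store(candidate)    # persist first, then hand out
            self._last = candidate
            return candidate


state_file = os.path.join(tempfile.mkdtemp(), "last_id.txt")
counter = PersistentCounter(state_file)
first = counter.next_id()
second = counter.next_id()

# Simulate a restart: a fresh instance continues after the stored id.
restarted = PersistentCounter(state_file)
third = restarted.next_id()
```

Every call to `next_id()` pays for a file write and an `fsync`, which is the performance penalty in question.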

With Guid you get uniqueness by design. But performance-wise, the pure generation of
a Guid is not nearly as good as simply incrementing an int/long. The difference is
most likely to be substantial!
But including the logic (filesystem writes/db calls) that needs to be in place to
ensure uniqueness for int/long, the overall performance of a Guid creation should be
better.
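
One way to check this claim is a small timing harness along the following lines. This is only a sketch in Python, with `uuid.uuid4()` standing in for Guid creation; the function names and the run count are arbitrary choices for illustration, and the absolute numbers will differ on another runtime and disk.

```python
import itertools
import os
import tempfile
import timeit
import uuid

counter = itertools.count(1)
state_path = os.path.join(tempfile.mkdtemp(), "last_id.txt")


def next_plain():
    # Bare increment, no crash safety at all.
    return next(counter)


def next_persisted():
    # Increment plus a synced write of the last id on every call.
    value = next(counter)
    with open(state_path, "w", encoding="ascii") as f:
        f.write(str(value))
        f.flush()
        os.fsync(f.fileno())
    return value


def next_guid():
    # Random 128-bit identifier, no stored state needed.
    return uuid.uuid4()


RUNS = 100
timings = {
    name: timeit.timeit(fn, number=RUNS)
    for name, fn in [
        ("plain counter", next_plain),
        ("persisted counter", next_persisted),
        ("uuid4", next_guid),
    ]
}
for name, seconds in timings.items():
    print(f"{name}: {seconds / RUNS * 1e6:.2f} microseconds per id")
```

The harness is single-threaded; a fair test for this issue would also run it under contention.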

The difference in reading a long vs. a Guid should be low (8 vs. 16 bytes) and
can be disregarded, since the biggest issues (by far) when talking about reading are
string/text values.
The same applies to the amount of storage needed for the id. The id is
the smallest part of the "data" that has to be stored (in memory or otherwise).

Original comment by ntzioli...@googlemail.com on 8 Mar 2009 at 6:22

GoogleCodeExporter commented 9 years ago
The performance impact of GUIDs on the entire system, and even on extended
implementations, makes their use questionable. It would make more sense to have a
queue called "id" and a single field that stores the last id. At startup you generate
1000 IDs and write the number 1000 to the last id value in the db. If the queue drops
to 200, you add 800 new IDs to the queue and update the last id value. This would
ensure that your ID values stay unique. It would be best to confirm this with a test.
I would say we test GUID vs long in a database table with 100'000'000 rows too.
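
The queue scheme described above can be sketched as follows. This is a Python illustration only: `BlockIdQueue` is a hypothetical name, and a plain dict stands in for the single "last id" field in the db. It is single-threaded; real use would need a lock around `next_id()`.

```python
from collections import deque


class BlockIdQueue:
    """Pre-allocates ids in blocks and persists only the high-water mark."""

    def __init__(self, store, block_size=1000, low_water=200):
        self._store = store            # stand-in for the db field holding the last id
        self._block_size = block_size
        self._low_water = low_water
        self._queue = deque()
        self._refill(block_size)       # at startup, generate a full block

    def _refill(self, count):
        start = self._store.get("last_id", 0)
        end = start + count
        self._store["last_id"] = end   # record the new last id before using the ids
        self._queue.extend(range(start + 1, end + 1))

    def next_id(self):
        value = self._queue.popleft()
        if len(self._queue) <= self._low_water:
            # Top the queue back up to a full block (800 new ids by default).
            self._refill(self._block_size - len(self._queue))
        return value


db = {}
ids = BlockIdQueue(db)
first_800 = [ids.next_id() for _ in range(800)]

# Simulate a restart: ids still queued in the old session are skipped,
# which leaves gaps but never hands out the same id twice.
after_restart = BlockIdQueue(db).next_id()
```

The store is written once per refill instead of once per id, at the price of losing the unused part of a block after a crash or restart.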

I'm not completely against using GUID as an identifier, but speed is more important.
The real question is: why do internal IDs need to be unique across different sessions?
Any database or xml writes will be performed by a higher level, where the impact of ID
generation is much less of an issue. The internal reference ID's sole purpose is to be
the most efficient, fastest internal lookup identifier possible, and starting from 0
will use the smallest memory blocks possible.

Original comment by dubdub1...@gmail.com on 8 Mar 2009 at 11:28