Using Shared/HugeHashMap

flxthomaslo commented 10 years ago

We are looking into using SharedHashMap or HugeHashMap within our application and have a couple of questions:

1) We have our own binary-encoded data and use a flyweight to access it. If we use HugeHashMap and only access the data from a single thread, what is the best way to access the binary data without copying it? (Basically we only need the starting address of the value; then we can map the flyweight onto it.)

2) A similar question about using a flyweight to read from the shared map: I assume that, since multiple threads will be reading it, we will need to copy the data.

3) For the shared map, is there true locking (i.e. do all reads and writes block each other)?

4) If the map is used for a long time, is there memory fragmentation we need to worry about?

5) Is it OK, in terms of OS stability, to have multiple shared maps backed by multiple memory-mapped files?

Thanks very much for such great work!

peter-lawrey commented 10 years ago

We support off-heap references. You can do this yourself by implementing the Byteable interface, or you can use our generated classes by defining an interface of getters, setters, adders etc., with nested classes and arrays.
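To make the flyweight idea concrete, here is a minimal sketch of a reusable view over a fixed-layout record in off-heap memory. It uses only `java.nio.ByteBuffer`; it is not the Byteable interface or any HugeCollections API, and the record layout, `QuoteFlyweight` and `FlyweightSketch` are hypothetical names invented for illustration.

```java
import java.nio.ByteBuffer;

final class FlyweightSketch {
    /** Hypothetical flyweight over an assumed 20-byte record: id (8), price (8), quantity (4). */
    static final class QuoteFlyweight {
        static final int ID_OFFSET = 0;
        static final int PRICE_OFFSET = 8;
        static final int QTY_OFFSET = 16;
        static final int RECORD_SIZE = 20;

        private ByteBuffer buffer;
        private int base;

        /** Re-points the flyweight at the record starting at base; no bytes are copied. */
        QuoteFlyweight wrap(ByteBuffer buffer, int base) {
            this.buffer = buffer;
            this.base = base;
            return this;
        }

        long id() { return buffer.getLong(base + ID_OFFSET); }
        double price() { return buffer.getDouble(base + PRICE_OFFSET); }
        int quantity() { return buffer.getInt(base + QTY_OFFSET); }

        QuoteFlyweight id(long v) { buffer.putLong(base + ID_OFFSET, v); return this; }
        QuoteFlyweight price(double v) { buffer.putDouble(base + PRICE_OFFSET, v); return this; }
        QuoteFlyweight quantity(int v) { buffer.putInt(base + QTY_OFFSET, v); return this; }
    }

    /** Writes two records off heap, then reads them back through one reused flyweight. */
    static double totalNotional() {
        ByteBuffer offHeap = ByteBuffer.allocateDirect(2 * QuoteFlyweight.RECORD_SIZE);
        QuoteFlyweight fw = new QuoteFlyweight();
        fw.wrap(offHeap, 0).id(1L).price(10.5).quantity(2);
        fw.wrap(offHeap, QuoteFlyweight.RECORD_SIZE).id(2L).price(4.0).quantity(5);

        double total = 0;
        for (int i = 0; i < 2; i++) {
            fw.wrap(offHeap, i * QuoteFlyweight.RECORD_SIZE);
            total += fw.price() * fw.quantity();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalNotional());
    }
}
```

The only state in the flyweight is a buffer reference and a base offset, so one instance can be re-pointed at any number of records without allocating.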

SharedHashMap supports concurrent access across processes, not just threads, without the need for copying.

You can add entry based locking or use CAS or atomic add operations.
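As a rough illustration of what CAS and atomic add look like on off-heap memory in general, the sketch below uses the standard `VarHandle` view over a direct `ByteBuffer` (Java 9+). It is an assumption-laden stand-in, not the SharedHashMap API; `OffHeapAtomics` and its methods are hypothetical names.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class OffHeapAtomics {
    // Views a ByteBuffer as longs; the int coordinate is a byte offset and must be 8-byte aligned
    // for the atomic access modes used here.
    private static final VarHandle LONG =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    /** Atomically adds delta to the long at byteOffset and returns the previous value. */
    static long getAndAdd(ByteBuffer buf, int byteOffset, long delta) {
        return (long) LONG.getAndAdd(buf, byteOffset, delta);
    }

    /** Stores update at byteOffset only if the current value equals expected. */
    static boolean compareAndSet(ByteBuffer buf, int byteOffset, long expected, long update) {
        return LONG.compareAndSet(buf, byteOffset, expected, update);
    }

    /** Volatile read of the long at byteOffset. */
    static long getVolatile(ByteBuffer buf, int byteOffset) {
        return (long) LONG.getVolatile(buf, byteOffset);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(16); // direct memory, zero-initialised
        getAndAdd(buf, 0, 5L);                          // counter: 0 -> 5
        boolean swapped = compareAndSet(buf, 0, 5L, 7L); // succeeds, counter: 5 -> 7
        boolean stale = compareAndSet(buf, 0, 5L, 9L);   // fails, the value is no longer 5
        System.out.println(swapped + " " + stale + " " + getVolatile(buf, 0));
    }
}
```

A failed `compareAndSet` tells the caller that another thread updated the field first, which is the building block for retry loops that avoid a blocking lock.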

If you over-commit the size, add lots of entries and then delete most of them, it won't compact the entries that are left, i.e. they never move, so you can hold references to them.

Without tuning the system you cannot have more than 32k files mapped at once. I suggest you have less than a hundred to keep things simple in terms of management etc.
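For readers unfamiliar with memory-mapped files as the backing store for shared data, the sketch below maps one temporary file twice with the standard `FileChannel.map` call and shows that a write through one mapping is visible through the other. It only illustrates the underlying OS mechanism; `MappedFileSketch`, the file name prefix and the 4096-byte size are assumptions made for the example, not anything taken from HugeCollections.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedFileSketch {
    /** Maps the first size bytes of file read-write; the mapping outlives the closed channel. */
    static MappedByteBuffer map(Path file, long size) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    /** Writes through one mapping and reads the same bytes back through a second mapping. */
    static long roundTrip() throws IOException {
        Path file = Files.createTempFile("shared-map-sketch", ".dat");
        file.toFile().deleteOnExit(); // best effort; some platforms keep mapped files until exit
        MappedByteBuffer writer = map(file, 4096);
        MappedByteBuffer reader = map(file, 4096);
        writer.putLong(0, 42L);
        return reader.getLong(0);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip());
    }
}
```

Each call to `map` consumes one mapping from the per-process budget, so keeping the number of backing files small also keeps the number of mappings easy to reason about.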

flxthomaslo commented 10 years ago

I probably won't use the generated classes since we have our own. I will take a look at Byteable. For locking, it sounds like we can customize it (not sure how yet, but I will take a look). If we are only reading, I don't think I want to put a true lock in there that prevents other processes from reading, since for our use case it is acceptable to be eventually consistent (such access doesn't need to be atomic). But then I will need to perform a copy, so I will need to weigh the trade-off (read access time versus copy time). What type of tuning would you recommend? I think a hundred should do.

peter-lawrey commented 10 years ago

Volatile and lazy set are also supported. You can use timestamp locking for optimistic locking. The cost of locking and unlocking a record is about 50 ns. A timestamp lock is more complex, but with high read access the cost drops to about 10 ns.
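One common way to get optimistic reads without blocking writers is a version stamp in front of the record (often called a seqlock). The sketch below is a guess at the general shape of such a scheme, assuming a single writer; it is not the library's timestamp lock, and `OptimisticRecord` and its layout are hypothetical. Every access is volatile to keep the reasoning simple; a tuned version would likely use weaker access modes.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class OptimisticRecord {
    private static final VarHandle LONG =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // Assumed layout: 8-byte stamp, then two 8-byte payload fields.
    static final int STAMP = 0;
    static final int BID = 8;
    static final int ASK = 16;
    static final int SIZE = 24;

    private final ByteBuffer buf;

    OptimisticRecord(ByteBuffer buf) { this.buf = buf; }

    /** Single writer only: an odd stamp marks an update in progress. */
    void write(long bid, long ask) {
        long stamp = (long) LONG.getVolatile(buf, STAMP);
        LONG.setVolatile(buf, STAMP, stamp + 1); // odd: readers must retry
        LONG.setVolatile(buf, BID, bid);
        LONG.setVolatile(buf, ASK, ask);
        LONG.setVolatile(buf, STAMP, stamp + 2); // even: record is stable again
    }

    /** Lock-free read that retries whenever a write overlapped it. */
    long[] read() {
        while (true) {
            long before = (long) LONG.getVolatile(buf, STAMP);
            if ((before & 1L) != 0) { Thread.onSpinWait(); continue; }
            long bid = (long) LONG.getVolatile(buf, BID);
            long ask = (long) LONG.getVolatile(buf, ASK);
            long after = (long) LONG.getVolatile(buf, STAMP);
            if (before == after) return new long[] {bid, ask};
        }
    }

    /** Reads while another thread writes (i, i + 1) pairs; returns true if no torn pair was seen. */
    static boolean consistentUnderContention() throws InterruptedException {
        OptimisticRecord rec = new OptimisticRecord(ByteBuffer.allocateDirect(SIZE));
        rec.write(0, 1);
        Thread writer = new Thread(() -> {
            for (long i = 1; i <= 100_000; i++) rec.write(i, i + 1);
        });
        writer.start();
        boolean ok = true;
        while (writer.isAlive()) {
            long[] v = rec.read();
            ok &= (v[1] == v[0] + 1);
        }
        writer.join();
        return ok;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(consistentUnderContention());
    }
}
```

Readers never write to shared memory in this scheme, which is why read-heavy workloads tend to benefit; the price is that a reader must copy the fields out and may have to retry.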

peter-lawrey / HugeCollections-OLD

Using Shared/HugeHashMap #27