peter-lawrey / HugeCollections-OLD

Huge Collections for Java using efficient off heap storage
273 stars 51 forks source link

HugeHashMap: recyclable vs immutable key #7

Closed jaromirs closed 10 years ago

jaromirs commented 10 years ago

Current implementation of HugeHashMap handles CharSequence type specifically. It converts it into a string whenever it needs to use CharSequence as a key for standard java.util.HashMap (this is correct because for instance StringBuilder does not override equals and hashCode methods and needs to be converted into a string). Consequently the HugeHashMap creates a new string with every put (line 252, current version of HugeHashMap).

The proposed modification introduces a concept of reusable vs immutable key. For all operations on java.util.HashMap except put, we use reusable key, for put we MUST create an immutable copy of the reusable key.

3 additional comments:

  1. The API is not as nice as in the original solution but it helps to avoid creating unnecessary objects on the heap and makes sure that reusable key objects are not used as keys for java.util.HashMap.put operation.
  2. The proposed modifiation requires that the Key type implements BytesMarshallable interface. This might be too restrictive in some cases (?).
  3. I am not sure that all the changes will be accepted but in general I think it is a step in the right direction.
peter-lawrey commented 10 years ago

In SharedHashMap it doesn't need to do this and I will put this Pull Request on hold until that class is finished because I believe we can avoid the whole issue.

On 12 February 2014 16:02, Jaromir Satanek notifications@github.com wrote:

Current implementation of HugeHashMap handles CharSequence type specifically. It converts it into a string whenever it needs to use CharSequence as a key for standard java.util.HashMap (this is correct because for instance StringBuilder does not override equals and hashCode methods and needs to be converted into a string). Consequently the HugeHashMap creates a new string with every put (line 252, current version of HugeHashMap).

The proposed modification introduces a concept of reusable vs immutable key. For all operations on java.util.HashMap except put, we use reusable key, for put we MUST create an immutable copy of the reusable key.

3 additional comments:

  1. The API is not as nice as in the original solution but it helps to avoid creating unnecessary objects on the heap and makes sure that reusable key objects are not used as keys for java.util.HashMap.put operation.
  2. The proposed modifiation requires that the Key type implements BytesMarshallable interface. This might be too restrictive in some cases (?).
  3. I am not sure that all the changes will be accepted but in general I

think it is a step in the right direction.

You can merge this Pull Request by running

git pull https://github.com/jaromirs/HugeCollections master

Or view, comment on, or merge it at:

https://github.com/OpenHFT/HugeCollections/pull/7 Commit Summary

  • HugeHashMap modified so that every entry that cannot be stored into offheap memory and is stored into standard HashMap does not use recycled key object.
  • Cosmetic change in testPutLong unit test
  • Specific handling of CharSequence completely removed.

File Changes

  • M collections/src/main/java/net/openhft/collections/HugeHashMap.javahttps://github.com/OpenHFT/HugeCollections/pull/7/files#diff-0(84)
  • A collections/src/main/java/net/openhft/collections/HugeMapKey.javahttps://github.com/OpenHFT/HugeCollections/pull/7/files#diff-1(23)
  • A collections/src/test/java/net/openhft/collections/CharSequenceKey.javahttps://github.com/OpenHFT/HugeCollections/pull/7/files#diff-2(100)
  • M collections/src/test/java/net/openhft/collections/HugeHashMapTest.javahttps://github.com/OpenHFT/HugeCollections/pull/7/files#diff-3(47)
  • A collections/src/test/java/net/openhft/collections/IntKey.javahttps://github.com/OpenHFT/HugeCollections/pull/7/files#diff-4(44)
  • M collections/src/test/java/net/openhft/collections/SegmentTest.javahttps://github.com/OpenHFT/HugeCollections/pull/7/files#diff-5(104)

Patch Links:

Reply to this email directly or view it on GitHubhttps://github.com/OpenHFT/HugeCollections/pull/7 .

jaromirs commented 10 years ago

OK, I understand. Will SharedHashMap use memory mapped files? Will the content be automatically persisted?

peter-lawrey commented 10 years ago

Yes, and yes. SHM takes a different approach of comparing serialized forms. It also don't have the option of allocating more memory oversized objects or more objects. The fact you are using disk should mean you can make it much bigger with less cost. i.e. it can be bigger than main memory. I think once it is finished, I will rework HugeHM so it can avoid the need to toString() etc in the same manner. In any case it would be best if they behaved similarly.

On 13 February 2014 08:45, Jaromir Satanek notifications@github.com wrote:

OK, I understand. Will SharedHashMap use memory mapped files? Will the content be automatically persisted?

Reply to this email directly or view it on GitHubhttps://github.com/OpenHFT/HugeCollections/pull/7#issuecomment-34958484 .

peter-lawrey commented 10 years ago

I plan to add support for TCP replication as well.

On 13 February 2014 10:29, Peter Lawrey peter.lawrey@gmail.com wrote:

Yes, and yes. SHM takes a different approach of comparing serialized forms. It also don't have the option of allocating more memory oversized objects or more objects. The fact you are using disk should mean you can make it much bigger with less cost. i.e. it can be bigger than main memory. I think once it is finished, I will rework HugeHM so it can avoid the need to toString() etc in the same manner. In any case it would be best if they behaved similarly.

On 13 February 2014 08:45, Jaromir Satanek notifications@github.comwrote:

OK, I understand. Will SharedHashMap use memory mapped files? Will the content be automatically persisted?

Reply to this email directly or view it on GitHubhttps://github.com/OpenHFT/HugeCollections/pull/7#issuecomment-34958484 .

jaromirs commented 10 years ago

That sounds really promising. Just out of curiosity: when do you think it will be finished?

peter-lawrey commented 10 years ago

Finished is a strong word. I plan to have it code complete (without replication) by the end of next week.

On 13 February 2014 10:38, Jaromir Satanek notifications@github.com wrote:

That sounds really promising. Just out of curiosity: when do you think it will be finished?

Reply to this email directly or view it on GitHubhttps://github.com/OpenHFT/HugeCollections/pull/7#issuecomment-34966494 .

jaromirs commented 10 years ago

OK, I am looking forward to it. Will help you with testing.

On Thu, Feb 13, 2014 at 11:48 AM, Peter Lawrey notifications@github.comwrote:

Finished is a strong word. I plan to have it code complete (without replication) by the end of next week.

On 13 February 2014 10:38, Jaromir Satanek notifications@github.com wrote:

That sounds really promising. Just out of curiosity: when do you think it will be finished?

Reply to this email directly or view it on GitHub< https://github.com/OpenHFT/HugeCollections/pull/7#issuecomment-34966494>

.

Reply to this email directly or view it on GitHubhttps://github.com/OpenHFT/HugeCollections/pull/7#issuecomment-34967282 .

peter-lawrey commented 10 years ago

You can start with the test cases any time. You have been very helpful in that regard already.

On 13 February 2014 11:05, Jaromir Satanek notifications@github.com wrote:

OK, I am looking forward to it. Will help you with testing.

On Thu, Feb 13, 2014 at 11:48 AM, Peter Lawrey <notifications@github.com

wrote:

Finished is a strong word. I plan to have it code complete (without replication) by the end of next week.

On 13 February 2014 10:38, Jaromir Satanek notifications@github.com wrote:

That sounds really promising. Just out of curiosity: when do you think it will be finished?

Reply to this email directly or view it on GitHub< https://github.com/OpenHFT/HugeCollections/pull/7#issuecomment-34966494>

.

Reply to this email directly or view it on GitHub< https://github.com/OpenHFT/HugeCollections/pull/7#issuecomment-34967282>

.

Reply to this email directly or view it on GitHubhttps://github.com/OpenHFT/HugeCollections/pull/7#issuecomment-34968480 .

jaromirs commented 10 years ago

Hi Peter, I've found one more issue in IntIntMultiMap.remove method. You can see a test case and a fix in my latest commit (https://github.com/jaromirs/HugeCollections/commit/85bf1adfb44a2df56315b910b6db917a54499dbd).

peter-lawrey commented 10 years ago

HHM has been updated to use the serialized form of a key, instead of the key itself (Just as SharedHashMap does). Can you review this?

jaromirs commented 10 years ago

Hi Peter, sure - I'll take a look.

On Wed, Mar 26, 2014 at 8:52 AM, Peter Lawrey notifications@github.comwrote:

HHM has been updated to use the serialized form of a key, instead of the key itself (Just as SharedHashMap does). Can you review this?

Reply to this email directly or view it on GitHubhttps://github.com/OpenHFT/HugeCollections/pull/7#issuecomment-38657462 .

jaromirs commented 10 years ago

Hi Peter,

I've tested the latest version of HHM (HugeHashMapTest.testPut) and I can still see that the toString method is called for instance in Segment.put. Thus garbage is generated.

Jaromir

On Wed, Mar 26, 2014 at 10:36 AM, Jaromír Šatánek mira.satanek@gmail.comwrote:

Hi Peter, sure - I'll take a look.

On Wed, Mar 26, 2014 at 8:52 AM, Peter Lawrey notifications@github.comwrote:

HHM has been updated to use the serialized form of a key, instead of the key itself (Just as SharedHashMap does). Can you review this?

— Reply to this email directly or view it on GitHubhttps://github.com/OpenHFT/HugeCollections/pull/7#issuecomment-38657462 .

peter-lawrey commented 10 years ago

Thank you. I will have some check this. On 28 Mar 2014 11:46, "Jaromir Satanek" notifications@github.com wrote:

Hi Peter,

I've tested the latest version of HHM (HugeHashMapTest.testPut) and I can still see that the toString method is called for instance in Segment.put. Thus garbage is generated.

Jaromir

On Wed, Mar 26, 2014 at 10:36 AM, Jaromír Šatánek mira.satanek@gmail.comwrote:

Hi Peter, sure - I'll take a look.

On Wed, Mar 26, 2014 at 8:52 AM, Peter Lawrey notifications@github.comwrote:

HHM has been updated to use the serialized form of a key, instead of the key itself (Just as SharedHashMap does). Can you review this?

— Reply to this email directly or view it on GitHub< https://github.com/OpenHFT/HugeCollections/pull/7#issuecomment-38657462> .

— Reply to this email directly or view it on GitHubhttps://github.com/OpenHFT/HugeCollections/pull/7#issuecomment-38910904 .

peter-lawrey commented 10 years ago

Closing for now as the code is this area has been significantly altered to be similar to SharedHashMap