peter-lawrey / HugeCollections-OLD

Huge Collections for Java using efficient off heap storage

SharedHashMap: iterating using keyset, valueset is very slow for larger tables #34

Closed: RuedigerMoeller closed this issue 10 years ago

RuedigerMoeller commented 10 years ago

I had a look, and it seems that complete temporary copies are built up internally. Unfortunately, my map has 5 million entries.

Is there any way to iterate the values or keys of a SharedHashMap so that the next entry is computed only when next() is called on the iterator? A non-standard API is no problem, as the JDK collections' approach of requiring a keySet/values view to offer an iterator is not exactly a good design performance-wise.
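To illustrate what is being asked for, here is a minimal sketch of a lazy key iterator over a hypothetical segmented store. `SegmentedStore` and its array-of-slots layout are assumptions for illustration only, not SharedHashMap's internals. The iterator holds nothing but a cursor (segment, slot) and scans forward to the next occupied slot only when `hasNext()`/`next()` is called, so no temporary copy of the key set is ever built.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical, simplified segmented store (NOT SharedHashMap's layout):
// each segment is a fixed-size array of slots using linear probing.
final class SegmentedStore {
    private final long[][] keys;     // keys[segment][slot]
    private final long[][] values;   // values[segment][slot]
    private final boolean[][] used;  // true if the slot holds an entry

    SegmentedStore(int segments, int slotsPerSegment) {
        keys = new long[segments][slotsPerSegment];
        values = new long[segments][slotsPerSegment];
        used = new boolean[segments][slotsPerSegment];
    }

    void put(long key, long value) {
        int seg = (int) Math.floorMod(key, (long) keys.length);
        int n = keys[seg].length;
        int start = (int) Math.floorMod(key / keys.length, (long) n);
        for (int i = 0; i < n; i++) {
            int slot = (start + i) % n;
            if (!used[seg][slot] || keys[seg][slot] == key) { // empty slot or same key
                used[seg][slot] = true;
                keys[seg][slot] = key;
                values[seg][slot] = value;
                return;
            }
        }
        throw new IllegalStateException("segment " + seg + " is full");
    }

    /** Returns an iterator that locates each key only when it is requested. */
    Iterator<Long> keyIterator() {
        return new Iterator<Long>() {
            private int seg = 0;   // cursor: current segment
            private int slot = 0;  // cursor: next slot to examine

            // Moves the cursor forward to the next occupied slot, if any.
            private void advance() {
                while (seg < used.length) {
                    while (slot < used[seg].length) {
                        if (used[seg][slot]) return;
                        slot++;
                    }
                    seg++;
                    slot = 0;
                }
            }

            @Override public boolean hasNext() {
                advance();
                return seg < used.length;
            }

            @Override public Long next() {
                if (!hasNext()) throw new NoSuchElementException();
                long key = keys[seg][slot];
                slot++; // step past the entry just returned
                return key;
            }
        };
    }
}
```

This sketch has no concurrency control, which matches the single-threaded use described in the thread; a concurrent version would additionally need to lock (or validate) each segment while reading from it.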

I would try to add this myself, but I am missing a 20-line overview of how the basic mechanics work (segments, entries, entry size, free memory management, compaction).

Second: I am currently using SharedHashMap from a single thread (so I don't need concurrency control). Is there a way to use HugeMap backed by a memory-mapped file? I am looking for a memory-mapped huge hash map; I don't need sharing for now.
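For the single-threaded, memory-mapped case, the underlying idea can be sketched with the JDK alone: a fixed-capacity long-to-long open-addressing table stored in a file mapped with `FileChannel.map`. `MappedLongMap` is a hypothetical name, not a HugeCollections class. It has no locking, resizing or deletes, and because it uses a single `MappedByteBuffer` it is limited to 2 GB. It also assumes that bytes added when the mapping extends the file read as zero, which holds on common file systems but is not promised by the `FileChannel.map` documentation.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical single-threaded long->long hash map in a memory-mapped file.
// Not a HugeCollections API: no locking, no resizing, no deletes.
final class MappedLongMap implements AutoCloseable {
    private static final int SLOT_BYTES = 17; // 1 flag byte + 8-byte key + 8-byte value
    private final int capacity;
    private final FileChannel channel;
    private final MappedByteBuffer buf;

    MappedLongMap(Path file, int capacity) throws IOException {
        if (capacity <= 0 || (long) capacity * SLOT_BYTES > Integer.MAX_VALUE)
            throw new IllegalArgumentException("capacity must fit in one mapping");
        this.capacity = capacity;
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        // Mapping past the end of the file grows the file to the mapped size.
        // ASSUMPTION: the newly added bytes read as zero (i.e. "empty slot").
        this.buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, (long) capacity * SLOT_BYTES);
    }

    private int slotOf(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // spread the bits before reducing
        return (int) Math.floorMod(h ^ (h >>> 32), (long) capacity);
    }

    void put(long key, long value) {
        int slot = slotOf(key);
        for (int i = 0; i < capacity; i++) {
            int pos = ((slot + i) % capacity) * SLOT_BYTES;
            if (buf.get(pos) == 0 || buf.getLong(pos + 1) == key) { // empty or same key
                buf.put(pos, (byte) 1);
                buf.putLong(pos + 1, key);
                buf.putLong(pos + 9, value);
                return;
            }
        }
        throw new IllegalStateException("map is full");
    }

    /** Returns the value for key, or defaultValue if the key is absent. */
    long get(long key, long defaultValue) {
        int slot = slotOf(key);
        for (int i = 0; i < capacity; i++) {
            int pos = ((slot + i) % capacity) * SLOT_BYTES;
            if (buf.get(pos) == 0) return defaultValue; // empty slot ends the probe
            if (buf.getLong(pos + 1) == key) return buf.getLong(pos + 9);
        }
        return defaultValue;
    }

    @Override public void close() throws IOException {
        buf.force();     // write dirty pages back to the file
        channel.close(); // the mapping itself is released when the buffer is collected
    }
}
```

Reopening the same file with the same capacity sees the previously written entries, which is the persistence a memory-mapped map gives you without any sharing between processes.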

peter-lawrey commented 10 years ago

I agree that when iterating, only one key and/or value should exist at a time. We should be able to fix this. Also, iterating keys shouldn't create a value, and vice versa. (Email reply to RuedigerMoeller's comment of 22/06/2014, 12:40 AM.)
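Peter's second point (iterating keys shouldn't create values) can be illustrated with a hypothetical off-heap entry format, `[keyLen][key bytes][valueLen][value bytes]`, appended to a direct `ByteBuffer`. `EntryLog` and this layout are assumptions for illustration, not SharedHashMap's real entry format. The key iterator decodes one key per `next()` and jumps over the value bytes using the stored length, so no value object is ever built.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical append-only off-heap entry log: [keyLen][key][valueLen][value].
// Illustrates key-only iteration; NOT SharedHashMap's real entry format.
final class EntryLog {
    private final ByteBuffer buf;

    EntryLog(int capacityBytes) {
        buf = ByteBuffer.allocateDirect(capacityBytes); // off-heap storage
    }

    void append(String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        int needed = 4 + k.length + 4 + v.length;
        if (buf.remaining() < needed) throw new IllegalStateException("log is full");
        buf.putInt(k.length).put(k).putInt(v.length).put(v);
    }

    /** Decodes one key per next() call; value bytes are skipped, never decoded. */
    Iterator<String> keys() {
        final ByteBuffer view = buf.duplicate(); // shares content, independent position
        view.flip();                             // read from 0 up to the write position
        return new Iterator<String>() {
            @Override public boolean hasNext() { return view.hasRemaining(); }

            @Override public String next() {
                if (!view.hasRemaining()) throw new NoSuchElementException();
                byte[] k = new byte[view.getInt()];
                view.get(k);
                int valueLen = view.getInt();
                view.position(view.position() + valueLen); // skip the value entirely
                return new String(k, StandardCharsets.UTF_8);
            }
        };
    }
}
```

A value iterator would be the mirror image: skip `keyLen` bytes and decode only the value.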

RobAustin commented 10 years ago

We have just raised the following JIRA to address this issue:

higherfrequencytrading.atlassian.net/browse/HCOLL-110

RobAustin commented 10 years ago

"I would try to add this myself, however I am missing kind of a 20 line overview how the basic mechanics work (segments, entries, entrysize, free mem managements, compaction)."

@RuedigerMoeller, if you are keen to make this change and are willing to contribute it back via a pull request, let me know. We could cover your questions on SHM in a one-to-one video conference (say, http://www.gotomeeting.com). Let me know if you would like me to arrange this and, if so, what time and date suit you best; London, UK working hours are best for me.

Alternatively, we will fix this issue ourselves, possibly in the next few months.

RuedigerMoeller commented 10 years ago

Fine, I'll try to implement it. @BoundedBuffer, I'll try to contact you via Google Hangouts. I am located in Frankfurt. I don't think the security infrastructure at the office is prepared to route video streaming :-), and there is no webcam or mic installed on the dev machines. Alternatively, I'll ask questions right in this issue, as I frequently work from home or at night, if that is OK for you.

RobAustin commented 10 years ago

Sure, Google Hangouts is fine. I'm not online at the moment (today), but I am usually free in the evenings.

You will have access to the JIRA linked in this thread; it's preferable that we add comments and questions to the JIRA. I am assuming that you will issue a pull request once this is fixed; if so, please add the JIRA code HCOLL-110 to the start of the check-in comments.

Rob

peter-lawrey commented 10 years ago

I will be in Frankfurt this week. Would you like to meet after work? (Email reply to RuedigerMoeller's comment of 22/06/2014, 9:41 AM.)

RuedigerMoeller commented 10 years ago

Now that's a good idea :) Let's move to email to figure out the details: I am moru0011 at gmail dot com. -ruediger

RuedigerMoeller commented 10 years ago

I'll close this: with the correct configuration (using smallish segments), de/encoding dominates the runtime, so iteration is a secondary issue. Iteration only becomes problematic once you have big segments.
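The thread does not spell out how SharedHashMap's iterator works internally, so the following is only a sketch under an assumption: if iteration materialises (copies) one segment at a time, for example while holding that segment's lock, then the peak temporary copy is bounded by the largest segment rather than by the whole map. As an illustrative figure, 5 million entries spread evenly over 1,024 segments gives roughly 5,000,000 / 1,024 ≈ 4,883 entries per copy instead of 5 million. `SegmentSnapshotMap` and `peakCopySize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical segmented map whose iterator copies ONE segment at a time
// (under that segment's lock). Peak temporary memory is bounded by the
// largest segment, not by the size of the whole map.
final class SegmentSnapshotMap<K, V> {
    private final List<Map<K, V>> segments = new ArrayList<>();
    int peakCopySize = 0; // largest temporary copy made so far (for illustration)

    SegmentSnapshotMap(int segmentCount) {
        for (int i = 0; i < segmentCount; i++) segments.add(new HashMap<>());
    }

    private Map<K, V> segmentFor(K key) {
        return segments.get(Math.floorMod(key.hashCode(), segments.size()));
    }

    void put(K key, V value) {
        Map<K, V> seg = segmentFor(key);
        synchronized (seg) { seg.put(key, value); }
    }

    Iterator<K> keyIterator() {
        return new Iterator<K>() {
            private int nextSegment = 0;
            private Iterator<K> current = Collections.emptyIterator();

            @Override public boolean hasNext() {
                // Only when the current segment's copy is exhausted, copy the next one.
                while (!current.hasNext() && nextSegment < segments.size()) {
                    Map<K, V> seg = segments.get(nextSegment++);
                    List<K> copy;
                    synchronized (seg) { copy = new ArrayList<>(seg.keySet()); }
                    peakCopySize = Math.max(peakCopySize, copy.size());
                    current = copy.iterator();
                }
                return current.hasNext();
            }

            @Override public K next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }
}
```

With one big segment the whole key set is copied at once, which matches the observation that iteration hurts with big segments; with many small segments each copy stays small.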