peter-lawrey / HugeCollections-OLD

Huge Collections for Java using efficient off heap storage

ChronicleMap / ChronicleMapBuilder - actualEntriesPerSegment restricted to minimum of 8 #44

Closed andycrutchlow closed 9 years ago

andycrutchlow commented 9 years ago

Note: I'm referring to ChronicleMap / ChronicleMapBuilder, but I'm actually using SharedHashMap and SharedHashMapBuilder. My actual issue is that setting actualEntriesPerSegment to a value less than 8 (i.e. 1 to 7) results in IllegalArgumentException .. Segment is full... Is this an intentional restriction? It seems to occur as a consequence of: long sizeOfBitSets() { return align64(entriesPerSegment / 8); }
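A minimal sketch of the arithmetic I suspect is at fault (align64 here is my guess at what the real method does, reconstructed from its name; either way the integer division already yields zero for 1 to 7):

    // Sketch only: reconstructs the suspected sizing arithmetic, not the real class.
    public class BitSetSizingSketch {
        // Assumed behaviour: round up to a multiple of 64 (zero stays zero regardless).
        static long align64(long n) {
            return (n + 63) & ~63L;
        }

        static long sizeOfBitSets(long entriesPerSegment) {
            return align64(entriesPerSegment / 8); // integer division: 0 for 1..7
        }

        public static void main(String[] args) {
            for (long e = 1; e <= 8; e++)
                System.out.println(e + " entries/segment -> " + sizeOfBitSets(e)
                        + " bytes for the free-entry bitset");
            // 1..7 print 0 bytes, i.e. no free-entry bitset at all, which would
            // explain the "Segment is full" errors.
        }
    }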

I ask because I would like to be able to set actualEntriesPerSegment to 1. My reason is that, given the recent change which fixed a previous problem, concurrent reads/writes to different map entries that happen to be in the same segment are effectively queued/blocked behind each other, reducing throughput. So I think a solution would be to have each entry in its own segment. I appreciate that if I were populating the map with millions of entries, a dedicated segment and associated Bytes object per entry would probably not be practical. But in the use case I am exploring I will have thousands or tens of thousands of entries, and the accessing and mutating of individual entries tends to come in bursts clustered around 'similar' key values whose hash values make it very likely they have landed in the same segment. I understand that I could further randomize keys to reduce this likelihood, but I would rather keep using the natural keys; besides, randomizing only reduces the likelihood rather than eradicating it.

RobAustin commented 9 years ago

we have raised the following issue to investigate this - https://higherfrequencytrading.atlassian.net/browse/HCOLL-135

peter-lawrey commented 9 years ago

While we routinely test a map with 1024 to 8192 segments, having more than this appears to make it slower rather than faster. YMMV.

The issue you are seeing is likely to be a result of not testing this scenario. Can you provide a test that matches your requirements?

Note: regardless of the number of threads you have, only as many of them as you have CPUs can actually run at once. Ideally, you should spend only a small portion of your time accessing a single Map, and more of your time in business logic and whatever else your program is doing.

BTW, there is another, perhaps more extreme solution. You can create an off-heap reference to each of the values in the map and hold those references. This works fine provided you don't delete any of these keys. You can add "row level" locking by adding a lock field to each entry (or possibly more than one lock field). This provides as fine-grained control as you might need. Also, we support CAS and atomic add operations, which avoid the need for locking at all.
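Loosely, a sketch of that approach is below. All the names are illustrative, and the compareAndSwap*/addAtomic* convention for generated value types, as well as DataValueClasses.newDirectReference, are assumptions to check against the Java-Lang release you are using:

    import java.io.File;
    import java.io.IOException;

    import net.openhft.collections.SharedHashMap;
    import net.openhft.collections.SharedHashMapBuilder;
    import net.openhft.lang.model.DataValueClasses;

    public class RowLockSketch {
        // Hypothetical off-heap value with its own "row level" lock field.
        public interface RateValue {
            double getPrice();
            void setPrice(double price);
            double addAtomicPrice(double delta);                  // lock-free atomic add

            int getLock();
            void setLock(int lock);
            boolean compareAndSwapLock(int expected, int value);  // row lock via CAS
        }

        public static void main(String[] args) throws IOException {
            SharedHashMap<String, RateValue> shm = SharedHashMapBuilder
                    .of(String.class, RateValue.class)
                    .generatedValueType(true)
                    .entries(1024)
                    .file(new File("/run/shm/RowLockSketch"))
                    .create();

            // Acquire once and hold the direct off-heap reference (and never delete the key).
            RateValue rate = shm.acquireUsing("EURUSD_SPOT",
                    DataValueClasses.newDirectReference(RateValue.class));

            // Busy-wait on the row lock (a real system would back off).
            while (!rate.compareAndSwapLock(0, 1)) {
                Thread.yield();
            }
            try {
                rate.setPrice(1.2345);
            } finally {
                rate.setLock(0);                                  // release the row lock
            }
        }
    }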


andycrutchlow commented 9 years ago

For the test case it's really just the same code I provided before and you subsequently enhanced...

https://github.com/OpenHFT/HugeCollections/tree/master/collections/src/test/java/net/openhft/collections/dvgthreadbug

TestSharedHashMap.java

To save pasting it all in again, this is what I changed:

    shm = SharedHashMapBuilder.of(String.class, TestDataValue.class)
            .generatedValueType(true)
            // Without this I get "Segment is full or has no ranges of 2 continuous free blocks";
            // with it (trying to pad) I get "Segment is full, no free entries found".
            .entrySize(1024)
            .actualEntriesPerSegment(1)
            .file(new File("/run/shm/TestOpenHFT"))
            .create();

We are in danger of getting into an in-depth architectural discussion, but anyway, here goes. I am delegating thread management/scheduling to Akka, which I trust will do a reasonable job of mapping threads to CPUs. Individual actors operate on their respective and exclusive chunk of shared memory, i.e. an entry hosted in the SharedHashMap. Each entry is a fairly complex object, pre-allocated, ready for all computation, and with placeholders for all the data an actor will need during its lifetime while performing business logic. So everything is off heap and the actor performs no ongoing object creation. In effect I am doing as per your suggested extreme solution and holding on to references hosted in the map; once a steady state is reached there should be very little getting or acquiring through the map, and there will be no deleting of entries. Other dependent actors in the JVM can read from a parent actor's entry, i.e. access it through the shared map (retaining references in the process), using appropriate read-lock barriers triggered by actor-to-actor messages. Other external processes can similarly read from shared memory via lookup and 'harvest' the current state of an actor's data.

The problem is that at the top level, events/messages tend to arrive in bursts clustered around similar keys. For example, in FX market data, when a SPOT rate message arrives, so do the related outright rate messages, e.g. 1 week, 2 week, 1 month, etc. When this happens, each update/execution of business logic performed by an actor dedicated to a specific rate (reading/writing its exclusive shared-memory entry), while autonomous, can be held up/blocked just because another actor's entry happens to be in the same segment and it too recently received an event triggering its business logic and, assuming there is a spare CPU in conjunction with Akka thread scheduling, is running concurrently.

andycrutchlow commented 9 years ago

To return to the actual issue as raised: setting SharedHashMapBuilder.actualEntriesPerSegment(1) results in exceptions being thrown as soon as the map is added to or accessed, e.g. when acquireUsing is called. I'm not sure I made that totally clear in the first description.

Is it possible to know how likely it is that this issue will be addressed at some point in the future, or indeed whether it is seen as an issue at all? There is no rush or urgency to this; I'd just like to know sooner rather than later. If not, then I'll proceed down the path of randomizing keys before supplying them to the map, so that my (domain-specific) set of keys is spread throughout the segments, giving better read/write concurrency into and out of existing map entries.

Alternatively, perhaps there is another way, through the existing parameters available in SharedHashMapBuilder, of achieving a number of segments broadly equal to the number of entries, with each segment containing few (ideally 1) entries. I have no issue with predetermining and fixing this up front, as in my case I can predetermine the valid key combinations, number of entries, etc.

peter-lawrey commented 9 years ago

Is it possible to know how likely it is that this issue will be addressed at some point in the future, or indeed whether it is seen as an issue at all?

I see it as an issue and it should be addressed. At a minimum it should reject the option if it cannot be supported.

The problem with actualEntriesPerSegment of 1 is that you can only have 1 entry per segment. If you attempt to add a second entry which happens to fall into the same segment, it will fail.

You are better off making the number of segments a power of two around the number of entries. That way each segment will typically hold 1 entry, but should a segment get more, it won't fail.
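For example, something along these lines (a sketch only: ValueSketch is a made-up value interface, and I am assuming the actualSegments() builder option here):

    import java.io.File;
    import java.io.IOException;

    import net.openhft.collections.SharedHashMap;
    import net.openhft.collections.SharedHashMapBuilder;

    public class SegmentSizingSketch {
        // Minimal generated-value interface, purely for illustration.
        public interface ValueSketch {
            long getValue();
            void setValue(long value);
        }

        public static void main(String[] args) throws IOException {
            int entries = 2048;
            // Next power of two >= entries: roughly one entry per segment on average,
            // but a segment can still take a few more without failing.
            int segments = Integer.highestOneBit(entries - 1) << 1;

            SharedHashMap<String, ValueSketch> shm = SharedHashMapBuilder
                    .of(String.class, ValueSketch.class)
                    .generatedValueType(true)
                    .entries(entries)
                    .actualSegments(segments)
                    .file(new File("/run/shm/SegmentSizingSketch"))
                    .create();
            System.out.println("Created map with " + segments + " segments");
        }
    }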

SharedHashMap is designed to be a pay-for-what-you-eat data structure. It tries to use only as much CPU cache or main memory as you need and no more. For example, if you create a 1 TB volume on Linux but only use 1 GB, it will use a little over 1 GB of disk and memory.


leventov commented 9 years ago

@andycrutchlow some work has since been done on this, and now whatever number you set for either actualSegments or actualEntriesPerSegment, even a non-power-of-2 value, is used precisely. Check out the next Chronicle Map alpha release (or wait for the 2.0 final release).

andycrutchlow commented 9 years ago

I just got round to trying this. I have the latest Java-Lang and HugeCollections.

With the following setup:

    SharedHashMap<String, PricingStackInterface> shm = SharedHashMapBuilder
            .of(String.class, PricingStackInterface.class)
            .generatedValueType(true)
            .entries(2048)
            .entrySize(13020)
            .actualEntriesPerSegment(1) // HERE
            //.actualSegments(2048)
            .file(new File(sharedMemoryPath))
            .create();

First time I add an entry I get the exception below.

    Caused by: java.lang.IllegalArgumentException: Segment is full or has no ranges of 2 continuous free blocks
        at net.openhft.collections.AbstractVanillaSharedHashMap$Segment.alloc(VanillaSharedHashMap.java:971)
        at net.openhft.collections.AbstractVanillaSharedHashMap$Segment.putEntry(VanillaSharedHashMap.java:928)
        at net.openhft.collections.AbstractVanillaSharedHashMap$Segment.acquire(VanillaSharedHashMap.java:798)
        at net.openhft.collections.AbstractVanillaSharedHashMap.lookupUsing(VanillaSharedHashMap.java:421)
        at net.openhft.collections.AbstractVanillaSharedHashMap.acquireUsing(VanillaSharedHashMap.java:410)

Ideally I want to configure it so that I have 1 entry per segment. My entry size is 13020 bytes and I want 2048 entries.

Am I using incorrect settings to achieve this?

Should I be using Chronicle Map? I seem to recall that I'd have to migrate to Java 1.8, and ideally I want to stay on 1.7 for now.


leventov commented 9 years ago

Yes, you should use ChronicleMap; HugeCollections is not updated anymore.

No, ChronicleMap doesn't require Java 8; it works with Java 7.

1 entry per segment is pointless. We cannot force entries to be held in different segments; they will cluster randomly, with 2-3-4 entries in some segments while others stay empty. Any entries-per-segment setting below 50-100 doesn't really work, I think. Why do you think you need 1 entry per segment?

Also, I wouldn't be so confident about the entry size. If both key and value sizes are constant, use constantKeySizeBySample() and constantValueSizeBySample().
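For example (a sketch against the Chronicle Map 2.x builder; the key and value here are stand-ins, and the creation method name, createPersistedTo() vs create(), varies between 2.x releases, so check it against the README linked in the next comment):

    import java.io.File;
    import java.io.IOException;

    import net.openhft.chronicle.map.ChronicleMap;
    import net.openhft.chronicle.map.ChronicleMapBuilder;

    public class ConstantSizeSketch {
        public static void main(String[] args) throws IOException {
            ChronicleMap<String, byte[]> map = ChronicleMapBuilder
                    .of(String.class, byte[].class)
                    .constantKeySizeBySample("EURUSD_1W")        // a representative, fixed-length key
                    .constantValueSizeBySample(new byte[13020])  // a value of the real, fixed size
                    .entries(2048)
                    .createPersistedTo(new File("/run/shm/ConstantSizeSketch"));
        }
    }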

RobAustin commented 9 years ago

I suggest you use the last released version, as we are in the middle of refactoring our API at the moment and the snapshot is likely to be in flux over the next few days. The documentation for the last released version is here:

https://github.com/OpenHFT/Chronicle-Map/blob/chronicle-map-2.0.5a/README.md https://github.com/OpenHFT/Chronicle-Map/tree/chronicle-map-2.0.5a


peter-lawrey commented 9 years ago

At the risk of repeating what has been said already...

Segment is full or has no ranges of 2 continuous free blocks

The problem is that while you have specified entrySize(13020), you are writing an entry which is a bit bigger than this. That is why it is reporting that it needs "2 continuous free blocks". As you are trying to keep the number of entries per segment to a minimum, the segment fills up fast.
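For example, a sketch of the kind of adjustment I mean (PricingStackInterface and sharedMemoryPath are your own from this thread and assumed to be available; the 16 KB figure is just a guess at sufficient headroom, not a calculated value):

    import java.io.File;
    import java.io.IOException;

    import net.openhft.collections.SharedHashMap;
    import net.openhft.collections.SharedHashMapBuilder;

    public class PaddedEntrySizeSketch {
        static SharedHashMap<String, PricingStackInterface> open(String sharedMemoryPath)
                throws IOException {
            return SharedHashMapBuilder
                    .of(String.class, PricingStackInterface.class)
                    .generatedValueType(true)
                    .entries(2048)
                    // The actual entry is a bit bigger than the 13020-byte value (the key
                    // and per-entry bookkeeping also need room), so pad entrySize well
                    // beyond it; 16 KB is an arbitrary choice.
                    .entrySize(16 * 1024)
                    // Many small segments instead of a hard 1-entry-per-segment cap.
                    .actualSegments(2048)
                    .file(new File(sharedMemoryPath))
                    .create();
        }
    }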

I seem to recall that I'd have to migrate to Java 1.8, and ideally I want to stay on 1.7 for now.

We support Java 7 best. Java 8 will still work but we don't optimise for lambdas or streams yet.

I suggest trying Chronicle Map 2.0.5a. The latest version of Chronicle Map supports concurrent readers, meaning there is less advantage in having more segments.


andycrutchlow commented 9 years ago

The reason I want 1 entry per segment is that read/write concurrency is restricted to per segment, I think. By virtue of the natural keys I'm using, there seems to be a high likelihood that the entries I wish to read/write at the same time occupy the same segment, so throughput is not good.


andycrutchlow commented 9 years ago

So Peter, are you saying that the error is actually that the segment is not big enough to hold even 1 entry at the size I have specified, so if I pad it a bit more it will work?


leventov commented 9 years ago

There is no good way to implement what you are asking for with HugeCollections, really. You could specify entriesPerSegment(10) and accept 10x memory overuse, but you still couldn't be sure that an unlucky random distribution wouldn't cluster 11 entries into a single segment, causing an error. Also, as Peter noted, your actual entry size is not 13020.

ChronicleMap now has read-write locks, which can be very beneficial if you have highly concurrent read access. Also remember that the number of segments typically shouldn't be larger than double the number of concurrent accessor threads. So if you want about 1000 segments, you should have about 500 concurrent accessors. Do you really have that many?
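Taking that rule of thumb literally (the thread count below is just a placeholder):

    public class SegmentCountRuleOfThumb {
        public static void main(String[] args) {
            int concurrentAccessors = 16;                     // placeholder for your deployment
            int maxUsefulSegments = 2 * concurrentAccessors;  // ~2x accessors per the rule of thumb
            System.out.println("Cap actualSegments at about " + maxUsefulSegments);
        }
    }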