gridgain / gridgain-old


Improve write-behind flushSize vs batchSize counting #80

Open ceefour opened 10 years ago

ceefour commented 10 years ago

With the following write-behind config:

<bean parent="cache-template">
    <property name="name" value="yagoLabel" />
    <property name="cacheMode" value="PARTITIONED" />
    <property name="atomicityMode" value="ATOMIC" />
    <property name="distributionMode" value="PARTITIONED_ONLY" />
    <property name="backups" value="1" />
    <property name="store">
        <bean class="id.ac.itb.ee.lskk.lumen.yago.YagoLabelCacheStore" />
    </property>
    <property name="writeBehindEnabled" value="true" />
    <property name="writeBehindFlushSize" value="10240" />
    <property name="writeBehindFlushFrequency" value="30000" />
    <property name="writeBehindBatchSize" value="10240" />
    <property name="swapEnabled" value="false" />
    <property name="evictionPolicy">
        <bean class="org.gridgain.grid.cache.eviction.lru.GridCacheLruEvictionPolicy">
            <property name="maxSize" value="100000" />
        </bean>
    </property>
</bean>

I get the following behavior:

09:10:01.395 [main] INFO  i.a.i.e.l.l.yago.YagoLabelsToMongo - Inserted 680000 labels
09:10:03.626 [flusher-0-#41%null%] DEBUG i.a.i.e.l.l.yago.YagoLabelCacheStore - Upserted 10240 documents, inserted=0, modified=1652, upserted=8588
09:10:03.631 [flusher-0-#41%null%] DEBUG i.a.i.e.l.l.yago.YagoLabelCacheStore - Upserted 1 documents, inserted=0, modified=0, upserted=1
09:10:04.362 [main] INFO  i.a.i.e.l.l.yago.YagoLabelsToMongo - Inserted 690000 labels
09:10:06.565 [flusher-0-#41%null%] DEBUG i.a.i.e.l.l.yago.YagoLabelCacheStore - Upserted 10240 documents, inserted=0, modified=1666, upserted=8573
09:10:06.573 [flusher-0-#41%null%] DEBUG i.a.i.e.l.l.yago.YagoLabelCacheStore - Upserted 1 documents, inserted=0, modified=0, upserted=1
09:10:07.062 [main] INFO  i.a.i.e.l.l.yago.YagoLabelsToMongo - Inserted 700000 labels
09:10:13.044 [flusher-0-#41%null%] DEBUG i.a.i.e.l.l.yago.YagoLabelCacheStore - Upserted 10240 documents, inserted=0, modified=1748, upserted=8491
09:10:13.050 [flusher-0-#41%null%] DEBUG i.a.i.e.l.l.yago.YagoLabelCacheStore - Upserted 1 documents, inserted=0, modified=0, upserted=1
09:10:13.450 [main] INFO  i.a.i.e.l.l.yago.YagoLabelsToMongo - Inserted 710000 labels

So the pattern is: putAll with 10240 entries, then putAll with 1 entry, then putAll with 10240 entries again, then putAll with 1 entry, and so on.

This isn't optimal: with the config above, each putAll should consistently write 10240 entries. Note that the default write-behind values (flush size 10240, batch size 512) exhibit similar behavior, i.e. each flush writes a number of 512-entry batches, then a batch of 1.

My workaround is to set:

<property name="writeBehindFlushSize" value="10239" />

i.e. 1 less than the batch size, which gives the expected behavior:

09:13:50.480 [flusher-0-#41%null%] DEBUG i.a.i.e.l.l.yago.YagoLabelCacheStore - Upserted 10240 documents, inserted=0, modified=552, upserted=9688
09:13:51.644 [main] INFO  i.a.i.e.l.l.yago.YagoLabelsToMongo - Inserted 210000 labels
09:13:53.471 [flusher-0-#41%null%] DEBUG i.a.i.e.l.l.yago.YagoLabelCacheStore - Upserted 10240 documents, inserted=0, modified=606, upserted=9633
09:13:54.501 [main] INFO  i.a.i.e.l.l.yago.YagoLabelsToMongo - Inserted 220000 labels
09:13:56.271 [flusher-0-#41%null%] DEBUG i.a.i.e.l.l.yago.YagoLabelCacheStore - Upserted 10240 documents, inserted=0, modified=612, upserted=9627
09:13:57.121 [main] INFO  i.a.i.e.l.l.yago.YagoLabelsToMongo - Inserted 230000 labels

This isn't intuitive, though. When the flush size is a multiple of the batch size, the batches written to the store should align with it, with no one-entry remainder.
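
To illustrate the counting described above, here is a minimal sketch of one possible explanation. It is only a model, not GridGain's actual flusher code: it assumes the flush is triggered once the queue holds more than flushSize entries (i.e. at flushSize + 1) and that the queue is then drained in chunks of at most batchSize. The class name FlushModel and the method batches are hypothetical. Under that assumption the model gives the same batch sizes as the logs above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a write-behind flusher; NOT GridGain's actual code. */
public class FlushModel {

    /**
     * Returns the sizes of the putAll calls made during one flush cycle,
     * ASSUMING the flush starts only when the queue size exceeds flushSize
     * and the queue is then drained in chunks of at most batchSize.
     */
    static List<Integer> batches(int flushSize, int batchSize) {
        int queued = flushSize + 1; // assumed trigger condition: size > flushSize
        List<Integer> sizes = new ArrayList<>();
        while (queued > 0) {
            int n = Math.min(queued, batchSize);
            sizes.add(n);
            queued -= n;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Config from this issue: a full batch followed by a batch of 1.
        System.out.println(batches(10240, 10240)); // [10240, 1]
        // Workaround: flush size one below the batch size, a single full batch.
        System.out.println(batches(10239, 10240)); // [10240]
        // Defaults: 20 batches of 512 followed by a batch of 1.
        System.out.println(batches(10240, 512).size()); // 21
    }
}
```

If this assumption is right, triggering the flush at size >= flushSize instead of size > flushSize would remove the trailing one-entry batch whenever the flush size is a multiple of the batch size.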