spring-projects / spring-boot-data-geode

Spring Boot support for Apache Geode and VMware GemFire
https://projects.spring.io/spring-boot
Apache License 2.0

PARTITION_PERSISTENT_OVERFLOW index slow #119

Closed janmaslik closed 12 months ago

janmaslik commented 2 years ago

Hello, this is going to be hard to explain, so I will try to split the description.

SETUP
I'm using indexes on a PARTITION_PERSISTENT_OVERFLOW region, for now with only one PARTITION service. The region holds ~20 GB of persistent data on disk. Because of required logic I had to add a field called evictKeys. This field is indexed, and I use it when searching for data that is supposed to be evicted. The query searches using LIKE with '%'.

Simplified code:

@PostConstruct
private void init() throws Exception
{
    wsPersistentLongRegion = cache.getRegion("PERSISTENT_LONG");
    // OQL index on the evictKeys field of the PERSISTENT_LONG region
    cache.getQueryService().createIndex("evictIndex", "evictKeys", "/PERSISTENT_LONG");
    persistentLongEvictTagContains = cache.getQueryService()
        .newQuery("SELECT c.key FROM /PERSISTENT_LONG c WHERE c.evictKeys LIKE $1");
}

@GemfireFunction(id = CloudFunctionId.EVICT_WS_TREE_PERSISTENT_LONG)
public void evictWsTreePersistentLong(String key) throws Exception
{
    // '%' on both sides makes this a contains-style match on evictKeys
    SelectResults<String> itemsToEvict =
        (SelectResults<String>) persistentLongEvictTagContains.execute("%" + key + "%");
    for (String keyToEvict : itemsToEvict)
    {
        wsPersistentLongRegion.destroy(keyToEvict);
    }
}

The object stored in the region:

public class StructuredCacheDTO implements ValueWrapper, Serializable
{

    private static final long serialVersionUID = 8059401949940358688L;

    @Id
    protected String key;

    @LuceneIndexed
    @Indexed
    protected String evictKeys;

    protected Object payload;

    @Override
    public Object get()
    {
        return payload;
    }
}

The region is created using GenericRegionFactoryBean in the Spring configuration class.
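Roughly like this (a simplified sketch; the injected cache bean, the bean names and the IndexFactoryBean alternative for 'evictIndex' are illustrative, not the exact production configuration):

import org.apache.geode.cache.GemFireCache;
import org.apache.geode.cache.RegionShortcut;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.DependsOn;
import org.springframework.data.gemfire.GenericRegionFactoryBean;
import org.springframework.data.gemfire.IndexFactoryBean;

@Configuration
public class RegionConfig {

    // PARTITION_PERSISTENT_OVERFLOW region; the bean name doubles as the region name.
    @Bean("PERSISTENT_LONG")
    GenericRegionFactoryBean<String, StructuredCacheDTO> persistentLongRegion(GemFireCache cache) {
        GenericRegionFactoryBean<String, StructuredCacheDTO> region = new GenericRegionFactoryBean<>();
        region.setCache(cache);
        region.setShortcut(RegionShortcut.PARTITION_PERSISTENT_OVERFLOW);
        return region;
    }

    // Illustrative alternative to creating the index in @PostConstruct:
    // let Spring Data for Apache Geode define it as a bean at startup.
    @Bean
    @DependsOn("PERSISTENT_LONG")
    IndexFactoryBean evictIndex(GemFireCache cache) {
        IndexFactoryBean index = new IndexFactoryBean();
        index.setCache(cache);
        index.setName("evictIndex");
        index.setExpression("evictKeys");
        index.setFrom("/PERSISTENT_LONG");
        return index;
    }
}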

Dependencies:

- org.springframework.geode:spring-geode:1.6.6
- org.springframework.data:spring-data-geode:2.6.3
- org.springframework.session:spring-session-data-geode:2.6.1

PROBLEM
I think that if a persistent region contains data with indexed fields, the indexes are stored on disk together with the rest of the data. Sometimes it takes seconds to evict data, even when there is no record for that key. I am getting warnings like this:

WARN [[org.apache.geode.internal.monitoring.executor.AbstractExecutor]] (ThreadsMonitor) [] [[]] [] [] [[]] [] Thread <7897> (0x1ed9) that was executed at <09 Sep 2022 14:30:59 CEST> has been stuck for <46.102 seconds> and number of thread monitor iteration <1> Thread Name <Function Execution Processor159> state <RUNNABLE> Executor Group <FunctionExecutionPooledExecutor> Monitored metric <ResourceManagerStats.numThreadsStuck>
Thread stack:
java.lang.Thread.holdsLock(Native Method)
org.apache.geode.internal.cache.VMCachedDeserializable.getDeserializedValue(VMCachedDeserializable.java:108)
org.apache.geode.cache.query.internal.index.MemoryIndexStore.getTargetObject(MemoryIndexStore.java:500)
org.apache.geode.cache.query.internal.index.MemoryIndexStore$MemoryIndexStoreEntry.setMemoryIndexStoreEntry(MemoryIndexStore.java:805)
org.apache.geode.cache.query.internal.index.MemoryIndexStore$MemoryIndexStoreIterator.next(MemoryIndexStore.java:705)
org.apache.geode.cache.query.internal.index.MemoryIndexStore$MemoryIndexStoreIterator.next(MemoryIndexStore.java:614)
org.apache.geode.cache.query.internal.index.CompactRangeIndex.addToResultsFromEntries(CompactRangeIndex.java:779)
org.apache.geode.cache.query.internal.index.CompactRangeIndex.addToResultsFromEntries(CompactRangeIndex.java:725)
org.apache.geode.cache.query.internal.index.CompactRangeIndex.evaluate(CompactRangeIndex.java:626)
org.apache.geode.cache.query.internal.index.CompactRangeIndex.lockedQueryPrivate(CompactRangeIndex.java:500)
org.apache.geode.cache.query.internal.index.CompactRangeIndex.lockedQuery(CompactRangeIndex.java:1648)
org.apache.geode.cache.query.internal.index.AbstractIndex.query(AbstractIndex.java:303)
org.apache.geode.cache.query.internal.CompiledComparison.singleBaseCollectionFilterEvaluate(CompiledComparison.java:467)
org.apache.geode.cache.query.internal.CompiledComparison.filterEvaluate(CompiledComparison.java:177)
org.apache.geode.cache.query.internal.AbstractGroupOrRangeJunction.filterEvaluate(AbstractGroupOrRangeJunction.java:164)
org.apache.geode.cache.query.internal.CompiledLike.filterEvaluate(CompiledLike.java:190)
org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:538)
org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:53)
org.apache.geode.cache.query.internal.DefaultQuery.executeUsingContext(DefaultQuery.java:357)
org.apache.geode.internal.cache.PRQueryProcessor.executeQueryOnBuckets(PRQueryProcessor.java:249)
org.apache.geode.internal.cache.PRQueryProcessor.executeSequentially(PRQueryProcessor.java:215)
org.apache.geode.internal.cache.PRQueryProcessor.executeQuery(PRQueryProcessor.java:124)
org.apache.geode.internal.cache.PartitionedRegionQueryEvaluator.executeQueryOnLocalNode(PartitionedRegionQueryEvaluator.java:988)
org.apache.geode.internal.cache.PartitionedRegionQueryEvaluator.executeQueryOnRemoteAndLocalNodes(PartitionedRegionQueryEvaluator.java:378)
org.apache.geode.internal.cache.PartitionedRegionQueryEvaluator.queryBuckets(PartitionedRegionQueryEvaluator.java:495)
org.apache.geode.internal.cache.PartitionedRegion.doExecuteQuery(PartitionedRegion.java:2102)
org.apache.geode.internal.cache.PartitionedRegion.executeQuery(PartitionedRegion.java:2029)
org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:241)
com.shirtplatform.cloud.gemfire.service.EvictionService.evictWsTreePersistentLong(EvictionService.java:140)
com.shirtplatform.cloud.gemfire.service.EvictionService.cloudEvict(EvictionService.java:118)
sun.reflect.GeneratedMethodAccessor299.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:282)
org.springframework.data.gemfire.function.PojoFunctionWrapper.invokeTargetMethod(PojoFunctionWrapper.java:176)
org.springframework.data.gemfire.function.PojoFunctionWrapper.execute(PojoFunctionWrapper.java:155)
org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:201)
org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376)
org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:441)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

This leads to situations where getting data from the cache takes minutes.

WHY THIS WAY
The cached result has several inputs. I am not able to rebuild all of the wanted keys at the evict event (when one of the inputs changes).

I know that an indexed field + a LIKE query + a huge persistent region is not exactly a great combination, and you probably wouldn't design the app this way (but it is possible to do it :) ). I just want to know whether this is a bug, or whether I am just doing something wrong.

jxblum commented 2 years ago

At first glance I need more time to digest the information you shared and to think about this problem more clearly, but I'll start with a few quick thoughts here:

1) GemFire/Geode (OQL) Indexes are never stored to disk. They only exist in memory, and therefore need to be rebuilt each and every time the server (i.e. the data server (vs PROXY server) / peer node in the cluster) is restarted.

2) If this is some type of bug (TBD), then it would be within GemFire/Geode's codebase itself, not Spring (SDG or SBDG).

However, I currently suspect, given there is ~20 GB of data stored on disk, that between the initial GII and the OQL Index reconstruction, it will take some time for the server to get things in order (e.g. load relevant data into memory) and reach optimal performance. This also hinges on the OQL Index being properly defined based on the OQL statement(s) used to query the data in question in the first place (I need to take a closer look at this and think more on it). Since data is overflowed to disk, there is always going to be a perf penalty if the query requires data not currently loaded into memory and only stored on disk. Maybe you need to do some warming exercises (pre-heat the oven so to speak), maybe run compaction (remember, disk files are append-only), etc, etc. This is why it is hard to make a determination at first glance and requires more thought. There are many other factors that could be at play here.
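For example, a forced compaction could be kicked off programmatically along these lines (just a sketch; the disk store name is a placeholder and the disk store has to be configured with allow-force-compaction=true):

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.DiskStore;

public class DiskStoreMaintenance {

    // Sketch only: "persistentLongDiskStore" is a placeholder for the actual disk store name.
    public static void compact(Cache cache) {
        DiskStore diskStore = cache.findDiskStore("persistentLongDiskStore");
        if (diskStore != null) {
            // Compacts the append-only oplog files; returns true if any oplog was compacted.
            // Requires allow-force-compaction=true on the disk store.
            boolean compacted = diskStore.forceCompaction();
            System.out.println("Compaction performed: " + compacted);
        }
    }
}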

3) Finally, this type of question / problem is perhaps more appropriately asked on StackOverflow with the correct tags: gemfire, geode, spring-data-gemfire.

I will get back to this sometime next week.

Thank you for sharing this info.

janmaslik commented 2 years ago

Thank you for the answer, John.

The problem starts after a few days of the application running, so by then the application should be completely ready and indexed. To give a better picture: the cached data are images, so a ~20 GB region does not mean there is a huge number of small items.

Tonight I will update the code to:

I will let you know the result of the changes in a few days, and if that does not help, I will create a question on StackOverflow. Thank you for the help, John!

janmaslik commented 2 years ago

The question I am aware of: does my query use the indexes? Because a query with a condition like field LIKE '%key%' usually breaks some kinds of tree indexes.

janmaslik commented 2 years ago

I am also thinking about using a Lucene index instead of a normal index, since I have multiple keys in one field named 'evictKeys' and I am searching for the data to destroy with evictKeys LIKE '%key%'. I guess a full-text search should be more effective.
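Roughly what I have in mind (a rough sketch against the Geode Lucene API; index and region names are placeholders, and the Lucene index has to be created before the region is created):

import java.util.Collection;

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.lucene.LuceneQuery;
import org.apache.geode.cache.lucene.LuceneQueryException;
import org.apache.geode.cache.lucene.LuceneService;
import org.apache.geode.cache.lucene.LuceneServiceProvider;

public class LuceneEvictLookup {

    // Sketch only: the Lucene index must exist before the region is created,
    // and "evictLuceneIndex" / "PERSISTENT_LONG" are placeholder names.
    public static void createIndex(Cache cache) {
        LuceneService luceneService = LuceneServiceProvider.get(cache);
        luceneService.createIndexFactory()
            .addField("evictKeys")
            .create("evictLuceneIndex", "PERSISTENT_LONG");
    }

    // Token-based lookup on the analyzed evictKeys field instead of LIKE '%key%'.
    public static Collection<String> findKeysToEvict(Cache cache, String key) throws LuceneQueryException {
        LuceneService luceneService = LuceneServiceProvider.get(cache);
        LuceneQuery<String, StructuredCacheDTO> query = luceneService.createLuceneQueryFactory()
            .create("evictLuceneIndex", "PERSISTENT_LONG", key, "evictKeys");
        return query.findKeys();
    }
}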

janmaslik commented 2 years ago

I turned on <trace> on queries:

1st query, $1 = key%; the index is created the same way I created 'evictIndex', as described above:

Trace Info for Query: <trace> SELECT c.key FROM /persistentLong c WHERE c.key LIKE $1
Local 10.91.228.31(gemfire-service:27494)<ec><v148>:41000 took 15.189394ms and returned 0 results;

Query Executed in 18.926027 ms; rowCount = 0; indexesUsed(1):persistentLongKeyIndex(Results: 0) "<trace> SELECT c.key FROM /persistentLong c WHERE c.key LIKE $1"

2nd query, $1 = %key%; the index is created as described above.

Starting query: <trace> SELECT c.key FROM /persistentLong c WHERE c.evictKeys LIKE $1

Trace Info for Query: <trace> SELECT c.key FROM /persistentLong c  WHERE c.evictKeys LIKE $1
Local 10.91.228.31(gemfire-service:27494)<ec><v148>:41000 took 9096.46ms and returned 0 results;

Query Executed in 9097.421 ms; rowCount = 0; indexesUsed(1):evictIndex(Results: 0) "<trace> SELECT c.key FROM /persistentLong c WHERE c.evictKeys LIKE $1"

This is what I expected: both indexes were created the same way, but I basically invalidated the index by using '%' at the start of the searched value. The region contains 9.6 GB of persistent data. I am going to try a Lucene index.
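To make the difference concrete, here is a rough sketch of the two query shapes from the traces (illustrative only; the parameter values match the traces above):

import org.apache.geode.cache.query.Query;
import org.apache.geode.cache.query.QueryService;
import org.apache.geode.cache.query.SelectResults;

public class LikeQueryComparison {

    public static void run(QueryService queryService) throws Exception {

        // Prefix match ("key%"): the compact range index can seek to the prefix,
        // so only a small slice of the index is scanned (~19 ms in the trace).
        Query prefixQuery = queryService.newQuery(
            "SELECT c.key FROM /persistentLong c WHERE c.key LIKE $1");
        SelectResults<?> fast = (SelectResults<?>) prefixQuery.execute("key%");

        // Leading wildcard ("%key%"): there is no lower bound to seek to, so every
        // index entry is evaluated and overflowed values may be faulted in from disk
        // (~9 s in the trace), even though the index is reported as "used".
        Query containsQuery = queryService.newQuery(
            "SELECT c.key FROM /persistentLong c WHERE c.evictKeys LIKE $1");
        SelectResults<?> slow = (SelectResults<?>) containsQuery.execute("%key%");
    }
}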

jxblum commented 12 months ago

Yes, wildcard (%) OQL queries are certainly going to have an effect on GemFire Indexes in this case. I am not sure what the outcome of your Lucene Index attempt was, but hopefully it yielded better results than when using an OQL Index and wildcard search.

In any case, that has everything to do with the GemFire/Geode OQL engine, along with applying Indexes to the right (searchable) data on the Object and engaging those Indexes in properly constructed OQL queries. Therefore, it has less to do with SBDG in general, which is just a facade to create Indexes and OQL queries. It is up to GemFire/Geode to decide how to execute them.

I am closing this ticket since it has been a while and I am sure many things have changed since then. Feel free to file another ticket if necessary, but I encourage you to do so in the Spring Data for VMware GemFire GitHub project Issues, since VMware stopped sponsoring development (see here) on the Apache Geode project at ASF and has focused its efforts entirely on VMware GemFire.

Good luck!