opensearch-project / opensearch-learning-to-rank-base

Fork of https://github.com/o19s/elasticsearch-learning-to-rank to work with OpenSearch
Apache License 2.0

[BUG] NPE on SLTR query when feature set doesn't exist #25

Open noCharger opened 7 months ago

noCharger commented 7 months ago

What is the bug?

An SLTR query that references a feature set that does not exist fails with a NullPointerException (NPE) instead of a clear "not found" error.

How can one reproduce the bug?

Steps to reproduce the behavior.

  1. Create the tmdb index.

  2. Search the tmdb index with an SLTR query that references a feature set that does not exist (more_movie_features here):

    POST tmdb/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "terms": {
                "_id": ["7555", "1370", "1369"]
              }
            },
            {
              "sltr": {
                "_name": "logged_featureset",
                "featureset": "more_movie_features",
                "params": {
                  "keywords": "rambo"
                }
              }
            }
          ]
        }
      },
      "ext": {
        "ltr_log": {
          "log_specs": {
            "name": "log_entry1",
            "named_query": "logged_featureset"
          }
        }
      }
    }

  3. The request fails with the following response:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: java.lang.NullPointerException: Cannot invoke \"com.o19s.es.ltr.feature.store.StoredFeatureSet.optimize()\" because the return value of \"com.o19s.es.ltr.feature.store.index.IndexFeatureStore.getAndParse(String, java.lang.Class, String)\" is null",
        "index": "tmdb",
        "index_uuid": "Vq8x0os1RvqSmjCqBrKXqQ"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "tmdb",
        "node": "-vRjBvEwSIimAr9UL-3NGg",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: java.lang.NullPointerException: Cannot invoke \"com.o19s.es.ltr.feature.store.StoredFeatureSet.optimize()\" because the return value of \"com.o19s.es.ltr.feature.store.index.IndexFeatureStore.getAndParse(String, java.lang.Class, String)\" is null",
          "index": "tmdb",
          "index_uuid": "Vq8x0os1RvqSmjCqBrKXqQ",
          "caused_by": {
            "type": "i_o_exception",
            "reason": "java.lang.NullPointerException: Cannot invoke \"com.o19s.es.ltr.feature.store.StoredFeatureSet.optimize()\" because the return value of \"com.o19s.es.ltr.feature.store.index.IndexFeatureStore.getAndParse(String, java.lang.Class, String)\" is null",
            "caused_by": {
              "type": "null_pointer_exception",
              "reason": "Cannot invoke \"com.o19s.es.ltr.feature.store.StoredFeatureSet.optimize()\" because the return value of \"com.o19s.es.ltr.feature.store.index.IndexFeatureStore.getAndParse(String, java.lang.Class, String)\" is null"
            }
          }
        }
      }
    ]
  },
  "status": 400
}

What is the expected behavior?

https://github.com/opensearch-project/opensearch-learning-to-rank-base/blob/9477c522b76908ac3a4fd3c76ac0e1641af24d3e/src/main/java/com/o19s/es/ltr/feature/store/index/Caches.java#L136-L149

The error raised by the cache loader linked above should (1) state whether the cache key refers to a feature or a feature set, and (2) carry a clear message such as "Feature (set) does not exist".
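
To make the expected behavior concrete, here is a minimal, self-contained sketch of a cache loader that reports the kind and name of the missing element. It is not the plugin's actual Caches code: CacheLoadSketch, Kind, the plain Map used as a cache, the message wording, and the choice of IllegalArgumentException are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a cache loader; not the plugin's actual Caches class.
final class CacheLoadSketch {

    // The kind of store element a cache key refers to (names are assumptions).
    enum Kind {
        FEATURE("Feature"),
        FEATURE_SET("Feature set");

        final String label;

        Kind(String label) {
            this.label = label;
        }
    }

    // Returns the cached value, loading it on a miss. A loader result of null
    // is treated as "not found" and reported with the element kind and name.
    static <T> T cacheLoad(Kind kind, String name, Map<String, T> cache, Supplier<T> loader) {
        T cached = cache.get(name);
        if (cached != null) {
            return cached;
        }
        T loaded = loader.get();
        if (loaded == null) {
            // Assumption: IllegalArgumentException is an acceptable error type here.
            throw new IllegalArgumentException(kind.label + " [" + name + "] does not exist");
        }
        cache.put(name, loaded);
        return loaded;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        try {
            // The loader returns null, simulating a feature set that is not in the store.
            cacheLoad(Kind.FEATURE_SET, "more_movie_features", cache, () -> null);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a message like this, the caller sees which element is missing instead of an NPE surfacing from inside the loader.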

noCharger commented 6 months ago
[2023-12-13T02:37:03,401][DEBUG][o.o.a.s.TransportSearchAction] [da91b284ed8b1c1b21b4aa288bbf9072] #[org.opensearch.index.query.QueryShardException,java.io.IOException,java.lang.NullPointerException]#All shards failed for phase: [query]
[tmdb/xinVDIeHR7CzbCkTgWnoSg] QueryShardException[failed to create query: java.lang.NullPointerException: Cannot invoke "com.o19s.es.ltr.feature.store.StoredFeatureSet.optimize()" because the return value of "com.o19s.es.ltr.feature.store.index.IndexFeatureStore.getAndParse(String, java.lang.Class, String)" is null]; nested: IOException[java.lang.NullPointerException: Cannot invoke "com.o19s.es.ltr.feature.store.StoredFeatureSet.optimize()" because the return value of "com.o19s.es.ltr.feature.store.index.IndexFeatureStore.getAndParse(String, java.lang.Class, String)" is null]; nested: NullPointerException[Cannot invoke "com.o19s.es.ltr.feature.store.StoredFeatureSet.optimize()" because the return value of "com.o19s.es.ltr.feature.store.index.IndexFeatureStore.getAndParse(String, java.lang.Class, String)" is null];
        at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:483)
        at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:466)
        at org.opensearch.search.SearchService.parseSource(SearchService.java:1238)
        at org.opensearch.search.SearchService.createContext(SearchService.java:985)
        at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:593)
        at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:566)
        at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
        at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
        at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:917)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: java.lang.NullPointerException: Cannot invoke "com.o19s.es.ltr.feature.store.StoredFeatureSet.optimize()" because the return value of "com.o19s.es.ltr.feature.store.index.IndexFeatureStore.getAndParse(String, java.lang.Class, String)" is null
        at com.o19s.es.ltr.feature.store.index.Caches.cacheLoad(Caches.java:147)
        at com.o19s.es.ltr.feature.store.index.Caches.loadFeatureSet(Caches.java:129)
        at com.o19s.es.ltr.feature.store.index.CachedFeatureStore.loadSet(CachedFeatureStore.java:51)
        at com.o19s.es.ltr.query.StoredLtrQueryBuilder.doToQuery(StoredLtrQueryBuilder.java:175)
        at com.o19s.es.ltr.query.StoredLtrQueryBuilder.doToQuery(StoredLtrQueryBuilder.java:51)
        at org.opensearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:117)
        at org.opensearch.index.query.BoolQueryBuilder.addBooleanClauses(BoolQueryBuilder.java:346)
        at org.opensearch.index.query.BoolQueryBuilder.doToQuery(BoolQueryBuilder.java:329)
        at org.opensearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:117)
        at org.opensearch.index.query.QueryShardContext.lambda$toQuery$3(QueryShardContext.java:467)
        at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:479)
        ... 16 more
Caused by: java.lang.NullPointerException: Cannot invoke "com.o19s.es.ltr.feature.store.StoredFeatureSet.optimize()" because the return value of "com.o19s.es.ltr.feature.store.index.IndexFeatureStore.getAndParse(String, java.lang.Class, String)" is null
        at com.o19s.es.ltr.feature.store.index.IndexFeatureStore.loadSet(IndexFeatureStore.java:123)
        at com.o19s.es.ltr.feature.store.index.Caches.lambda$cacheLoad$7(Caches.java:140)
        at org.opensearch.common.cache.Cache.computeIfAbsent(Cache.java:461)
        at com.o19s.es.ltr.feature.store.index.Caches.cacheLoad(Caches.java:139)
        ... 26 more
noCharger commented 6 months ago

https://github.com/opensearch-project/opensearch-learning-to-rank-base/blob/9477c522b76908ac3a4fd3c76ac0e1641af24d3e/src/main/java/com/o19s/es/ltr/feature/store/index/IndexFeatureStore.java#L198

noCharger commented 6 months ago

Posting some debugging logs here:

[2023-12-13T16:11:24,846][INFO ][c.o.e.l.f.s.i.IndexFeatureStore] [88665a1d4119.ant.amazon.com] internalGet index: .ltrstore
[2023-12-13T16:11:24,847][INFO ][c.o.e.l.f.s.i.IndexFeatureStore] [88665a1d4119.ant.amazon.com] internalGet id: featureset-aaa_more_movie_features
[2023-12-13T16:11:24,849][INFO ][c.o.e.l.f.s.i.IndexFeatureStore] [88665a1d4119.ant.amazon.com] getAndParse(): {"_index":".ltrstore","_id":"featureset-aaa_more_movie_features","found":false}

Instead of returning null when the document is not found ("found": false above), getAndParse or its caller should handle this case more gracefully.

https://github.com/opensearch-project/opensearch-learning-to-rank-base/blob/9477c522b76908ac3a4fd3c76ac0e1641af24d3e/src/main/java/com/o19s/es/ltr/feature/store/index/IndexFeatureStore.java#L176-L199
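
One possible shape for that handling, as a rough sketch only: check the result of getAndParse before calling optimize() and fail with a descriptive exception. The classes below are in-memory stand-ins written for this comment, not the plugin's IndexFeatureStore or StoredFeatureSet. The "featureset-" id prefix is taken from the debug log above; the exception type and message are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, in-memory stand-in for the feature store; not the plugin's code.
final class FeatureStoreSketch {

    // Stand-in for StoredFeatureSet; optimize() is a no-op in this sketch.
    static final class FeatureSetSketch {
        final String name;

        FeatureSetSketch(String name) {
            this.name = name;
        }

        FeatureSetSketch optimize() {
            return this;
        }
    }

    private final Map<String, FeatureSetSketch> documents = new HashMap<>();

    // Stores a feature set under the "featureset-<name>" id seen in the debug log.
    void store(String name) {
        documents.put("featureset-" + name, new FeatureSetSketch(name));
    }

    // Mirrors the behavior described in the issue: a missing document yields null.
    FeatureSetSketch getAndParse(String id) {
        return documents.get(id);
    }

    // Checks for the missing document before dereferencing the result.
    FeatureSetSketch loadSet(String name) {
        FeatureSetSketch set = getAndParse("featureset-" + name);
        if (set == null) {
            // Assumption: the exception type and wording are illustrative only.
            throw new IllegalArgumentException("Unknown feature set [" + name + "]");
        }
        return set.optimize();
    }

    public static void main(String[] args) {
        FeatureStoreSketch store = new FeatureStoreSketch();
        store.store("movie_features");
        System.out.println(store.loadSet("movie_features").name);
        try {
            store.loadSet("more_movie_features");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same check could live in getAndParse itself; placing it in loadSet keeps the sketch closest to the frame where the NPE is thrown in the stack trace above.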