besson closed this issue 7 years ago
Thanks @besson -- Is there any way you could create a simpler version that recreates the bug using test data we have access to, such as the demo TMDB data set?
Also is there a stacktrace anywhere associated with this?
Hi @softwaredoug! It was not the same usage, but we had something similar with TMDB data, using the demo scripts you provided.
We don't have full access to the logs right now, but we could retrieve the errors; they occurred in these classes:
com.o19s.es.ltr.ranker.ranklib.DenseProgramaticDataPoint.setFeatureScore(DenseProgramaticDataPoint.java:62) ~[?:?]
and org.elasticsearch.search.rescore.RescorePhase.execute(RescorePhase.java:49) ~[elasticsearch-5.5.2.jar:5.5.2]
thanks
Sweet, thanks. That helps. So it does seem to be an issue with unused features in RankLib models.
Can you show me features 1-3 you're using? I've been trying this with 3 simple match queries and haven't been able to recreate it thus far.
Sure. These are the features:
"features": [
{
"name": "1",
"params": [
"keywords"
],
"template_language": "mustache",
"template": {
"match": {
"title": "{{keywords}}"
}
}
},
{
"name": "2",
"params": [
"keywords"
],
"template_language": "mustache",
"template": {
"match": {
"overview": "{{keywords}}"
}
}
},
{
"name": "user_rating",
"params": [],
"template_language": "mustache",
"template": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "user_rating",
"missing": 0
}
}
],
"query": {
"match_all": {}
}
}
}
}
]
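To illustrate how the `params` feed the mustache templates above, here is a minimal, hypothetical sketch of the substitution in Python (the plugin does this server-side; the `render` helper below is ours, for illustration only, not plugin code):

```python
import json

# Feature "1" from the set above: a match on title driven by {{keywords}}
template = {"match": {"title": "{{keywords}}"}}

def render(template, params):
    # Naive mustache-style substitution, for illustration only
    text = json.dumps(template)
    for key, value in params.items():
        text = text.replace("{{%s}}" % key, value)
    return json.loads(text)

print(render(template, {"keywords": "rambo"}))
# → {'match': {'title': 'rambo'}}
```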
Thanks for your patience @besson and collaborating on this! I have still had trouble, here's a script that I think attempts to perform the steps you mention. Perhaps you could try it out and modify it to recreate your issue https://github.com/o19s/elasticsearch-learning-to-rank/blob/104-hunt-for-bug/demo/do104.py#L83
I built a bit of a chaos monkey test and seem to have recreated the bug!
When the feature set has many features, but a Ranklib model only uses a few of them, we size the feature vector from the model rather than from the feature set.
So if the model only uses feature 1, for example, we will have a data point of size 1. However, we may score features 1...4 and then try to store all 4 feature values in an array of size 1. Hence the exception where we set an out-of-bounds feature value.
For Ranklib, we should always construct these vectors based on the number of features in the featureSet, and not rely on the model. And it'd be nice to not score unused features (unclear how common this is outside testing).
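A minimal sketch of the mismatch in plain Python (hypothetical names; the real code lives in `DenseProgramaticDataPoint`): sizing the score vector from the model's feature count while the feature set scores more features overruns the array, whereas sizing it from the feature set does not.

```python
def fill_scores(vector_size, num_features_scored):
    # Store a score for every feature the feature set evaluates
    scores = [0.0] * vector_size
    for feature_id in range(num_features_scored):
        scores[feature_id] = 1.0  # overruns when vector_size < num_features_scored
    return scores

# Vector sized from the model (1 used feature) while the set scores 4:
try:
    fill_scores(vector_size=1, num_features_scored=4)
except IndexError:
    print("out of bounds, like the reported setFeatureScore exception")

# The fix: size the vector from the feature set instead
fill_scores(vector_size=4, num_features_scored=4)
```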
@nomoa do you concur? And would this be a problem for the xgboost or other model formats?
Thanks for the dedication to investigate the problem @softwaredoug
I have a question: as models were designed to be immutable, they have a copy of the features in the featureSet, right? When you say you get the feature count from the model, are you getting it from this copy of the featureSet?
I am asking because, in our real-life case, when you request the model from the API, the model lists all 371 features, although it was using only 133. So it should work, as the vector was initialized with size 371, right?
The issue is we parse a Ranklib model, and notice it only uses feature 1, when we should be looking at the associated feature set and seeing it has 1...n features. So when we execute the Lucene queries in the feature set, the model gets an out-of-bounds error when the feature set scores feature 2.
It's a good catch, as most of us have been using 100% of the features in the feature set, but it's a legitimate use case to only use a subset.
This test recreates the issue
I have a fix in this branch which seems to be working fine.
I would argue we still have work to do: if you have a feature set of 300 features but your model only uses 100, we should execute only those 100 features rather than all 300.
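One possible shape for that follow-up, sketched in Python with hypothetical names: execute only the features the model reads, while keeping each feature's original ordinal so scores still land at the right positions in a vector sized from the full feature set.

```python
def features_to_execute(feature_set, used_feature_ids):
    # Keep (ordinal, feature) pairs so scores land at their original
    # positions in a vector sized from the full feature set
    return [(i, f) for i, f in enumerate(feature_set) if i in used_feature_ids]

feature_set = ["f%d" % i for i in range(300)]
used = set(range(100))  # model reads only the first 100 features
print(len(features_to_execute(feature_set, used)))  # → 100
```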
I'm having problems with a caching test (at least locally) since I added my integration test, so I asked @nomoa to check it out before I work on merging it.
thank you very much @softwaredoug and @nomoa
Sent you a slack invite @besson to our relevancy slack where we discuss this project and other topics :)
cool! thank you :)
Running do104.py, there is a problem:
elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'failed to create query: java.lang.NullPointerException: Cannot invoke "com.o19s.es.ltr.feature.store.StoredFeatureSet.optimize()" because the return value of "com.o19s.es.ltr.feature.store.index.IndexFeatureStore.getAndParse(String, java.lang.Class, String)" is null')
I am getting this error when trying to load a LambdaMART model that does not use all the features in my feature set. Describing in more detail, I did the following steps:
Error in DenseDataPoint::setFeatureValue(): feature (id=95) not found.