opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0

[Feature] Support Aggregations with Hybrid Query #422

Closed mayank-unscrambl closed 11 months ago

mayank-unscrambl commented 1 year ago

Describe the bug?

When a hybrid query is run together with a bucket aggregation, the aggregations field in the response comes back with empty buckets. If the same query is rewritten as a bool query, the aggregation is returned correctly. I tried this with both a neural query and a text query as sub-queries; both return empty aggregations.

How can one reproduce the bug?

  1. Create a new index with at least one keyword field
  2. Index 3 documents
  3. Add a normalization processor
  4. Perform a hybrid search with a terms aggregation
  5. Inspect the aggregations field in the response

What is the expected behavior?

Bucket aggregation is returned for the documents.

Screenshots

Create Index Request

PUT /home-depot-nlp
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "home-depot-pipeline",
    "number_of_shards": 2,
    "number_of_replicas": 2
  },
  "mappings": {
    "properties": {
      "description_embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      },
      "description": { "type": "text" },
      "brand": { "type": "keyword" }
    }
  }
}

Create normalization processor: PUT /_search/pipeline/home-depot-search-pipeline

{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.5,
              0.5
            ]
          }
        }
      }
    }
  ]
}
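For readers unfamiliar with the two techniques named in this pipeline, here is a rough Python sketch of what `min_max` normalization and a weighted `arithmetic_mean` combination compute in general. It is an illustration of the textbook formulas only, not the plugin's code: the function names are hypothetical, and the handling of edge cases (for example, all scores being equal) is an assumption made for this sketch and may differ from what the normalization processor actually does.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale one sub-query's scores to the range [0, 1]."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: every score is identical. Mapping to 1.0 is a
        # choice made for this sketch, not necessarily the plugin's behavior.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_arithmetic_mean(scores: Sequence[float],
                             weights: Sequence[float]) -> float:
    """Combine one document's normalized scores, one score per sub-query."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per sub-query score")
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


if __name__ == "__main__":
    # Made-up raw scores for three documents under two sub-queries.
    first = min_max_normalize([8.0, 4.0, 2.0])
    second = min_max_normalize([3.0, 1.0, 2.0])
    weights = [0.5, 0.5]  # same weights as in the pipeline definition
    for a, b in zip(first, second):
        print(weighted_arithmetic_mean([a, b], weights))
```

With equal weights of 0.5 the combined score is simply the average of the two normalized scores, so no document can score above 1.0.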

Sample Document

{
  "index": "117",
  "description": "BEHR PREMIUM PLUS Exterior Paint & Primer is a 100% Acrylic, low VOC formula designed for a long-lasting finish that resists moisture, fading & stains and provides a mildew and corrosion resistant finish. It delivers exceptional hide and excellent touch-up while also providing comprehensive all-climate protection.",
  "brand": "BEHR PREMIUM PLUS"
}

Sample query with params (search_pipeline=home-depot-search-pipeline)

{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "description": {
              "query": "home decor"
            }
          }
        },
        {
          "match": {
            "description": {
              "query": "tables"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "response_codes": {
      "terms": {
        "field": "brand"
      }
    }
  },
  "_source": {
    "excludes": [
      "description_embedding"
    ]
  }
}

Response:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.5,
    "hits": [
      {
        "_index": "home-depot-nlp",
        "_id": "2536",
        "_score": 0.5,
        "_source": {
          "description": "Add a fine touch of elegance in your modern inspired living space with this modern style round suar side table with stainless steel legs. This side table is in a category of its own since it is made from suar wood and stainless steel. The beautiful natural dark brown finish of the table surface is crafted from wood while the stable legs it sits in is forged from stainless steel. The unique sleek and low-profile design of the legs and the brown accent of the table surface will make an excellent addition to your living room space in your modern home. This item comes shipped in 1 carton. Suitable for indoor use only. Maximum weight limit is 100 lbs. Made in Indonesia. Contemporary design.",
          "brand": "Litton Lane"
        }
      }
    ]
  },
  "aggregations": {
    "response_codes": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}
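The symptom in the response above (documents matched, yet `buckets` is empty) can be checked mechanically when testing different query shapes. The following is a minimal sketch that works on an already-parsed response dictionary; the function name is hypothetical, and it only inspects top-level aggregations that carry a `buckets` list.

```python
from typing import Any, Dict, List


def empty_bucket_aggs(response: Dict[str, Any]) -> List[str]:
    """Return names of aggregations whose buckets are empty although hits matched."""
    total = response.get("hits", {}).get("total", 0)
    # hits.total is an object with a "value" key in the responses shown here;
    # a bare integer is tolerated as well.
    hit_count = total.get("value", 0) if isinstance(total, dict) else total
    if hit_count == 0:
        return []  # no hits, so empty buckets are expected
    suspicious = []
    for name, agg in response.get("aggregations", {}).items():
        if isinstance(agg, dict) and agg.get("buckets") == []:
            suspicious.append(name)
    return suspicious


if __name__ == "__main__":
    # Trimmed copy of the hybrid query response from this issue.
    hybrid_response = {
        "hits": {"total": {"value": 2, "relation": "eq"}},
        "aggregations": {"response_codes": {"buckets": []}},
    }
    print(empty_bucket_aggs(hybrid_response))
```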

Query with bool clause

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "home decor"
            }
          }
        },
        {
          "match": {
            "description": {
              "query": "tables"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "response_codes": {
      "terms": {
        "field": "brand",
        "size": 10
      }
    }
  },
  "_source": {
    "includes": [
      "description",
      "brand"
    ]
  }
}
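Note that the hybrid and bool requests in this report are not identical apart from the query type: the bool version matches "home decor" against `title` rather than `description`, sets `size` on the aggregation, and uses `_source.includes` instead of `excludes`. To compare the two forms on exactly the same sub-queries and aggregation, the bodies can be generated from shared inputs. This is only a sketch; the helper names are hypothetical and it builds plain dictionaries without sending anything to a cluster.

```python
import json
from typing import Any, Dict, List


def match_clause(field: str, text: str) -> Dict[str, Any]:
    """Build a match clause in the same shape used in this issue."""
    return {"match": {field: {"query": text}}}


def terms_agg(name: str, field: str, size: int = 10) -> Dict[str, Any]:
    """Build a named terms aggregation on a keyword field."""
    return {name: {"terms": {"field": field, "size": size}}}


def hybrid_body(sub_queries: List[Dict[str, Any]],
                aggs: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the sub-queries in a hybrid query."""
    return {"query": {"hybrid": {"queries": list(sub_queries)}}, "aggs": aggs}


def bool_should_body(sub_queries: List[Dict[str, Any]],
                     aggs: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the same sub-queries in a bool query with should clauses."""
    return {"query": {"bool": {"should": list(sub_queries)}}, "aggs": aggs}


if __name__ == "__main__":
    sub_queries = [
        match_clause("description", "home decor"),
        match_clause("description", "tables"),
    ]
    aggs = terms_agg("response_codes", "brand")
    print(json.dumps(hybrid_body(sub_queries, aggs), indent=2))
    print(json.dumps(bool_should_body(sub_queries, aggs), indent=2))
```

The hybrid body still has to be sent with `search_pipeline=home-depot-search-pipeline`, as in the original request.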

Response for query with bool clause

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 66,
      "relation": "eq"
    },
    "max_score": 8.623452,
    "hits": [
      {
        "_index": "home-depot-nlp",
        "_id": "1775",
        "_score": 8.623452,
        "_source": {
          "description": "Adorn your home with lots of Spring and summer style. The package includes 1-Piece HOME SWEET HOME Word Sign and 1-Piece Wooden Window Frame . Enhance your front door with them.",
          "brand": "Glitzhome"
        }
      }
    ]
  },
  "aggregations": {
    "response_codes": {
      "doc_count_error_upper_bound": 2,
      "sum_other_doc_count": 35,
      "buckets": [
        {
          "key": "BEHR PREMIUM PLUS",
          "doc_count": 12
        },
        {
          "key": "Unbranded",
          "doc_count": 10
        },
        {
          "key": "BEHR ULTRA",
          "doc_count": 9
        }
      ]
    }
  }
}

What is your host/environment?

OS: Linux running on EC2 installed with docker-compose

navneet1v commented 1 year ago

Thanks for creating the GitHub issue. Aggregations are not supported with the hybrid query as of now. This can be taken up as a feature request.

navneet1v commented 11 months ago

There is already an issue for supporting aggregations with the hybrid query clause: https://github.com/opensearch-project/neural-search/issues/509. Resolving this issue.