opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] Incorrect docs.deleted count with Soft Delete Enabled #13725

Open monusingh-1 opened 2 weeks ago

monusingh-1 commented 2 weeks ago

Describe the bug

When soft deletes are enabled, the docs.deleted count reported for an OpenSearch index is higher than the number of documents actually deleted, and it is only corrected after a merge.

Related component

Indexing

To Reproduce

Steps to reproduce

Case with Soft Deletes Enabled [Incorrect behavior]:

  1. Create an index with soft deletes enabled:
> PUT myindex1
{
  "settings":{
    "number_of_shards": 1,
    "number_of_replicas": 2,
    "index.soft_deletes.enabled" : true
  }
}
  2. Index two documents:
> POST /_bulk
{"index": {"_index": "myindex1", "_id": "1"}}
{"field1": "value1", "field2": "value1"}
{"index": {"_index": "myindex1", "_id": "2"}}
{"field1": "value3", "field2": "value2"}
  3. Delete one document and issue a refresh:
> DELETE /myindex1/_doc/1
> GET /_refresh
  4. Check the document stats:
> GET /_cat/indices?v
health status index                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   myindex1             8BsMrKplRdW1yQEpUE6Jag   5   2          1            2     27.9kb          9.3kb

We can see that docs.deleted is 2, which is incorrect, as we deleted only 1 document.

Check the segments:

> GET _cat/segments/myindex1?v
index    shard prirep ip            segment generation docs.count docs.deleted  size size.memory committed searchable version compound
myindex1 3     p      x.x.x.x       _0               0          1            0 4.1kb        1876 false     true       8.10.1  true
myindex1 4     p      x.x.x.x       _0               0          0            1 5.4kb        2084 false     true       8.10.1  true
myindex1 4     p      x.x.x.x       _1               1          0            1 2.9kb         852 false     true       8.10.1  true
  5. Issue a force merge:
> POST /myindex1/_forcemerge
  6. Check the document stats again:
> GET /_cat/indices?v
health status index                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   myindex1             8BsMrKplRdW1yQEpUE6Jag   5   2          1            0     41.8kb         13.9kb

Only after the merge is docs.deleted updated to the correct value.
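The steps above can also be scripted so the bug is easy to re-run. A minimal Python sketch, assuming you send the requests with whatever HTTP client you prefer (none is shown here); build_repro_requests is a hypothetical helper, and the two cases in this issue differ only in the index.soft_deletes.enabled value.

```python
import json


def build_repro_requests(index, soft_deletes):
    """Hypothetical helper: return the reproduction steps as (method, path, body) tuples."""
    settings = {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 2,
            "index.soft_deletes.enabled": soft_deletes,
        }
    }
    docs = [
        ("1", {"field1": "value1", "field2": "value1"}),
        ("2", {"field1": "value3", "field2": "value2"}),
    ]
    # The _bulk body is newline-delimited JSON: one action line followed by one
    # source line per document, and the body must end with a newline.
    # It is normally sent with Content-Type: application/x-ndjson.
    bulk_lines = []
    for doc_id, source in docs:
        bulk_lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        bulk_lines.append(json.dumps(source))
    bulk_body = "\n".join(bulk_lines) + "\n"

    return [
        ("PUT", f"/{index}", json.dumps(settings)),
        ("POST", "/_bulk", bulk_body),
        ("DELETE", f"/{index}/_doc/1", None),
        ("GET", "/_refresh", None),
        ("GET", "/_cat/indices?v", None),
        ("GET", f"/_cat/segments/{index}?v", None),
        ("POST", f"/{index}/_forcemerge", None),
        ("GET", "/_cat/indices?v", None),
    ]


if __name__ == "__main__":
    # Print the soft-deletes case in a Dev Tools-like layout.
    for method, path, body in build_repro_requests("myindex1", soft_deletes=True):
        print(f"{method} {path}")
        if body:
            print(body.rstrip("\n"))
```

Keeping the steps as plain data makes it easy to run both cases side by side (soft_deletes=True for myindex1, soft_deletes=False for myindex2) with any client.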

Expected Behavior:

The docs.deleted count should accurately reflect the number of documents deleted in the index.

Actual Behavior:

The docs.deleted count appears doubled after the deletion and is only corrected by a force merge.
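The doubled figure lines up with the segment listing from step 4: summing docs.deleted over the primary-shard segments of myindex1 gives 0 + 1 + 1 = 2, which is exactly what _cat/indices reports, and segment _1 on shard 4 holds no live documents but one deleted one. A minimal Python sketch of that arithmetic, assuming the index-level number is the sum of per-segment counts (consistent with the outputs in this issue, not checked against the OpenSearch source); sum_doc_counts is a hypothetical helper, and the ip and segment columns are shown separated.

```python
from collections import defaultdict

# Primary-shard rows from the _cat/segments/myindex1?v output in this issue
# (IP masked as in the report).
SEGMENTS_MYINDEX1 = """\
index    shard prirep ip      segment generation docs.count docs.deleted size  size.memory committed searchable version compound
myindex1 3     p      x.x.x.x _0      0          1          0            4.1kb 1876        false     true       8.10.1  true
myindex1 4     p      x.x.x.x _0      0          0          1            5.4kb 2084        false     true       8.10.1  true
myindex1 4     p      x.x.x.x _1      1          0          1            2.9kb 852         false     true       8.10.1  true
"""


def sum_doc_counts(cat_segments_text, primaries_only=True):
    """Hypothetical helper: sum docs.count and docs.deleted per index from _cat/segments?v text."""
    lines = [line for line in cat_segments_text.splitlines() if line.strip()]
    header = lines[0].split()
    col = {name: i for i, name in enumerate(header)}
    totals = defaultdict(lambda: {"docs.count": 0, "docs.deleted": 0})
    for line in lines[1:]:
        fields = line.split()
        # Skip replica rows so each document is counted once.
        if primaries_only and fields[col["prirep"]] != "p":
            continue
        counts = totals[fields[col["index"]]]
        counts["docs.count"] += int(fields[col["docs.count"]])
        counts["docs.deleted"] += int(fields[col["docs.deleted"]])
    return dict(totals)


print(sum_doc_counts(SEGMENTS_MYINDEX1))
# {'myindex1': {'docs.count': 1, 'docs.deleted': 2}}
```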

Case with Soft Deletes Disabled [Correct behavior]:

  1. Create an index with soft deletes disabled:
> PUT myindex2
{
  "settings":{
    "number_of_shards": 1,
    "number_of_replicas": 2,
    "index.soft_deletes.enabled" : false
  }
}
  2. Index two documents:
> POST /_bulk
{"index": {"_index": "myindex2", "_id": "1"}}
{"field1": "value1", "field2": "value1"}
{"index": {"_index": "myindex2", "_id": "2"}}
{"field1": "value3", "field2": "value2"}
  3. Delete one document and issue a refresh:
> DELETE /myindex2/_doc/1
> GET /_refresh
  4. Check the document stats:
> GET /_cat/indices?v
health status index                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   myindex2             hWdayX6oTr64bSPJGGQTFA   1   2          1            1     24.8kb          8.2kb

In this case, docs.deleted shows the correct value without a merge.

Check the segments:

> GET _cat/segments/myindex2?v
index    shard prirep ip            segment generation docs.count docs.deleted  size size.memory committed searchable version compound
myindex2 3     p      x.x.x.x       _0               0          1            0 4.1kb        1876 true      true       8.10.1  true
myindex2 4     p      x.x.x.x       _0               0          1            0 4.1kb           0 true      false      8.10.1  true
myindex2 4     p      x.x.x.x       _1               1          0            1 2.9kb         852 false     true       8.10.1  true
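Because each case deletes exactly one document, the bug can be turned into a simple expected-versus-reported check on the _cat/indices output taken after the refresh. A minimal Python sketch, assuming the two _cat/indices rows reported above (before any force merge) are combined into one table for illustration; unexpected_deleted_counts is a hypothetical helper.

```python
# Rows from the two _cat/indices?v outputs in this issue, taken after the
# delete and refresh but before any force merge.
CAT_INDICES = """\
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   myindex1 8BsMrKplRdW1yQEpUE6Jag 5   2   1          2            27.9kb     9.3kb
green  open   myindex2 hWdayX6oTr64bSPJGGQTFA 1   2   1          1            24.8kb     8.2kb
"""


def unexpected_deleted_counts(cat_indices_text, expected_deleted):
    """Hypothetical helper: map index -> (reported, expected) where docs.deleted differs."""
    lines = [line for line in cat_indices_text.splitlines() if line.strip()]
    header = lines[0].split()
    col = {name: i for i, name in enumerate(header)}
    mismatches = {}
    for line in lines[1:]:
        fields = line.split()
        index = fields[col["index"]]
        if index not in expected_deleted:
            continue
        reported = int(fields[col["docs.deleted"]])
        if reported != expected_deleted[index]:
            mismatches[index] = (reported, expected_deleted[index])
    return mismatches


print(unexpected_deleted_counts(CAT_INDICES, {"myindex1": 1, "myindex2": 1}))
# {'myindex1': (2, 1)}
```

Only the soft-deletes index is flagged. Requesting _cat/indices?format=json instead returns the same columns as JSON objects, which avoids whitespace parsing in a real test.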

Additional Information:

OpenSearch version: OpenSearch-1.3, OpenSearch-2.1


sarthakaggarwal97 commented 2 weeks ago

Thanks @monusingh-1 for creating this issue. Can you also please add the output of _cat/segments?v so we can see the doc count and deleted doc count for each segment of the index.