pelias / schema

elasticsearch schema files and tooling
MIT License

enable docvalues for source_id field #482

Open missinglink opened 2 years ago

missinglink commented 2 years ago

draft PR to test the effect of enabling docvalues for the source_id field. this is motivated by the discussion in https://github.com/pelias/api/pull/1608
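for context, a minimal sketch of what toggling doc values on a field mapping looks like. this assumes `source_id` is mapped as a `keyword` field; the `keyword_field` helper and the `before`/`after` names are hypothetical and not taken from this repo, only the `doc_values` mapping parameter is standard Elasticsearch.

```python
def keyword_field(doc_values: bool) -> dict:
    """Build a keyword field mapping with doc_values switched on or off."""
    return {"type": "keyword", "doc_values": doc_values}


# hypothetical before/after mapping fragments for the source_id field
before = {"properties": {"source_id": keyword_field(False)}}
after = {"properties": {"source_id": keyword_field(True)}}

print(after["properties"]["source_id"])
```

with doc values disabled, a keyword field can still be searched but cannot be used for sorting or aggregations, which is why the mapping change is needed before the field can act as a sort key.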

it is hoped that this field can be used as an additional sorting criterion, so that the ordering of results with the same _score value becomes more deterministic and testing becomes more stable and predictable.
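a rough sketch of the tie-breaking idea, not the actual change in pelias/api: sort by `_score` descending, then by `source_id` ascending. `build_sort` and `rank` are hypothetical helpers; `rank` only simulates in plain Python what the sort clause would ask Elasticsearch to do, and the sample hits are made up.

```python
def build_sort(tiebreak_field: str = "source_id") -> list:
    """Sort clause for a search body: score first, then a stable tie-breaker."""
    return [
        {"_score": {"order": "desc"}},
        {tiebreak_field: {"order": "asc"}},
    ]


def rank(hits: list) -> list:
    """Local simulation of the same ordering, for illustration only."""
    return sorted(hits, key=lambda h: (-h["_score"], h["source_id"]))


# made-up hits: two of them share the same score
hits = [
    {"_score": 1.0, "source_id": "way/222"},
    {"_score": 2.5, "source_id": "node/9"},
    {"_score": 1.0, "source_id": "node/111"},
]

# the result no longer depends on the order the hits arrived in
print([h["source_id"] for h in rank(hits)])
print([h["source_id"] for h in rank(list(reversed(hits)))])
```

without the second sort key, the two hits scoring 1.0 could come back in either order, which is the source of the flaky test results described above.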

my concerns with this change are:

the source_id values are almost entirely unique and non-sequential, so I'd expect to see poor compression.

although these concerns are hopefully unwarranted, we might want to check them before merging this.
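to give a feel for the compression concern, here is a loose analogy using `zlib`. this is NOT how Lucene encodes doc values, and the ids are synthetic, so it says nothing about the real index size; it only shows that unique, non-sequential values generally leave a compressor less redundancy to exploit than sequential ones.

```python
import random
import zlib


def compressed_ratio(ids: list) -> float:
    """Compressed size divided by raw size for a newline-joined id list."""
    raw = "\n".join(ids).encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000

# synthetic ids of equal length: one sequential set, one scattered set
sequential = [f"{i:016d}" for i in range(n)]
scattered = [f"{rng.randrange(10**16):016d}" for _ in range(n)]

seq_ratio = compressed_ratio(sequential)
rand_ratio = compressed_ratio(scattered)

print(f"sequential ids: {seq_ratio:.2f} of raw size")
print(f"scattered ids:  {rand_ratio:.2f} of raw size")
```

the real check is still to build the index both ways and compare the resulting sizes.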

missinglink commented 2 years ago

Snapshot size comparison:

[screenshot: 2022-03-17 at 12 35 15]