mediacloud / story-indexer

The core pipeline used to ingest online news stories in the Media Cloud archive.
https://mediacloud.org
Apache License 2.0

Use _reindex api to generate a small test version of our index #340

Open pgulley opened 1 month ago

pgulley commented 1 month ago

While we're testing different ways to improve our index structure, we will want the ability to dry-run changes on real data. We should:

  1. Run through the steps of generating a test index that contains a random sample of our real data, using the _reindex API, either via Kibana or directly via curl.
  2. Document that process here.
  3. Produce a script or other utility for replicating that process with various index mapping choices.
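
As a starting point for step 3, here is a minimal sketch of what such a utility might look like. It is not an agreed-on implementation: the index names, the mapping, the sample size, and the cluster URL are all placeholders, and the helper names (`build_reindex_body`, `build_requests`, `send`) are hypothetical. It relies on the random-subset pattern for `_reindex` (a `function_score` query with `random_score` and `min_score`, capped by `max_docs`), and it creates the destination index with the candidate mapping first so that the copied documents are indexed under that mapping.

```python
import json
import urllib.request


def build_reindex_body(source_index, dest_index, max_docs=10_000, min_score=0.9):
    """Build a _reindex body that copies a random subset of documents.

    random_score gives each document a score in [0, 1); min_score drops
    those below the threshold, and max_docs caps the total copied.
    """
    return {
        "max_docs": max_docs,
        "source": {
            "index": source_index,
            "query": {
                "function_score": {
                    "random_score": {},
                    "min_score": min_score,
                }
            },
        },
        "dest": {"index": dest_index},
    }


def build_requests(es_url, source_index, dest_index, mappings, max_docs=10_000):
    """Return (method, url, body) tuples: create the test index, then reindex."""
    base = es_url.rstrip("/")
    return [
        # Create the destination first so the candidate mapping is applied.
        ("PUT", f"{base}/{dest_index}", {"mappings": mappings}),
        # wait_for_completion=false makes Elasticsearch run this as a task.
        (
            "POST",
            f"{base}/_reindex?wait_for_completion=false",
            build_reindex_body(source_index, dest_index, max_docs),
        ),
    ]


def send(method, url, body):
    """Send one JSON request to the cluster and return the decoded response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder values: substitute the real cluster URL, index names, and mapping.
    candidate_mappings = {"properties": {"example_field": {"type": "keyword"}}}
    for method, url, body in build_requests(
        "http://localhost:9200", "source-index", "test-index", candidate_mappings
    ):
        # Dry run: print each request instead of sending it.
        print(method, url)
        print(json.dumps(body, indent=2))
```

Running the file as-is only prints the two requests, which can be pasted into Kibana Dev Tools or adapted to curl. Calling `send(method, url, body)` on each tuple would execute them against a cluster. Swapping in a different `candidate_mappings` dictionary and destination index name repeats the process for another mapping choice.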