pulibrary / dpul-collections

An inspiring environment for global communities to engage with diverse digital collections

Measure time to hydrate, transform, index #115

Open hackartisan opened 2 weeks ago

hackartisan commented 2 weeks ago

We have requirements on how fast we have to be able to fully perform each of these steps. We need to know whether we're meeting those requirements.

Other things to think about:

Acceptance Criteria

tpendragon commented 4 days ago

Brainstorming a bit:

Some possibly useful metrics I can think of are:

  1. Time to Poll - how long it takes, on a fresh start-up, for the Hydrator to reach polling. You can't really measure this directly for the other consumers, because the measurement would include the previous steps. You'd have to start the transformer after the hydrator is polling, and likewise the indexer once the transformer is polling.
    • challenges:
  2. Time to Process 1 Doc. Seems like we'd have to record which doc it is, and then notify when it comes out of the indexing consumer. May help us know where to optimize when the time comes (a single doc vs. the whole system), but probably not super relevant to this ticket.
  3. Throughput - records/s indexed while the hydrator's going. Measure time to poll for each producer.

If we measure records / second / stage, we could rewrite the metrics we already established in these terms.
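A rough sketch of the records / second / stage arithmetic, written in Python purely for illustration (the pipeline itself is not Python). The `StageSample` type, the helper names, and every number below are hypothetical, not measurements or code from this project:

```python
from dataclasses import dataclass


@dataclass
class StageSample:
    """One measurement window for a pipeline stage (hypothetical shape)."""
    stage: str      # e.g. "hydrator", "transformer", "indexer"
    records: int    # records the stage finished during the window
    seconds: float  # wall-clock length of the window


def records_per_second(sample: StageSample) -> float:
    """Throughput of one stage over its measurement window."""
    if sample.seconds <= 0:
        raise ValueError("window length must be positive")
    return sample.records / sample.seconds


def seconds_for_full_run(total_records: int, rate: float) -> float:
    """Projected time for a stage to process total_records at rate records/s."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return total_records / rate


# Illustrative numbers only.
samples = [
    StageSample("hydrator", records=6000, seconds=60.0),
    StageSample("transformer", records=3000, seconds=60.0),
    StageSample("indexer", records=1500, seconds=60.0),
]
rates = {s.stage: records_per_second(s) for s in samples}

# The stage with the lowest rate bounds the whole pipeline.
slowest = min(rates, key=rates.get)
```

With a per-stage rate in hand, a requirement phrased as "a full run must finish within N hours" can be checked by comparing `seconds_for_full_run(total_records, rates[stage])` against the limit for each stage.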

Broadway measures throughput for us in some form. Maybe we could leverage that somehow?

We'd want an average time built up over a large set. e.g.:

Questions: where to store these stats? Integrate into LiveDashboard?

Implementation idea: run a watcher process that uses telemetry events.
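To make the watcher idea a bit more concrete, here is a language-neutral sketch in Python of the accumulation such a process might do. It assumes each event carries a stage name and a duration in seconds; that event shape, the `StageTimings` class, and its method names are all hypothetical. In the real pipeline the events would come from telemetry rather than being fed in by hand:

```python
from collections import defaultdict


class StageTimings:
    """Accumulates per-stage event durations and reports running averages."""

    def __init__(self) -> None:
        self._count = defaultdict(int)
        self._total_seconds = defaultdict(float)

    def handle_event(self, stage: str, duration_seconds: float) -> None:
        """Record one finished record for a stage (assumed event shape)."""
        self._count[stage] += 1
        self._total_seconds[stage] += duration_seconds

    def count(self, stage: str) -> int:
        """Number of events seen so far for a stage."""
        return self._count.get(stage, 0)

    def mean_seconds(self, stage: str) -> float:
        """Average duration per record for a stage, over all events seen."""
        seen = self.count(stage)
        if seen == 0:
            raise ValueError(f"no events recorded for stage {stage!r}")
        return self._total_seconds[stage] / seen


# Feeding events by hand; the durations are made up.
timings = StageTimings()
for duration in (0.5, 1.5):
    timings.handle_event("indexer", duration)
```

Keeping only a count and a running total per stage means the average can be built up over a large set without storing every event, which leaves the question of where to persist the stats open.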