microsoft / scitt-ccf-ledger

Supply Chain Integrity Transparency and Trust ledger application using Confidential Consortium Framework (CCF)
MIT License

review the need for entry_seqno_index #237

Open achamayou opened 5 days ago

The current code maintains an entry_seqno_index, a ccf::indexing::strategies::SeqnosForValue_Bucketed<EntryTable>, which is essentially a list of the seqnos at which there is a write to the entry table (i.e. CTS business transactions, as opposed to CCF internal/governance transactions).

This is exposed by the /entries/txIds endpoint, presumably to facilitate scanning through all the CTS ledger entries using the API, as opposed to the ledger files.

This may be a premature and ultimately harmful optimisation: it trades an ever-growing in-memory index for the ability to skip deserialising what should normally amount to a fairly small number of transactions (governance is rarely the major part of a ledger). Aside from the memory cost, it will also greatly increase first-response latency on new nodes, which currently cannot respond usefully until they have built the index up to that point. An index-less historical query will come back much faster in that situation.
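To make the trade-off concrete, here is a toy model in plain C++. It does not use any CCF API: Tx, build_entry_index, scan_range and make_ledger are hypothetical names invented for illustration, and the "one governance transaction in ten" ratio is an assumption, not a measurement.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a ledger transaction; NOT a CCF type.
struct Tx {
  std::uint64_t seqno;
  bool writes_entry_table; // true for a CTS entry, false for governance/internal
};

// Index approach: remember every seqno that wrote to the entry table.
// The result grows with the number of entries and has to be rebuilt
// from the start of the ledger before it can answer anything.
std::vector<std::uint64_t> build_entry_index(const std::vector<Tx>& ledger) {
  std::vector<std::uint64_t> index;
  for (const auto& tx : ledger) {
    if (tx.writes_entry_table) {
      index.push_back(tx.seqno);
    }
  }
  return index;
}

struct ScanResult {
  std::vector<std::uint64_t> entry_seqnos;
  std::size_t skipped = 0; // transactions in range that were not entries
};

// Index-less approach: filter the requested range [from, to] on the fly.
// The toy loops over the whole vector for simplicity; a real range query
// would only fetch the transactions inside the range.
ScanResult scan_range(
  const std::vector<Tx>& ledger, std::uint64_t from, std::uint64_t to) {
  ScanResult result;
  for (const auto& tx : ledger) {
    if (tx.seqno < from || tx.seqno > to) {
      continue;
    }
    if (tx.writes_entry_table) {
      result.entry_seqnos.push_back(tx.seqno);
    } else {
      ++result.skipped;
    }
  }
  return result;
}

// Hypothetical ledger of n transactions where every 10th is governance.
std::vector<Tx> make_ledger(std::uint64_t n) {
  std::vector<Tx> ledger;
  for (std::uint64_t s = 1; s <= n; ++s) {
    ledger.push_back({s, s % 10 != 0});
  }
  return ledger;
}
```

In this model both approaches return the same seqnos. The extra work done by the scan is the skipped count, which stays small when governance is a small share of the ledger, while the index holds one element per entry for the lifetime of the node.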

My sense is that it would be best to:

  1. Remove the index
  2. Convert /entries/txIds to a regular historical query, if it's needed at all
  3. Establish one or more benchmarks that we think /entries/txIds should meet, and decide what optimisation, if any, is needed
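For the third step, a benchmark could be as simple as timing a request and comparing it to an agreed budget. The sketch below is a minimal, generic harness in standard C++; run_benchmark and BenchmarkResult are hypothetical names, and the work being timed is left as a callable because how /entries/txIds would be invoked in a test is not specified here.

```cpp
#include <chrono>
#include <functional>

struct BenchmarkResult {
  std::chrono::nanoseconds elapsed;
  bool within_budget;
};

// Runs `work` once and reports whether it finished within `budget`.
// Uses steady_clock so that wall-clock adjustments do not skew the result.
BenchmarkResult run_benchmark(
  const std::function<void()>& work, std::chrono::nanoseconds budget) {
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
    std::chrono::steady_clock::now() - start);
  return {elapsed, elapsed <= budget};
}
```

A single timed run is noisy, so a real benchmark would likely repeat the call and look at a percentile; the budget itself is the thing this issue proposes to agree on.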