Open Kixiron opened 3 years ago
Are you positive that threshold_total
subsumes consolidate
? Semantically, they do different things. threshold_total
makes sure that all weights in a relation are unit weights, whereas consolidate
makes sure that each value appears exactly once in the set of changes for a given timestamp. Are you saying that threshold_total
is guaranteed to produce consolidated output for each timestamp? It makes sense that the implementation works that way, but it does not follow from the mathematical definition of the operator.
That's exactly the kind of proof we should be able to make once we have a calculus.
Are you positive that
threshold_total
subsumesconsolidate
?
I am, and I extend that to all uses of reduce_core
. This behavior comes courtesy of it's output being an arrangement.
Semantically, they do different things.
threshold_total
makes sure that all weights in a relation are unit weights, whereasconsolidate
makes sure that each value appears exactly once in the set of changes for a given timestamp.
Yes, semantically. But both threshold_semigroup
(requiring TotalOrder
) and reduce_core
(coping with partially ordered timestamps) operate on batches. Batches, at least currently, span a closed interval of timestamps (while this gets a bit fuzzy for partially ordered timestamps, it applies to a natural extension based on using antichain frontiers). Also, when batches are build, updates are consolidated, as this greatly decreases downstream cost.
Are you saying that
threshold_total
is guaranteed to produce consolidated output for each timestamp? It makes sense that the implementation works that way, but it does not follow from the mathematical definition of the operator.
Yes, I am. And, as said, I extend that specific claim to all uses of threshold_semigroup
, and furthermore to all uses of reduce_core
.
That makes sense, thank you! This should be a trivial optimization to implement, I'll create a PR soon.
For relations that are both distinct and output they'll be aggregated twice, first at the distinct/threshold total and again at the consolidation. This can save non-trivial amounts of time (over 2 seconds for the citations benchmark) as well as removing an entire arrangement. Arguably the consolidation could be removed entirely, but we'd probably need a better output update scheme before that becomes viable without thrashing maps.
This input program displays the behavior
CoCitation
will be aggregated twice, causing its runtime to nearly double as a result