Open riemanli opened 2 years ago
I have also just noticed this issue. I think an incorrect copy has been made from cascading_legions.py
where this attribute is defined. Notice that in that file, a comment is made that self.mask
is used for the key aggregation.
A similar comment is made in liquid_legions.py
referencing the same_key_aggregator
function. For this reason, I think the dictionary self. unique
stores the keys. By this I mean that self.unique
operates in accordance with the key
array from the LiquidLegions paper. Then replacing self.mask
with self.unique
should work as a drop in replacement.
However, I also noticed that there is no unit-testing coverage and I don't think the use of frequency capping (f_max
from the paper) has been implemented in the current frequency estimation function, although it should be quite straightforward to perform this after the full histogram is returned.
https://github.com/world-federation-of-advertisers/cardinality_estimation_evaluation_framework/blob/732117efd307230156376e88f4d2ab3bb6b5df97/src/estimators/liquid_legions.py#L171
The function
frequency_sample
accesses the undefined attributeself.mask
. In addition, I notice that the unit tests doesn't cover the usage of functionfrequency_sample
.