world-federation-of-advertisers / cardinality_estimation_evaluation_framework

Evaluation framework and methods for estimating cardinalities of groups of sets
Apache License 2.0
21 stars 9 forks source link

Undefined self attribute in LiquidLegions #128

Open riemanli opened 2 years ago

riemanli commented 2 years ago

https://github.com/world-federation-of-advertisers/cardinality_estimation_evaluation_framework/blob/732117efd307230156376e88f4d2ab3bb6b5df97/src/estimators/liquid_legions.py#L171

The function frequency_sample accesses the undefined attribute self.mask. In addition, I notice that the unit tests doesn't cover the usage of function frequency_sample .

c-dickens commented 2 years ago

I have also just noticed this issue. I think an incorrect copy has been made from cascading_legions.py where this attribute is defined. Notice that in that file, a comment is made that self.mask is used for the key aggregation.

A similar comment is made in liquid_legions.py referencing the same_key_aggregator function. For this reason, I think the dictionary self. unique stores the keys. By this I mean that self.unique operates in accordance with the key array from the LiquidLegions paper. Then replacing self.mask with self.unique should work as a drop in replacement.

However, I also noticed that there is no unit-testing coverage and I don't think the use of frequency capping (f_max from the paper) has been implemented in the current frequency estimation function, although it should be quite straightforward to perform this after the full histogram is returned.