world-federation-of-advertisers / cardinality_estimation_evaluation_framework

Evaluation framework and methods for estimating cardinalities of groups of sets
Apache License 2.0

Frequency scenario 3 #45

Closed matthewclegg closed 4 years ago

matthewclegg commented 4 years ago

This PR implements the third data generation scenario for frequency evaluation: publisher constant frequency.

The set_generator.py file had grown rather large, so I have split the set generation classes into three files: a common set of base classes, classes for cardinality estimation evaluation, and classes for frequency estimation evaluation.

In addition, I refactored the HomogeneousMultiSetGenerator so that it and the new PublisherConstantFrequencySetGenerator share a common base class.
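
For readers unfamiliar with the pattern, here is a minimal sketch of how two multiset generators might share a base class. It is not the code in this PR: the class names, constructor parameters, and the `frequency_for` hook are all hypothetical. It also assumes that "publisher constant frequency" means every id reached by a publisher appears the same fixed number of times, and it uses a uniform random frequency purely as a stand-in for whatever distribution the other generator draws from.

```python
import random
from abc import ABC, abstractmethod


class MultiSetGeneratorSketch(ABC):
    """Hypothetical base class: samples ids per publisher; subclasses pick frequencies."""

    def __init__(self, universe_size, set_sizes, seed=0):
        self.universe_size = universe_size
        self.set_sizes = list(set_sizes)
        self.rng = random.Random(seed)

    @abstractmethod
    def frequency_for(self, publisher_index):
        """Return how many times one reached id appears for this publisher."""

    def __iter__(self):
        # Yield one multiset (a list with repeated ids) per publisher.
        for index, size in enumerate(self.set_sizes):
            ids = self.rng.sample(range(self.universe_size), size)
            multiset = []
            for id_ in ids:
                multiset.extend([id_] * self.frequency_for(index))
            yield multiset


class ConstantFrequencySketch(MultiSetGeneratorSketch):
    """Assumed behaviour: every reached id appears exactly `frequency` times."""

    def __init__(self, universe_size, set_sizes, frequency, seed=0):
        super().__init__(universe_size, set_sizes, seed)
        self.frequency = frequency

    def frequency_for(self, publisher_index):
        return self.frequency


class RandomFrequencySketch(MultiSetGeneratorSketch):
    """Stand-in only: uniform frequency in [1, max_freqs[i]] for publisher i."""

    def __init__(self, universe_size, set_sizes, max_freqs, seed=0):
        super().__init__(universe_size, set_sizes, seed)
        self.max_freqs = list(max_freqs)

    def frequency_for(self, publisher_index):
        return self.rng.randint(1, self.max_freqs[publisher_index])
```

In this sketch the id sampling and iteration live in the base class, so each subclass only supplies the frequency rule, for example `list(ConstantFrequencySketch(100, [5, 10], frequency=3))`.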