mozilla / filter-cascade

A python filter cascade implementation
Mozilla Public License 2.0

InvertedLogicException shouldn't fire beyond the first depth #15

Closed jcjones closed 4 years ago

jcjones commented 4 years ago

In the staging environment, InvertedLogicException was raised when it should not have been. At the first layer depth, getting the relative sizes of the include and exclude sets backward is a major issue, because the false positive rate is tuned for the expected distribution. At subsequent depths the error rate is typically 0.5, so an inversion there is not a big deal.

In tests, this sort of inversion doesn't seem to happen, but with the real dataset it occurred at the third layer. We should have just carried on. This will need a point release.
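
A minimal sketch of the intended behavior, to make the depth rule concrete. This is not the library's actual code: `InvertedLogicError` and `check_inversion` are hypothetical names, and the only assumption is that the check has access to the depth and the two set sizes, as the log output suggests.

```python
class InvertedLogicError(Exception):
    """Hypothetical stand-in for filtercascade.InvertedLogicException."""


def check_inversion(depth, include_len, exclude_count):
    """Report whether a layer is inverted; raise only at the first depth.

    A layer is "inverted" when the exclude set is smaller than the
    include set. At depth 1 that breaks the tuned false positive rate,
    so it is an error. At deeper layers (err is typically 0.5) the
    caller can simply carry on.
    """
    inverted = exclude_count < include_len
    if inverted and depth == 1:
        raise InvertedLogicError(
            f"At depth {depth}, exclude set ({exclude_count}) "
            f"was < include set ({include_len})."
        )
    return inverted


# Sizes taken from the depth-3 failure in this report: no exception here.
carried_on = check_inversion(depth=3, include_len=412637, exclude_count=374923)
```

With the sizes from the staging run, the depth-3 call returns `True` instead of raising, while the same sizes at depth 1 would still raise.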

filtercascade:Initializing the 1-depth layer. err=0.00556587383514237 include_len=883498
filtercascade:Processing false positives
filtercascade:Took 843891.579 ms to process layer 1 with bit count 10480800
filtercascade:Initializing the 2-depth layer. err=0.5 include_len=374923 size=593880
filtercascade:Processing false positives
filtercascade:Took 4180.414 ms to process layer 2 with bit count 593880
filtercascade:Initializing the 3-depth layer. err=0.5 include_len=412637 size=653616
filtercascade:Processing false positives

Traceback (most recent call last):
  File "/app/create_filter_cascade/certs_to_crlite.py", line 381, in <module>
    main()
  File "/app/create_filter_cascade/certs_to_crlite.py", line 354, in main
    nonrevoked_certs_len=known_nonrevoked_certs_len,
  File "/app/create_filter_cascade/certs_to_crlite.py", line 130, in generateMLBF
    cascade.initialize(include=revoked_certs, exclude=nonrevoked_certs)
  File "/usr/local/lib/python3.7/site-packages/filtercascade/__init__.py", line 307, in initialize
    depth=depth, exclude_count=exclude_count, include_len=include_len
filtercascade.InvertedLogicException: At Depth 3, exclude set (374923) was < include set (412637). If you reached this exception, then either your input data had an uncountable iterator, or your filter length was insufficient, and you need to increase it. The only way to fix this issue is via a code update where you either manually swap the input 'include' and 'exclude' parameters and set the invertedLogic boolean to True upon construction, or set a larger min_filter_length.
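
For the first-depth case, the manual swap that the exception message describes could look roughly like the sketch below. `orient_inputs` is a hypothetical helper, not part of filtercascade, and it assumes both inputs are sized collections. How the resulting flag is passed to the cascade constructor is deliberately left out, since the message only says the invertedLogic boolean is set upon construction.

```python
def orient_inputs(include, exclude):
    """Return (include, exclude, inverted_logic) with the smaller set first.

    Hypothetical helper: if the exclude set is smaller than the include
    set, swap the two and report inverted_logic=True so the caller can
    configure the cascade accordingly.
    """
    if len(exclude) < len(include):
        return exclude, include, True
    return include, exclude, False


# Toy data: more "revoked" than "non-revoked" entries, so a swap is needed.
revoked = {"a", "b", "c"}
nonrevoked = {"x"}
inc, exc, inverted_logic = orient_inputs(revoked, nonrevoked)
```

Here `inc` ends up as the smaller set and `inverted_logic` is `True`, so lookups against the resulting cascade would need to be interpreted the other way around.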