mozilla / filter-cascade

A python filter cascade implementation
Mozilla Public License 2.0
7 stars 5 forks source link

filter isn't consistent after writing and reading from file #16

Closed eviljeff closed 4 years ago

eviljeff commented 4 years ago

The following testcase fails at the last line because some of the exclude items are now found in the filter after it's been written to file and read back again.

import math
import secrets
from tempfile import mkstemp

from filtercascade import FilterCascade
from filtercascade.fileformats import HashAlgorithm

def test_filter_cascade():
    _blocked = [
        ('guid1@', '1.0'), ('@guid2', '1.0'), ('@guid2', '1.1'),
        ('guid3@', '0.01b1')]
    blocked = [f'{guid}:{version}' for guid, version in _blocked]
    _not_blocked = [
        ('guid10@', '1.0'), ('@guid20', '1.0'), ('@guid20', '1.1'),
        ('guid30@', '0.01b1'), ('guid100@', '1.0'), ('@guid200', '1.0'),
        ('@guid200', '1.1'), ('guid300@', '0.01b1')]
    not_blocked = [f'{guid}:{version}' for guid, version in _not_blocked]

    salt = secrets.token_bytes(16)
    fprs = [len(blocked) / (math.sqrt(2) * len(not_blocked)), 0.5]

    cascade_out = FilterCascade.cascade_with_characteristics(
        capacity=int(len(blocked) * 1.1),
        error_rates=fprs,
        defaultHashAlg=HashAlgorithm.SHA256,
        salt=salt,
    )
    cascade_out.initialize(include=blocked, exclude=not_blocked)
    # verify
    cascade_out.verify(include=blocked, exclude=not_blocked)

    # write the filter
    _, filter_path = mkstemp()
    with open(filter_path, 'wb') as filter_file:
        cascade_out.tofile(filter_file)

    # read the filter
    with open(filter_path, 'rb') as filter_file:
        cascade_in = FilterCascade.from_buf(filter_file.read())
    # verify
    cascade_in.verify(include=blocked, exclude=not_blocked)
jcjones commented 4 years ago

Confirmed; added test case in https://github.com/jcjones/filter-cascade/commit/a65b0d921ca9a50709f61b3de19743a37b8e368e going into branch https://github.com/mozilla/filter-cascade/compare/master...jcjones:16-read_write_inconsistency -- thanks!

jcjones commented 4 years ago

The salt isn't being loaded correctly in cascade_in.

jcjones commented 4 years ago

There are two bugs here: neither cascade_with_characteristics nor from_buf propagate the salt to child filters.

As part of this, I'm going to mark cascade_with_characteristics deprecated and remove the need to supply an empty filter list to __init__ to clean up the API. There's no good reason to manually construct filters by default anymore.

eviljeff commented 4 years ago

There are two bugs here: neither cascade_with_characteristics nor from_buf propagate the salt to child filters.

Okay, I'll change the addons-server code to create a FilterCascade directly and skip the part of the test that re-reads the filter in the meantime (we only use from_buf in tests)