harsha-simhadri / big-ann-benchmarks

Framework for evaluating ANNS algorithms on billion scale datasets.
https://big-ann-benchmarks.com
MIT License

Fixed running install.py on a full track #249

Closed: landrumb closed this 7 months ago

landrumb commented 7 months ago

Currently, install.py builds only the base Dockerfile when you run it without specifying an algorithm, due to a bug on line 69.

The filter builtin returns a lazy iterator that yields the matching elements of its input, not a collection containing them. Because Python iterators cannot be reset, once the previous line has consumed every element of algos, the list comprehension over algos on line 72 of install.py gets no elements, so dockerfiles is empty.

A short REPL session demonstrating this behavior:

>>> f = filter(lambda x: x % 2 == 0, list(range(50)))
>>> type(f)
<class 'filter'>
>>> x = [n for n in f]
>>> x
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]
>>> y = [n for n in f]
>>> y
[]
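The same effect can be reproduced in a standalone script. This is a hypothetical reconstruction of the pattern described above, not the actual install.py code: only the names algos, tags and dockerfiles come from the description, while all_algos, the filter predicate and the tag/path formats are made up for illustration.

```python
# Hypothetical stand-in for the candidate algorithms; the real install.py
# builds its list differently.
all_algos = ["algo1", "algo2", "_template"]

# filter() returns a single-pass iterator, not a list.
algos = filter(lambda a: not a.startswith("_"), all_algos)

# The first comprehension drains the iterator...
tags = [f"tag-{a}" for a in algos]
# ...so the second one sees no elements at all.
dockerfiles = [f"{a}/Dockerfile" for a in algos]

print(tags)         # ['tag-algo1', 'tag-algo2']
print(dockerfiles)  # []
```

No error is raised anywhere; the second list is simply empty, which is why the bug only shows up as missing builds.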

zip(tags, dockerfiles) on line 94 stops as soon as its shortest argument is exhausted, so with dockerfiles empty it yields no pairs, and iterating over it does nothing in this case.

Wrapping the iterator returned by filter in list() solves this problem: the elements are collected once, and every loop over a list gets a fresh iterator that starts from the first element.
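A sketch of the fix, again with hypothetical algorithm names, predicate and tag/path formats; the print call is a placeholder for the actual build step, which is not shown in the PR. The final lines show why the list works where the filter object did not: iter() on a list returns a new iterator each time, while a filter object returns itself and stays exhausted.

```python
all_algos = ["algo1", "algo2", "_template"]  # hypothetical names

# Materialize the filter result once; the list can be iterated any number of times.
algos = list(filter(lambda a: not a.startswith("_"), all_algos))

tags = [f"tag-{a}" for a in algos]
dockerfiles = [f"{a}/Dockerfile" for a in algos]  # no longer empty

for tag, dockerfile in zip(tags, dockerfiles):
    # Placeholder for the real build step.
    print("would build", tag, "from", dockerfile)

# A list hands out a fresh iterator on every iter() call...
print(iter(algos) is algos)  # False
# ...whereas a filter object is its own iterator.
f = filter(None, [1, 2])
print(iter(f) is f)          # True
```

Materializing the list costs one pass and a little memory, which should be negligible for a list of algorithm names; a list comprehension with an if clause would work equally well.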

maumueller commented 7 months ago

Thanks @landrumb for the fix and the detailed description of the cause 💯