janschultecom / akvokolekta

Akka streams extension
Apache License 2.0

deduplicate - code for bloom filter is not enough #2

Open kzimnicki opened 7 years ago

kzimnicki commented 7 years ago

Hi

I've checked your code using bloom filters: https://github.com/janschultecom/akvokolekta/blob/master/src/main/scala/com/janschulte/akvokolekta/impl/EnhancedSource.scala#L23

Is it working correctly? You are using a Bloom filter, which says:

If I understand this correctly, then here: https://github.com/janschultecom/akvokolekta/blob/master/src/main/scala/com/janschulte/akvokolekta/impl/EnhancedSource.scala#L27 you should also check an actual Set and return true or false depending on whether the Set contains the element.

wmaroy commented 7 years ago

Hi @janschultecom

I agree with this. A Bloom filter alone is not enough. When the filter returns "possibly in the set", a second, exact check is needed: an in-memory set if memory is not a constraint, or a disk lookup, for example, if it is.
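To make the suggestion concrete, here is a minimal sketch of the two-stage check, using only the Scala standard library. It is not the code in `EnhancedSource.scala`: `TinyBloomFilter`, `ExactStore`, `InMemoryStore` and `ExactDeduplicator` are hypothetical names made up for illustration, and the filter is a toy rather than whatever Bloom filter library the project uses. The filter answers "definitely new" cheaply, and the exact store is consulted only when the filter says "possibly seen".

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// Toy Bloom filter for illustration only (hypothetical, not the project's filter).
final class TinyBloomFilter(numBits: Int, numHashes: Int) {
  private val bits = new mutable.BitSet(numBits)

  // One bit position per hash function; the seed selects the hash function.
  private def positions(item: String): Seq[Int] =
    (0 until numHashes).map { seed =>
      val h = MurmurHash3.stringHash(item, seed)
      ((h % numBits) + numBits) % numBits // map to a non-negative index
    }

  def add(item: String): Unit = positions(item).foreach(bits += _)

  // false => definitely never added; true => possibly added (may be a false positive).
  def mightContain(item: String): Boolean = positions(item).forall(bits.contains)
}

// Hypothetical exact store: an in-memory set here, but it could be backed by disk.
trait ExactStore {
  def contains(item: String): Boolean
  def add(item: String): Unit
}

final class InMemoryStore extends ExactStore {
  private val seen = mutable.HashSet.empty[String]
  def contains(item: String): Boolean = seen.contains(item)
  def add(item: String): Unit = seen += item
}

final class ExactDeduplicator(filter: TinyBloomFilter, store: ExactStore) {
  // Returns true the first time an item is offered, false for every repeat.
  def isFirstOccurrence(item: String): Boolean = {
    // && short-circuits: the exact store is only queried on a "possibly seen" answer.
    val duplicate = filter.mightContain(item) && store.contains(item)
    if (!duplicate) {
      filter.add(item)
      store.add(item)
    }
    !duplicate
  }
}
```

With this shape, a false positive from the filter costs one extra lookup in the exact store instead of silently dropping an element that was never seen. In a stream, `isFirstOccurrence` could serve as the predicate of a stateful filter stage, assuming one deduplicator instance per materialization.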