sjyk / sampleclean-async

http://sampleclean.org
Apache License 2.0

Possible bugs in active learning #40

Closed: jnwang closed this issue 8 years ago

jnwang commented 9 years ago

The random split in ActiveLearningAlgorithm.scala may have a bug: https://github.com/sjyk/sampleclean-async/blob/master/src/main/scala/sampleclean/activeml/ActiveLearningAlgorithm.scala#L111

Please test it on record-level deduplication with ActiveLearningParameters set to budget = 60, batchSize = 10, and bootstrapSize = 10.
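
For anyone trying to narrow this down, here is a minimal sketch of one way a random split can misbehave at sizes this small. It is not the project's code and does not claim to be the actual cause. It assumes the split is Bernoulli-style (each item is kept independently with some probability, the way Spark's `RDD.randomSplit` samples), in which case the size of each part is only approximate. The function names, the pool size of 200, and the seed are all hypothetical and chosen for illustration only.

```python
import random


def bernoulli_split(items, fraction, rng):
    """Assumed split style: each item goes to the first part
    independently with probability `fraction`, so sizes are approximate."""
    first, second = [], []
    for item in items:
        (first if rng.random() < fraction else second).append(item)
    return first, second


def exact_split(items, size, rng):
    """Alternative: shuffle a copy, then cut, so the first part
    always has exactly `size` items."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[:size], shuffled[size:]


def split_size_range(pool_size, target_size, trials, seed=0):
    """Smallest and largest first-part size seen over repeated
    Bernoulli splits that aim for `target_size` items."""
    rng = random.Random(seed)
    items = list(range(pool_size))
    fraction = target_size / pool_size
    sizes = [
        len(bernoulli_split(items, fraction, rng)[0])
        for _ in range(trials)
    ]
    return min(sizes), max(sizes)


if __name__ == "__main__":
    # Hypothetical pool of 200 candidate pairs, aiming for bootstrapSize = 10.
    low, high = split_size_range(pool_size=200, target_size=10, trials=1000)
    print("Bernoulli split sizes ranged from", low, "to", high)

    exact, rest = exact_split(list(range(200)), 10, random.Random(0))
    print("Exact split sizes:", len(exact), len(rest))
```

If the real split behaves like the first function, a bootstrap set meant to hold 10 points could come back noticeably smaller or larger, which would be worth checking against the parameters above. The shuffle-and-cut variant is one possible way to get exact sizes, under the same assumptions.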