intuit / fuzzy-matcher

A Java library to determine probability of objects being similar.
Apache License 2.0

Exception thrown for dates prior to the epoch #28

Closed (davideaves closed this issue 4 years ago)

davideaves commented 4 years ago

Hi, I'm encountering an exception when documents have a date element that is before or around the epoch (1970-01-01). This seems to be caused by the TokenRange constructor, which converts dates to a numeric value based on Date.getTime() but does not expect negative values; a negative value can result in a range with lower > upper. I can dig further if helpful.

Exception seen:

Exception in thread "main" java.lang.IllegalArgumentException: fromKey > toKey
    at java.util.TreeMap$NavigableSubMap.<init>(TreeMap.java:1368)
    at java.util.TreeMap$AscendingSubMap.<init>(TreeMap.java:1855)
    at java.util.TreeMap.subMap(TreeMap.java:913)
    at java.util.TreeSet.subSet(TreeSet.java:325)
    at com.intuit.fuzzymatcher.component.TokenRepo$Repo.get(TokenRepo.java:80)
    at com.intuit.fuzzymatcher.component.TokenRepo.get(TokenRepo.java:36)
    at com.intuit.fuzzymatcher.component.ElementMatch.elementThresholdMatching(ElementMatch.java:35)
    at com.intuit.fuzzymatcher.component.ElementMatch.lambda$matchElement$1(ElementMatch.java:26)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at com.intuit.fuzzymatcher.component.ElementMatch.matchElement(ElementMatch.java:25)
    at com.intuit.fuzzymatcher.component.DocumentMatch.lambda$null$0(DocumentMatch.java:35)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
    at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1556)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at com.intuit.fuzzymatcher.component.DocumentMatch.lambda$matchDocuments$1(DocumentMatch.java:36)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:269)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313)
    at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:743)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at com.intuit.fuzzymatcher.component.MatchService.applyMatchByDocId(MatchService.java:115)
    at com.intuit.fuzzymatcher.component.MatchService.applyMatchByDocId(MatchService.java:81)
    ...
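
The suspected failure mode can be reproduced in isolation. The sketch below is not the library's TokenRange code: the class NegativeRangeDemo, the method naiveRange, and the 10% tolerance are assumptions made only for illustration. It shows that scaling a negative epoch value by (1 - tolerance) and (1 + tolerance) swaps the bounds, and that TreeSet.subSet then rejects the inverted range with the same IllegalArgumentException.

```java
import java.util.Date;
import java.util.TreeSet;

// Hypothetical stand-in for percentage-based range logic; NOT the actual TokenRange code.
class NegativeRangeDemo {

    // Assumed bounds: [value * (1 - tolerance), value * (1 + tolerance)].
    // Correct for positive values, inverted for negative ones.
    static double[] naiveRange(long value, double tolerance) {
        double lower = value * (1.0 - tolerance);
        double upper = value * (1.0 + tolerance);
        return new double[] {lower, upper};
    }

    // Returns true if querying a TreeSet with the naive range throws.
    static boolean subSetThrows(long value, double tolerance) {
        TreeSet<Double> tokens = new TreeSet<>();
        tokens.add((double) value);
        double[] range = naiveRange(value, tolerance);
        try {
            tokens.subSet(range[0], true, range[1], true);
            return false;
        } catch (IllegalArgumentException e) {
            // TreeMap reports "fromKey > toKey" when the lower bound exceeds the upper bound.
            return true;
        }
    }

    public static void main(String[] args) {
        long afterEpoch = new Date(86_400_000L).getTime();   // one day after the epoch
        long beforeEpoch = new Date(-86_400_000L).getTime(); // one day before the epoch

        System.out.println(subSetThrows(afterEpoch, 0.1));  // prints false
        System.out.println(subSetThrows(beforeEpoch, 0.1)); // prints true
    }
}
```

For the pre-epoch date, getTime() returns -86400000, so the "lower" bound is about -77760000 and the "upper" bound is about -95040000, which is the inverted order that subSet refuses.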

Thanks, Dave

manishobhatia commented 4 years ago

Dave, thanks for reporting the issue. I will take a look at it and suggest a fix by the next build.

manishobhatia commented 4 years ago

Created a PR to fix this issue: https://github.com/intuit/fuzzy-matcher/pull/29
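
The PR's actual diff is not quoted in this thread, so the following is only a sketch of one way a numeric range could be made sign-safe, not a description of what the PR does. The class SignSafeRange and its method of are hypothetical names; the idea is to compute the tolerance from the absolute value so that lower <= upper holds for negative timestamps too.

```java
import java.util.TreeSet;

// Hypothetical helper, not part of fuzzy-matcher: builds a range that stays ordered for any sign.
class SignSafeRange {

    // Returns {lower, upper} with lower <= upper, using a delta based on |value|.
    static double[] of(long value, double tolerance) {
        // Convert to double before abs() so Long.MIN_VALUE cannot overflow.
        double delta = Math.abs((double) value) * tolerance;
        return new double[] {value - delta, value + delta};
    }

    public static void main(String[] args) {
        long beforeEpoch = -86_400_000L; // one day before 1970-01-01, as Date.getTime() would report
        double[] range = of(beforeEpoch, 0.1);

        TreeSet<Double> tokens = new TreeSet<>();
        tokens.add((double) beforeEpoch);

        // With ordered bounds, subSet no longer throws "fromKey > toKey".
        System.out.println(tokens.subSet(range[0], true, range[1], true).size()); // prints 1
    }
}
```

An alternative with the same effect would be to compute both bounds as before and then take Math.min and Math.max of the pair.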

manishobhatia commented 4 years ago

Released version 1.0.3 with this fix.

davideaves commented 4 years ago

Thanks very much for the prompt response! I'll grab the new version and give it a test.