vigna / Sux4J

Sux4J is an effort to bring succinct data structures to Java.
GNU Lesser General Public License v2.1
154 stars 22 forks source link

ZFastTrie.pred seems broken #10

Closed jto closed 3 years ago

jto commented 3 years ago

Here are a couple of examples using ZFastTrie.pred and ZFastTrie.succ on Strings and Long. In both cases succ seems to have the expected behavior but pred results are unexpected.

Am I doing something wrong ? I'm using version 5.2.3

Example is a Scala worksheet but I believe the code is easy enough to understand. The comment below each line is the returned value.

val ints = 1.to(50, 5).map(_.toLong: java.lang.Long).toIterable
// ints: Iterable[java.lang.Long] = Vector(1L, 6L, 11L, 16L, 21L, 26L, 31L, 36L, 41L, 46L)

val predsInt = new ZFastTrie(
  ints.asJava,
  bits.TransformationStrategies.fixedLong()
)
// predsInt: ZFastTrie[java.lang.Long] = {1, 6, 11, 16, 21, 26, 31, 36, 41, 46}

predsInt.pred(21L)
// res7: java.lang.Long = 21L
predsInt.pred(22L)
// res8: java.lang.Long = 16L
predsInt.pred(23L)
// res9: java.lang.Long = 16L
predsInt.pred(24L)
// res10: java.lang.Long = 26L

predsInt.succ(0L)
// res11: java.lang.Long = 1L
predsInt.succ(3L)
// res12: java.lang.Long = 6L
predsInt.succ(22L)
// res13: java.lang.Long = 26L
predsInt.succ(23L)
// res14: java.lang.Long = 26L
predsInt.succ(24L)
// res15: java.lang.Long = 26L
predsInt.succ(99L)
// res16: java.lang.Long = null

val nums = List("a", "e", "i", "o", "u")
// nums: List[String] = List("a", "e", "i", "o", "u")
val predsNums = new ZFastTrie(nums.asJava, bits.TransformationStrategies.prefixFreeIso[String])
// predsNums: ZFastTrie[String] = {a, e, i, o, u}

predsNums.pred("b")
// res17: String = null
predsNums.pred("bb")
// res18: String = null
predsNums.pred("d")
// res19: String = "e"
predsNums.pred("e")
// res20: String = "e"
predsNums.pred("f")
// res21: String = "a"
predsNums.pred("j")
// res22: String = "e"
predsNums.pred("aa")
// res23: String = null

predsNums.succ("b")
// res24: String = "e"
predsNums.succ("bb")
// res25: String = "e"
predsNums.succ("d")
// res26: String = "e"
predsNums.succ("e")
// res27: String = "e"
predsNums.succ("f")
// res28: String = "i"
predsNums.succ("j")
// res29: String = "o"
predsNums.succ("aa")
// res30: String = "e"

List("bb", "b", "a", "aa", "d", "e").sorted
// res31: List[String] = List("a", "aa", "b", "bb", "d", "e")
vigna commented 3 years ago

Er... I must admit my memory is a bit confused about what happened, but I believe those were stub methods that had to be tested and completed; indeed, the unit test never exercise them, and they are entirely undocumented.

Somehow I forgot about them: I'll have a look and get back to you.

jto commented 3 years ago

Hi @vigna, Thank you for your quick reply (and for this library ofc)!

That explains it then :) In case anyone runs into the same issue, iterator(key).previous() can be used as a workaround to achieved the same result. I did not look into the performance though.

vigna commented 3 years ago

OK, in the end it was just a sign (pred() was using the same sign test of succ(), while it should have been reversed).

I have implemented the correct behavior and added tests. Now there is also full documentation. I modified the method names (it's better anyway to have a compile error if anybody was using them): you can use the theory-style successor(), strictSuccessor(), etc., methods or the NavigableSet-style ceiling(), floor(), etc. methods (but NavigableSet is not yet implemented).

Please let me know if this works for you!

jto commented 3 years ago

Sounds great thank you 👍 Closing the issue :)

vigna commented 3 years ago

BTW, I just realized ZFastTrie won't work with negative integers. That's just because the fixedLong() strategy is missing a flip of the first bit to be truly lexicographical. I'll fix this ASAP in the DSI Utilities.

vigna commented 3 years ago

A final note: we have C implementations for strings and 64-bit integers in case you're interested. They're much faster as bit rolling is not one of Java's strengths...

jto commented 3 years ago

Hey @vigna. Thanks for the heads up! For now I'm just fiddling with Sux4j and the performance have been good enough. The C implementation might come handy if I need to scale things up a little :)