smlnj / legacy

This project is the old version of Standard ML of New Jersey that continues to support older systems (e.g., 32-bit machines).
BSD 3-Clause "New" or "Revised" License
25 stars 10 forks source link

`HashSetFn`'s calculation of `maxSize` can cause `Overflow` during functor instantiation #279

Closed Skyb0rg007 closed 1 year ago

Skyb0rg007 commented 1 year ago

Version

110.99.3 (Latest)

Operating System

OS Version

No response

Processor

System Component

SML/NJ Library

Severity

Major

Description

The HashSetFn functor (hash-set-fn.sml) calculates maxSize as the largest power of 2 that is still less than Array.maxLen. However, the calculation does not properly handle systems in which Int.maxInt = SOME Array.maxLen, causing an exception to be raised during functor instantiation.

Note: This bug does not occur on the SML/NJ compiler, since Array.maxLen is much smaller than Int.maxInt. The issue is reproducible for MLton (latest version: 20230523-gd082c4a36), since there SOME Array.maxLen = Int.maxInt.

Transcript

(* test.sml *)
structure X = HashSetFn(
   struct
      type hash_key = int
      val hashVal = Word.fromInt
      val sameKey: int * int -> bool = op =
   end)
val () = print "Okay!\n"

(* test.mlb *)
$(SML_LIB)/basis/basis.mlb
$(SML_LIB)/smlnj-lib/Util/smlnj-lib.mlb
test.sml

$ mlton -output a.out test.mlb
$ ./a.out
unhandled exception: Overflow

$ # The `-default-type intinf` flag causes Int.int to have infinite precision, but keeps Array.maxLen the same
$ mlton -output a.out -default-type intinf test.mlb
$ ./a.out
Okay!

Expected Behavior

Instantiating the HashSetFn should never raise Overflow.

Steps to Reproduce

Just run the code from the transcript.

Additional Information

This can be fixed by modifying the code at line 37 in hash-set-fn.sml:

(* old *)   
val maxSize = let
     fun f i = let
             val i' = i+i
             in
               if i' < Array.maxLen then f i' else i
             end
     in
       f 0x10000
     end

(* new*)   
val maxSize = let
     fun f i = let
             val i' = i+i
             in
               if i' < Array.maxLen then f i' else i
             end
             handle Overflow => i
     in
       f 0x10000
     end

Email address

ssoss AT uchicago DOT edu

MatthewFluet commented 1 year ago

MLton patches the SML/NJ Library that it ships to avoid a similar problem in structure HashTableRep (see https://github.com/MLton/mlton/blob/b1f1f0f0916d28c0d183fba85549d5bf96f1fa41/lib/smlnj-lib/smlnj-lib.patch#L4498-L4510).

Skyb0rg007 commented 1 year ago

I'll resubmit this report there. However I do still believe this should be addressed in the SML/NJ repo, if for nothing but future-proofing against a possible Array.maxLen increase or Int.maxInt decrease.

JohnReppy commented 1 year ago

I've pushed a fix that computes floor(log2(Array.maxSize)) to use as the maxSize limit for hash tables.

MatthewFluet commented 1 year ago

This computation of floor(log2(Array.maxSize)) is not robust when structure Word = Word32 and structure Int = IntInf; in general, if the default word size is smaller than the default integer size.

$ cat z.sml 
val maxSize = let
      fun lp (0w0, k) = Word.toIntX(Word.<<(0w1, k-0w1))
        | lp (w, k) = lp (Word.>>(w, 0w1), k+0w1)
      in
        lp (Word.fromInt Array.maxLen, 0w0)
      end

val _ = print (concat ["maxSize: ", Int.toString maxSize, "\n"])
$ mlton -default-type intinf z.sml 
$ ./z 
maxSize: ~2147483648
JohnReppy commented 1 year ago

Rewrote the code to just use the Int module, so it should be portable now.

MatthewFluet commented 1 year ago

The max-hash-table-size.sml file was not added in the commit.

JohnReppy commented 1 year ago

I missed it in the initial commit, but I added it about 4 hours ago.