smlnj / legacy

This project is the old version of Standard ML of New Jersey that continues to support older systems (e.g., 32-bit machines).
BSD 3-Clause "New" or "Revised" License

Connecting `TextIO` and `BinIO` to `CharBuffer` and `Word8Buffer` #312

Open Skyb0rg007 opened 2 months ago

Skyb0rg007 commented 2 months ago

Description

The SML Basis library provides the IMPERATIVE_IO and STREAM_IO signatures as the basic interfaces for reading and writing streamed data. However, there is a misconception that the outstream types can only represent IO-backed output, and that producing a value such as a string through them is impossible.

For example, the SML/NJ library PP provides the SimpleTextIODev structure, which implements a device for TextIO.outstream, but additionally provides the CharBufferDev structure for CharBuffers. Due to how the library works, it seems that a user who would like to support writing both to files and to strings must functorize their code, even though inspection of these two structures shows that there is no difference in their implementations (other than the types). This was "addressed" with the addition of the TextPP structure, which explicitly provides an abstraction over both CharBuffer and TextIO, although TextIO is already an abstraction over CharBuffer!

The TextIO/BinIO interfaces are extremely generic, and out of the box they can support output to a CharBuffer, avoiding the need for this extra work. I believe the only reason the extra abstraction exists is that there are (currently) no helpers for obtaining a TextIO.outstream from a CharBuffer.buf, nor any helper functions for creating interesting TextIO.outstreams in the standard library (TextIO.openString is the only similar function of this kind).
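To make the genericity claim concrete, here is a minimal sketch that builds an in-memory TextIO.outstream using only structures already in the Basis library. It does not use CharBuffer at all: it assumes a plain list of string chunks as the backing store, and the name mkStringOutstream, the "<string>" writer name, and the chunk size are illustrative assumptions rather than part of any existing API.

```sml
(* Hypothetical helper (not part of the Basis or SML/NJ libraries):
   returns an outstream plus a function yielding everything written so far. *)
fun mkStringOutstream () : TextIO.outstream * (unit -> string) =
  let
    val chunks : string list ref = ref []   (* most recent chunk first *)
    val closed = ref false
    fun assertOpen f =
      if !closed
        then raise IO.Io {name = "<string>", function = f, cause = IO.ClosedStream}
        else ()
    (* Each write copies the slice into a fresh string and records it. *)
    fun writeVec (sl : CharVectorSlice.slice) =
      (assertOpen "writeVec";
       chunks := CharVectorSlice.vector sl :: !chunks;
       CharVectorSlice.length sl)
    fun writeArr (sl : CharArraySlice.slice) =
      (assertOpen "writeArr";
       chunks := CharArraySlice.vector sl :: !chunks;
       CharArraySlice.length sl)
    val writer = TextPrimIO.WR {
          name = "<string>",
          chunkSize = 4096,                 (* arbitrary choice *)
          writeVec = SOME writeVec,
          writeArr = SOME writeArr,
          (* In-memory writes never block, so the NB variants always succeed. *)
          writeVecNB = SOME (fn sl => SOME (writeVec sl)),
          writeArrNB = SOME (fn sl => SOME (writeArr sl)),
          block = SOME (fn () => ()),
          canOutput = SOME (fn () => true),
          getPos = NONE, setPos = NONE, endPos = NONE, verifyPos = NONE,
          close = fn () => closed := true,
          ioDesc = NONE                     (* no OS descriptor behind this writer *)
        }
    val strm = TextIO.mkOutstream (TextIO.StreamIO.mkOutstream (writer, IO.NO_BUF))
  in
    (strm, fn () => String.concat (List.rev (!chunks)))
  end
```

Any function that takes a TextIO.outstream can be pointed at the returned stream unchanged. If a buffering mode other than IO.NO_BUF is used, call TextIO.flushOut on the stream before reading the accumulated string.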

Proposal

Making TextIO/BinIO the premier output interfaces

The MONO_BUFFER signature should have the following additions:

signature MONO_BUFFER =
sig
  (* include OLD_MONO_BUFFER *)

  type writer

  val getWriter : buf -> writer
end

structure CharBuffer : MONO_BUFFER
                      where type elem = Char.char
                        and type vector = CharVector.vector
                        and type slice = CharVectorSlice.slice
                        and type array = CharArray.array
                        and type array_slice = CharArraySlice.slice
                        and type writer = TextPrimIO.writer

structure Word8Buffer : MONO_BUFFER
                      where type elem = Word8.word
                        and type vector = Word8Vector.vector
                        and type slice = Word8VectorSlice.slice
                        and type array = Word8Array.array
                        and type array_slice = Word8ArraySlice.slice
                        and type writer = BinPrimIO.writer

Sample Implementation

fun getWriter buf =
  let
    val closed = ref false
    (* Raise IO.Io if the writer has already been closed *)
    fun assertOpen f =
      if !closed
        then raise IO.Io {cause = IO.ClosedStream, function = f, name = "<CharBuffer.buf>"}
        else ()
    fun writeArr a = (assertOpen "writeArr"; CharBuffer.addArrSlice (buf, a); CharArraySlice.length a)
    fun writeVec v = (* ... *)
    (* Appending to a buffer never blocks, so the non-blocking variants always succeed *)
    fun writeArrNB a = SOME (writeArr a)
    fun writeVecNB v = SOME (writeVec v)
  in
    TextPrimIO.WR {
      name = "<CharBuffer.buf>",
      block = SOME (fn _ => ()),
      canOutput = SOME (fn _ => true),
      chunkSize = 4096, (* Not sure the best constant here *)
      close = fn _ => closed := true,
      (* Positions don't make sense here *)
      endPos = NONE, getPos = NONE, setPos = NONE, verifyPos = NONE,
      ioDesc = NONE, (* No underlying OS descriptor *)
      writeArr = SOME writeArr, writeVec = SOME writeVec, writeArrNB = SOME writeArrNB, writeVecNB = SOME writeVecNB
    }
  end

Sample Usage

val myPrintFunction : my_data * TextIO.outstream -> unit = (* ... *)

fun toString (x : my_data) =
  let
    val buf = CharBuffer.new 0
    val stream = TextIO.mkOutstream (TextIO.StreamIO.mkOutstream (CharBuffer.getWriter buf, IO.NO_BUF))
  in
    myPrintFunction (x, stream);
    CharBuffer.contents buf
  end

Possible extra additions

Making string outputs as simple as possible

As mentioned, abstracting over how one outputs data is a very useful idea. While one could always write a toString : t -> string function, this requires the entire representation to live in memory at once and doesn't begin writing the data until the entirety of the data has been processed. The following functions could also prove very useful in everyday tasks (and are provided in other languages, e.g., R7RS-small).

(* Create an output stream connected to an in-memory vector.
    Calling the function yields the currently written data at that time. *)
val openOutString : unit -> outstream * (unit -> vector)

(* Example implementation *)
fun openOutString () =
  let
    val buf = CharBuffer.new 0
  in
    (TextIO.mkOutstream (TextIO.StreamIO.mkOutstream (CharBuffer.getWriter buf, IO.NO_BUF)),
     fn () => CharBuffer.contents buf)
  end
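To illustrate the streaming point made earlier in this section, here is a small sketch of a printer written against TextIO.outstream and directed at a file. The names writeInts, writeIntsToFile, and readAll are hypothetical and exist only for this example; the file is read back solely to check the result.

```sml
(* Hypothetical printer: emits one integer per line to any outstream. *)
fun writeInts (out : TextIO.outstream, xs : int list) : unit =
  List.app (fn x => TextIO.output (out, Int.toString x ^ "\n")) xs

(* Stream the data to a file. Each element is handed to the stream as it is
   visited, so the full text never has to exist as a single string. *)
fun writeIntsToFile (path : string, xs : int list) : unit =
  let
    val out = TextIO.openOut path
  in
    (writeInts (out, xs); TextIO.closeOut out)
      handle e => (TextIO.closeOut out; raise e)
  end

(* Read a whole file back; used here only to inspect what was written. *)
fun readAll (path : string) : string =
  let
    val ins = TextIO.openIn path
    val s = TextIO.inputAll ins
  in
    TextIO.closeIn ins; s
  end
```

Because writeInts only depends on TextIO.outstream, the same function would serve an in-memory stream such as the one returned by the proposed openOutString, with no functorization.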