koka-lang / koka

Koka language compiler and interpreter
http://koka-lang.org
Other
3.16k stars 151 forks source link

Vector to string conversion adds an extra character #486

Closed Caid11 closed 1 month ago

Caid11 commented 2 months ago

I think there's an off-by-one error in kk_string_from_chars. When I convert a vector of chars to a string, a space is added to the end of the resulting string.

For example, this snippet:

fun main()
  val some-string = "hello"
  println("some-string = [" ++ some-string ++ "]")
  println("some-string.vector.string = [" ++ some-string.vector.string ++ "]")

Produces this output in Koka v3.1.1:

some-string = [hello]
some-string.vector.string = [hello ]

Looking at kk_string_from_chars, I see Koka adds 1 to the length of the vector's elements:

kk_string_t kk_string_from_chars(kk_vector_t v, kk_context_t* ctx) {
  kk_ssize_t n;
  kk_box_t* cs = kk_vector_buf_borrow(v, &n, ctx);
  kk_ssize_t len = 0;
  for (kk_ssize_t i = 0; i < n; i++) {
    len += kk_utf8_len(kk_char_unbox(cs[i], KK_BORROWED, ctx));
  }
  uint8_t* p;
  kk_string_t s = kk_unsafe_string_alloc_buf(len + 1, &p, ctx); // <= NOTE THE ADDED 1
  for (kk_ssize_t i = 0; i < n; i++) {
    kk_ssize_t count;
    kk_utf8_write(kk_char_unbox(cs[i], KK_BORROWED, ctx), p, &count);
    p += count;
  }
  kk_assert_internal(kk_string_buf_borrow(s, NULL, ctx) + n == p);
  kk_vector_drop(v, ctx);
  return s;
}

I suspect the intent is to add a terminating character, but perhaps Koka's string implementation adds that implicitly? If I remove the added 1, it seems to produce the correct output.

Here's a complete reproducer:

vec-str-bug-repro.kk:

extern import
  c file "inline.c"

extern custom-string(v : vector<char>) : string
  c  "kk_custom_string_from_chars"

fun main()
  val some-string = "hello"
  println("some-string = [" ++ some-string ++ "]")
  println("some-string.vector.string = [" ++ some-string.vector.string ++ "]")
  println("some-string.vector.custom-string = [" ++ some-string.vector.custom-string ++ "]")

inline.c:

kk_string_t kk_custom_string_from_chars(kk_vector_t v, kk_context_t* ctx) {
  kk_ssize_t n;
  kk_box_t* cs = kk_vector_buf_borrow(v, &n, ctx);
  kk_ssize_t len = 0;
  for (kk_ssize_t i = 0; i < n; i++) {
    len += kk_utf8_len(kk_char_unbox(cs[i], KK_BORROWED, ctx));
  }
  uint8_t* p;
  // kk_string_t s = kk_unsafe_string_alloc_buf(len + 1, &p, ctx);
  kk_string_t s = kk_unsafe_string_alloc_buf(len, &p, ctx);
  for (kk_ssize_t i = 0; i < n; i++) {
    kk_ssize_t count;
    kk_utf8_write(kk_char_unbox(cs[i], KK_BORROWED, ctx), p, &count);
    p += count;
  }
  kk_assert_internal(kk_string_buf_borrow(s, NULL, ctx) + n == p);
  kk_vector_drop(v, ctx);
  return s;
}

Resulting output:

some-string = [hello]
some-string.vector.string = [hello ]
some-string.vector.custom-string = [hello]
TimWhiting commented 2 months ago

Yes, it looks like bytes.c already handles the len+1 and terminating byte for all internal variants of strings. https://github.com/koka-lang/koka/blob/b1c558e91afbb39a2388f207df17c384a279cbc1/kklib/src/bytes.c#L18

I'll commit a quick fix on dev.