schemedoc / cookbook

New Scheme Cookbook
https://cookbook.scheme.org
29 stars 3 forks source link

Accidentally quadratic #68

Open mnieper opened 1 year ago

mnieper commented 1 year ago

The proposed solution for string-split uses string-ref and substring. Both are allowed to be O(n) where n is the length of the string. Thus the solution can be accidentally quadratic on some Schemes. It should not be promoted but rewritten.

https://github.com/schemedoc/cookbook/blob/76456bca59da1f25fefb21d8bb11a510d7da4138/recipes/split-string.md?plain=1#L16

jcubic commented 1 year ago

You can add a warning to the file about the performance on some schemes.

lassik commented 1 year ago

Note that the cookbook permits (and maybe even encourages) multiple solutions with different tradeoffs for the same problem.

For many (perhaps most) practical uses cases, the best approach is the one with the simplest code.

For string split, the only portable alternatives I can think of are:

Both involve messier code, and the performance on small strings on most implementations may be worse.

mnieper commented 1 year ago

I guess that the Cookbook is aimed chiefly at beginners or casual users of Scheme. People will copy-and-paste code and not read the fine print (especially when it is a so-called AI copying the code). This is why I opened this issue and why I think it is very important to think twice about the code offered. Scheme has only a few places like this Cookbook, so I think it is important for the Cookbook to get it right.

The best approach is, in fact, string ports, both for scanning and for constructing the strings. The code would not be messier in any way but very straightforward.

If you are worrying about poor performance for small strings then you can either ignore it because the algorithm is fast for small strings in any case or you can add a small dispatch at the top of the procedure so that really small strings are handled differently.

"Simple" code that would not be accidentally quadratic would convert the string to a list and then use list utilities.

lassik commented 1 year ago

Well, can you write down either or both of those solutions? If the present solution has no advantages compared to them, we can remove it.

mnieper commented 1 year ago

Something like this:

(library (string-split)
  (export string-split)
  (import (rnrs))

  (define string-split
    (lambda (delim? s)
      (define who 'string-split)
      (unless (procedure? delim?)
        (assertion-violation who "invalid delim? argument" delim?))
      (unless (string? s)
        (assertion-violation who "invalid string argument" s))
      (let ([p (open-string-input-port s)])
        (define peek-char (lambda () (lookahead-char p)))
        (define read-char (lambda () (get-char p)))
        (define skip-delimiter!
          (lambda ()
            (let ([c (peek-char)])
              (when (and (not (eof-object? c))
                         (delim? c))
                (read-char)
                (skip-delimiter!)))))
        (let f ()
          (skip-delimiter!)
          (let ([c (read-char)])
            (if (eof-object? c)
                '()
                (let-values ([(q extract) (open-string-output-port)])
                  (put-char q c)
                  (let g ()
                    (let ([c (read-char)])
                      (cond
                       [(eof-object? c)
                        (list (extract))]
                       [(delim? c)
                        (cons (extract) (f))]
                       [else
                        (put-char q c)
                        (g)])))))))))))

This is R6RS code, but it can be easily translated to R7RS. Note that the original solution using string-ref is less a problem for R6RS because R6RS should implement string-ref in O(1).