pre-srfi / alist

Association list library
http://htmlpreview.github.io/?https://github.com/pre-srfi/alist/blob/master/srfi-alist.html
0 stars 0 forks source link

alists/xlists #1

Open mnieper opened 4 years ago

mnieper commented 4 years ago

Preliminary remark: I am starting to use the issue tracker to track things that will have to be decided in one way or the other. In case you think another method is more helpful, just speak up (or open another issue 😃).

Independent of the question of whether we consider alists and xlists (alists that store multiple values in a list in the cdr) being different data types or not (I do), at several places, we need two procedures, one for alists and one for xlists, so we need to establish some regular naming scheme. In the following, I continue to use the name xlist but this is just a placeholder for a better name.

An example is given by the ref procedure:

(define alist-ref
  (case-lambda
    ((alist = key) (alist-ref alist = key (lambda () (error "alist-ref: key not found" key)) values)
    ((alist = key failure) (alist-ref alist = key failure values))
    ((alist = key failure success)
     (cond ((assoc key alist =) => (lambda (entry) (success (cdr entry)))) (else (failure))))))

This would not work for xlists where it has to be coded as follows (and is mv-aware):

(define xlist-ref
  (case-lambda
    ((alist = key) (xlist-ref alist = key (lambda () (error "alist-ref: key not found" key)) values)
    ((alist = key failure) (xlist-ref alist = key failure values))
    ((alist = key failure success)
     (cond ((assoc key alist =) => (lambda (entry) (apply success (cdr entry)))) (else (failure))))))

While I would stick to use the alist--prefix for the first version, we have quite some freedom for the second version. We can either use another prefix (like xlist) or we can use some other kind of decoration like a * or a suffix -values or the like.

lassik commented 4 years ago

I am starting to use the issue tracker to track things that will have to be decided in one way or the other.

Works for me.

we need two procedures, one for alists and one for xlists, so we need to establish some regular naming scheme. In the following, I continue to use the name xlist but this is just a placeholder for a better name.

An example is given by the ref procedure:

The only difference is the apply in xlist-ref, which means it returns the cdr list of the association pair as multiple values.

I never had this multiple-value API in mind - that may be why we've had trouble agreeing on the nature of an xlist :)

I've often used alists or hash-tables where the value is a growable list, i.e. each key has a different-length list. Returning multiple values makes sense for the case where the list represents a tuple (i.e. a record with a fixed number of nameless fields).

The tuple case may be useful as well, but Scheme has record and hash-table (and indeed alist :) types for storing records. Do you have particular examples in mind where you have used it?

mnieper commented 4 years ago

When possible, we should include the possiblity of multiple values because it is a very natural thing in Scheme (as opposed to Haskell or C, for example). This doesn't work with alists where the payload is directly in the cdr. But when the payload is wrapped in a list in the cdr, it is the most natural thing to do.

But even with a single value, there is a difference between the two versions. In the first version, we write (success (cdr entry)), in the second version, unnaturally restricted to a single value, it would be (success (cadr entry)).

Both uses occur in practice (dotted pairs vs two- or more element lists), so for a complete API, we have to support both. That's why we should look for consistent naming from the very beginning so that it becomes easy for the user to understand (or to swap one version with the other).

I agree that alists whose payload is a growable list are a third (and important) use case, and we should cover this as well. (Adding procedures that can be portably implemented to a SRFI is cheap.) This part of the API may partly overlap with the "xlist" (we really need a better name!) part of the API.

As for use cases: Yes, I have had plenty of them, for example in my Scheme syntax expander. You don't want to define a record for every kind of aggregation so multiple values come in very handy, especially when you have a pipeline of several procedures, which take more than one value, because when we call procedures, we usually do not aggregate them into records as well. The source code of the Chez Scheme compiler is also full of uses of multiple values.

lassik commented 4 years ago

"xlist" (we really need a better name!)

How about tuple-alist or values-alist?

lassik commented 4 years ago

There is no SRFI about tuples yet (SRFI 168 doesn't really count for this purpose).

lassik commented 4 years ago

But even with a single value, there is a difference between the two versions. In the first version, we write (success (cdr entry)), in the second version, unnaturally restricted to a single value, it would be (success (cadr entry)).

The generalization to multiple values is error-prone when you read the list from a data file and expect the format to supply only one value per key. If the data file has multiple values, the remaining values are silently discarded. We should offer a getter procedure that expects exactly one value. A getter that expects either no values or one value is also useful. These are basically the regexp repeaters.

lassik commented 4 years ago

I've found all of these cases useful in practice:

lassik commented 4 years ago

Calls like this are useful:

where thing is read from a file and is something like:

(thing
  (name "foo")
  (drink "tea")
  (drink "coffee"))

The API names and arguments are all subject to debate.

lassik commented 4 years ago

As for use cases: Yes, I have had plenty of them, for example in my Scheme syntax expander. You don't want to define a record for every kind of aggregation so multiple values come in very handy, especially when you have a pipeline of several procedures, which take more than one value, because when we call procedures, we usually do not aggregate them into records as well. The source code of the Chez Scheme compiler is also full of uses of multiple values.

These are very interesting!

mnieper commented 4 years ago

values-alist would be much better than tuple-alist because it is in line with the naming used in the report.

mnieper commented 4 years ago

But even with a single value, there is a difference between the two versions. In the first version, we write (success (cdr entry)), in the second version, unnaturally restricted to a single value, it would be (success (cadr entry)).

The generalization to multiple values is error-prone when you read the list from a data file and expect the format to supply only one value per key. If the data file has multiple values, the remaining values are silently discarded. We should offer a getter procedure that expects exactly one value. A getter that expects either no values or one value is also useful. These are basically the regexp repeaters.

I think I haven't understood. It is exactly the multiple values version that is not error-prone. Recall that the multiple values version is (apply success (cdr entry)). As long as you don't use a Scheme system that is silent on errors, (cdr entry) will have to contain as many objects as success has formal arguments.

So apply is Just the Right Thing!

And you could use it as follows, for example: (values-alist-ref alist = key failure list)

mnieper commented 4 years ago

I've found all of these cases useful in practice:

* Exactly one value, and it is optional

* Exactly one value, and it is required

* Zero or more values

* One or more values

Multiple values have to be consumed by a continuation (for example the success procedure or whatever other continuation is in place). And continuations (being procedures) can handle exactly the use cases through case-lambda or the more basic dotted argument list syntax.

It's nice how well things fit together in Scheme.

lassik commented 4 years ago

The error messages can be better:

> (get-one 'name (cdr thing) string?)
*** Error: expecting one name in ((drink "tea) ...) but none was found.

> (get-one 'name (cdr thing) string?)
*** Error: expecting one name in ((drink "tea) ...) but more than one was found.

This may be a different use case than the one you have in mind. If you wanted to get exactly one string value from a values-alist, how would you express that using values-alist-ref? The apply-based API is probably a good foundation, but some convenience procedures are in order.

lassik commented 4 years ago

Something like this:

(define (get-one key alist valid?)
  (values-alist-ref
   alist
   equal?
   key
   (lambda () (error "Not found:" key))
   (lambda (value)
     (unless (valid? value)
       (error "Not valid:" key value))
     value)))

Though in this case we'd be better off using RnRS assoc to get a better error message for the case where there is more than one value for the key.

lassik commented 4 years ago

(n-value-alist-ref n alist = key failure list) 😄

This could give a good error message for the case where the association doesn't have exactly n values. But maybe the API is getting out of hand with all the special cases.

mnieper commented 4 years ago

Something like this:

(define (get-one key alist valid?)
  (values-alist-ref
   alist
   equal?
   key
   (lambda () (error "Not found:" key))
   (lambda (value)
     (unless (valid? value)
       (error "Not valid:" key value))
     value)))

Though in this case we'd be better off using RnRS assoc to get a better error message for the case where there is more than one value for the key.

If you want to customize the error message, use:

(define (get-one key alist valid?)
  (values-alist-ref
   alist
   equal?
   key
   (lambda () (error "Not found:" key))
   (case-lambda
     ((value)
       (unless (valid? value)
         (error "Not valid:" key value))
       value)
     (_ (error "Invalid entry for:" key)))))
mnieper commented 4 years ago

(n-value-alist-ref n alist = key failure list) 😄

This could give a good error message for the case where the association doesn't have exactly n values. But maybe the API is getting out of hand with all the special cases.

What is this procedure supposed to do exactly?

lassik commented 4 years ago

If you want to customize the error message, use [case-lambda]

True - that would solve it.

(n-value-alist-ref n alist = key failure list)

What is this procedure supposed to do exactly?

If there are exactly n values associated with key, call the success procedure with those values as arguments. Else raise an exception with a clear error message.

mnieper commented 4 years ago

Am Do., 11. Juni 2020 um 15:28 Uhr schrieb Lassi Kortela notifications@github.com:

(n-value-alist-ref n alist = key failure list)

What is this procedure supposed to do exactly?

If there are exactly n values associated with key, call the success procedure with those values as arguments. Else raise an exception with a clear error message.

I see. And I agree that it would be overkill to add such specialized procedures to the API as well. If you really need this, you should supply a list procedure that takes exactly n arguments.

mnieper commented 4 years ago

Am Do., 11. Juni 2020 um 15:31 Uhr schrieb Marc Nieper-Wißkirchen marc.nieper@gmail.com:

I see. And I agree that it would be overkill to add such specialized procedures to the API as well. If you really need this, you should supply a list procedure that takes exactly n arguments.

That's the nice thing about the failure/success protocol: It makes the whole thing nicely composable.

lassik commented 4 years ago

That's the nice thing about the failure/success protocol: It makes the whole thing nicely composable.

That's true. However, getting one value from a list is a common thing to do (basically this happens whenever one stores a set of "properties" of some sort in an S-expression). The n>1 cases may not be as common.

mnieper commented 4 years ago

That's true. However, getting one value from a list is a common thing to do (basically this happens whenever one stores a set of "properties" of some sort in an S-expression). The n>1 cases may not be as common.

If you drop the success argument, you will get an error as long as the continuation of values-alist-ref expects only one argument and is not variadic (which is the most common case). If it doesn't and you want to be pretty sure about the number of values, you can always wrap it in

(define (identity obj) obj)

or use some higher-order procedure like

(define (check pred?)
  (lambda arg* (unless (apply pred? arg*) (error "check: assertion failed" arg*)) (apply values arg*)))

One would use it like: (values-alist-ref alist = key failure (check string?)).

But such procedures belong to some other SRFI, maybe a SRFI about functional utility procedures.