massung / tabular-asa

A column-oriented, dataframe implementation for Racket.
MIT License

Improved docs for the join functions? #5

Closed jbclements closed 4 months ago

jbclements commented 4 months ago

I'm using the table-join/inner function, and it's doing more or less what I expected, but I'm very unclear why there is both an "on" and a "with" argument to these functions.

....

Okay, I think I figured it out, but it was very unclear; the docs say nothing about the distinction between the on and with arguments. Here's how I think I would rewrite these docs:

;; Performs an inner join. The on argument specifies the columns
;; in the first table on which to join. These same names are used
;; for the join in the second table, unless alternate names are
;; provided using a #:with clause. If the resulting rows would have
;; multiple columns with the same name (this can occur if, for instance,
;; both tables have a column named "ID" which is not joined on), the
;; column from the second table is renamed.

Also, I was surprised not to get an error when the on and with lists are of different lengths. Instead, I just got an empty table. Perhaps this is a standard database-ism?

massung commented 4 months ago

Yes, you got it. Perhaps an example in the documentation would make it more obvious?

; both tables have a column named 'id
(table-join/inner a b #:on '(id))

; joining on the 'id column from table a and the 'movie_id column from table b
(table-join/inner a b #:on '(id) #:with '(movie_id))

It should probably be an error if the #:on and #:with arguments are not of equal length. But the reason it doesn't fail (and instead returns an empty table) is that, under the hood, it's just cutting columns from the tables and comparing the resulting rows. For example:

(table-join/inner a b #:on '(name) #:with '(name age))

The above will result in per-row comparisons like (equal? '("Jim") '("Jim" 45)), which will obviously always fail.
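To make the comparison above concrete, here is a minimal sketch in plain Racket. It does not call tabular-asa at all: the hash-based rows and the cut-columns helper are assumptions made purely for illustration, standing in for whatever the library does internally.

```racket
#lang racket

;; Hypothetical stand-in for a table row: an immutable hash from
;; column name to value. Not tabular-asa's actual representation.
(define row-a (hash 'name "Jim" 'city "Austin"))
(define row-b (hash 'name "Jim" 'age 45))

;; Hypothetical helper: keep only the requested columns, in order,
;; as a plain list of values.
(define (cut-columns row columns)
  (for/list ([c (in-list columns)])
    (hash-ref row c)))

;; Key lists of equal length: the cut rows can compare equal.
(define matched?
  (equal? (cut-columns row-a '(name))
          (cut-columns row-b '(name))))

;; Key lists of unequal length: a one-element list is never equal?
;; to a two-element list, so no pair of rows can ever match.
(define mismatched?
  (equal? (cut-columns row-a '(name))
          (cut-columns row-b '(name age))))
```

Under these assumptions matched? is true and mismatched? is false for every possible pair of rows, which is why the join comes back empty rather than signaling an error.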

Good catch. It may be worth putting in an assertion; I'll have to think about it. I generally like to look at situations like this as though I were making a front-end tool for users: do I want to have to try/catch around poor user input, or should it just return nothing, which is correct but might take someone a bit of time to figure out and fix?
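For reference, the assertion option could look roughly like the sketch below. The check-join-keys helper is hypothetical and not part of tabular-asa; it only shows one way to reject key lists of unequal length up front, using Racket's standard raise-arguments-error.

```racket
#lang racket

;; Hypothetical helper, not part of tabular-asa: signal a contract
;; error when the #:on and #:with column lists differ in length.
;; `who` is the symbol reported as the source of the error.
(define (check-join-keys who on with)
  (unless (= (length on) (length with))
    (raise-arguments-error
     who
     "#:on and #:with must name the same number of columns"
     "on" on
     "with" with)))

;; Equal lengths pass silently.
(check-join-keys 'table-join/inner '(id) '(movie_id))
```

The trade-off is the one described above: a caller passing mismatched lists gets an immediate exn:fail:contract instead of a silently empty table, but then has to handle that exception.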

jbclements commented 4 months ago

I agree with the idea of an example. I also think the text of the description should be changed, possibly as I suggested above (it's not clear to me whether you were suggesting an example in addition to the altered docs, or instead of the altered docs).

I agree that whether or not to signal an error is tricky. Either could work, it's a gray area.

massung commented 4 months ago

Documentation update in bd6436f.