tbanel / orgtbljoin

Enrich an Org-table with a reference table
GNU General Public License v3.0
32 stars 4 forks source link

-- mode: org; --

+TITLE: Join Two Org Mode Tables

+OPTIONS: ^:{} author:Thierry Banel toc:nil

One table (the master table) is grown by selectively appending columns of another table (the reference table).

Here is a list of products for a cooking recipe.

+BEGIN_EXAMPLE

| type | quty | |----------+------| | onion | 70 | | tomato | 120 | | eggplant | 300 | | tofu | 100 |

+END_EXAMPLE

We want to complete it with nutritional facts: quantities of fiber, sugar, proteins, and carbohydrates. For this purpose, we have a long reference table of standard products. (This table has been freely borrowed from Nut-Nutrition, http://nut.sourceforge.net/, by Jim Jozwiak).

+BEGIN_EXAMPLE

+tblname: nut

| type | Fiber | Sugar | Protein | Carb | |----------+-------+-------+---------+------| | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | | tomato | 0.6 | 2.1 | 0.8 | 3.4 | | onion | 1.3 | 4.4 | 1.3 | 9.0 | | egg | 0 | 18.3 | 31.9 | 18.3 | | rice | 0.2 | 0 | 1.5 | 16.0 | | bread | 0.7 | 0.7 | 3.3 | 16.0 | | orange | 3.1 | 11.9 | 1.3 | 17.6 | | banana | 2.1 | 9.9 | 0.9 | 18.5 | | tofu | 0.7 | 0.5 | 6.6 | 1.4 | | nut | 2.6 | 1.3 | 4.9 | 7.2 | | corn | 4.7 | 1.8 | 2.8 | 21.3 |

+END_EXAMPLE

Let us put the cursor on the =type= column of the recipe table, and type =C-c C-x j= or =M-x orgtbl-join=.

A few questions are asked. Then the recipe gets new columns appended with the needed nutrition facts:

+BEGIN_EXAMPLE

| type | quty | Fiber | Sugar | Protein | Carb | |----------+------+-------+-------+---------+------| | onion | 70 | 1.3 | 4.4 | 1.3 | 9.0 | | tomato | 120 | 0.6 | 2.1 | 0.8 | 3.4 | | eggplant | 300 | 2.5 | 3.2 | 0.8 | 8.6 | | tofu | 100 | 0.7 | 0.5 | 6.6 | 1.4 |

+END_EXAMPLE

If you are familiar with SQL, you would get a similar result with the a /join/ (actually a /left outer join/ by default, but that can be configured with the =:full= parameter).

+begin_src sql

select * from recipe, nut where recipe.type = nut.type;

+end_src

+begin_src sql

select * from recipe, nut left outer join nut on recipe.type = nut.type;

+end_src

Three modes are available: /in-place/, /push/, /pull/.

** In /in-place/ mode

The master table is changed (in-place) by appending columns from the reference table.

Invoke it with the =M-x orgtbl-join= command, also bound to =C-c C-x j=. The cursor must be positioned on the column used to perform the join.

** In /push/ mode

The master table drives the creation of derived tables. Specify the wanted result in =#+ORGTBL: SEND= directives (as many as desired):

+BEGIN_EXAMPLE

+ORGTBL: SEND enriched orgtbl-to-joined-table :ref-table nut :mas-column type :ref-column type

| type | quty | |----------+------| | onion | 70 | | tomato | 120 | | eggplant | 300 | | tofu | 100 |

+END_EXAMPLE

The receiving blocks must be created somewhere else in the same file:

+BEGIN_EXAMPLE

+BEGIN RECEIVE ORGTBL enriched

+END RECEIVE ORGTBL enriched

+END_EXAMPLE

Typing =C-c C-c= with the cursor on the first pipe of the master table refreshes all derived tables.

** In /pull/ mode

So-called "dynamic blocks" may also be used. The resulting table knows how to build itself. Example:

A master table is unaware that it will be enriched in a joined table:

+BEGIN_EXAMPLE

+TBLNAME: recipe

| type | quty | |----------+------| | onion | 70 | | tomato | 120 | | eggplant | 300 | | tofu | 100 |

+END_EXAMPLE

Create somewhere else a /dynamic block/ which carries the specification of the join:

+BEGIN_EXAMPLE

+BEGIN: join :mas-table recipe :mas-column type :ref-table nut :ref-column type

| type | quty | Fiber | Sugar | Protein | Carb | |----------+------+-------+-------+---------+------| | onion | 70 | 1.3 | 4.4 | 1.3 | 9.0 | | tomato | 120 | 0.6 | 2.1 | 0.8 | 3.4 | | eggplant | 300 | 2.5 | 3.2 | 0.8 | 8.6 | | tofu | 100 | 0.7 | 0.5 | 6.6 | 1.4 |

+END:

+END_EXAMPLE

Typing =C-c C-c= with the cursor on the =#+BEGIN:= line refreshes the table.

** As a rule of thumb

For quick and once-only processing, use /in-place/ mode.

Use /pull/ or /push/ modes for reproducible work. The /pull/ mode might be easier to use than the /push/, because there is a wizard bound to =C-c C-x x= (see below). Other than that, the two modes use the same underlying engine, so using one or the other is just a matter or convenience.

The reference table may contain several matching rows for the same value in the master table. In this case, as many rows are created in the joined table. Therefore, the resulting table may be longer than the master table. Example, if the reference table contains three rows for "eggplants":

+BEGIN_EXAMPLE

+tblname: nut

| type | Cooking | Fiber | Sugar | Protein | Carb | |----------+---------+-------+-------+---------+------| | ... | ... | ... | ... | ... | ... | | eggplant | boiled | 2.5 | 3.2 | 0.8 | 8.6 | | eggplant | pickled | 3.4 | 6.5 | 1.2 | 13.3 | | eggplant | raw | 2.8 | 1.9 | 0.8 | 4.7 | | ... | ... | ... | ... | ... | ... |

+END_EXAMPLE

Then the resulting table will have those three rows appended:

+BEGIN_EXAMPLE

| type | quty | type | Cooking | Fiber | Sugar | Protein | Carb | |----------+------+----------+---------+-------+-------+---------+------| | eggplant | 300 | eggplant | boiled | 2.5 | 3.2 | 0.8 | 8.6 | | eggplant | 300 | eggplant | pickled | 3.4 | 6.5 | 1.2 | 13.3 | | eggplant | 300 | eggplant | raw | 2.8 | 1.9 | 0.8 | 4.7 |

+END_EXAMPLE

If you are familiar with SQL, this behavior is reminiscent of the /left outer join/.

Duplicate entries may happen both in the master and the reference table. The joined table will have all combinations. So for instance if there are 2 =eggplant= rows in the master table, and 3 =eggplant= rows in the reference table, then the joined table will get 6 =eggplant= rows.

By default, all columns from both the master table and the reference table are output (except the joining column, which is output only once).

This can be customized with the =:cols= parameter. Give it the list of desired columns, in the order they should be output.

Columns may be specified by their name (if they have one) or by a dollar form. Thus, =$3= means the third column (numbering begins with 1).

By default, the first example give all columns (except =type= which appears only once):

+BEGIN_EXAMPLE

+BEGIN: join :mas-table recipe :mas-column type :ref-table nut :ref-column type

| type | quty | Fiber | Sugar | Protein | Carb | |----------+------+-------+-------+---------+------| | onion | 70 | 1.3 | 4.4 | 1.3 | 9.0 | | tomato | 120 | 0.6 | 2.1 | 0.8 | 3.4 | | eggplant | 300 | 2.5 | 3.2 | 0.8 | 8.6 | | tofu | 100 | 0.7 | 0.5 | 6.6 | 1.4 |

+END:

+END_EXAMPLE

If we want only =quty= and =Protein=, we specify it like that:

+BEGIN_EXAMPLE

+BEGIN: join :cols (quty Protein) :mas-table recipe :mas-column type :ref-table nut :ref-column type

| quty | Protein | |------+---------| | 70 | 1.3 | | 120 | 0.8 | | 300 | 0.8 | | 100 | 6.6 |

+END:

+END_EXAMPLE

Or like that:

+BEGIN_EXAMPLE

+BEGIN: join :cols "quty Protein" :mas-table recipe :mas-column type :ref-table nut :ref-column type

| quty | Protein | |------+---------| | 70 | 1.3 | | 120 | 0.8 | | 300 | 0.8 | | 100 | 6.6 |

+END:

+END_EXAMPLE

It may happen that no row in the reference table matches a value in the master table. By default, in this case, the master row is kept, with empty cells added to it. Information from the master table is not lost. If, for example, a line in the recipe refers to an unknown "amaranth" product (a cereal known by the ancient Incas), then the resulting table will still contain the =amaranth= row, with empty nutritional facts.

+BEGIN_EXAMPLE

| type | quty | type | Fiber | Sugar | Protein | Carb | |----------+------+----------+-------+-------+---------+------| | onion | 70 | onion | 1.3 | 4.4 | 1.3 | 9.0 | | tomato | 120 | tomato | 0.6 | 2.1 | 0.8 | 3.4 | | eggplant | 300 | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | | tofu | 100 | tofu | 0.7 | 0.5 | 6.6 | 1.4 | | amaranth | 120 | | | | | |

+END_EXAMPLE

This behavior is controlled by the =:full= parameter:

The use cases may be as follow:

Sometimes an input table may be unaligned or malformed, with incomplete rows, like those ones:

+BEGIN_EXAMPLE

| type | Fiber | Sugar | | Carb | |----------+-------+-------+------+------| | eggplant | 2.5 | 3.2 | 0.8 | 8.6 | | tomato | 0.6 | 2.1 | 0.8 | 3.4 | | onion | 1.3 | 4.4 | 1.3 | 9.0 | | egg | 0 | 18.3 | 31.9 | 18.3 | | rice | 0.2 | 0 | 1.5 | 16.0 | | tofu | 0.7 | nut | 2.6 | 1.3 | 4.9 | 7.2 |

| type | quty | |----------+------| | onion | 70 | | tomato | | eggplant | 300 | | tofu | 100 |

+END_EXAMPLE

Missing cells are handled as though they were empty.

The master and the reference tables may or may not have a header. When there is a header, it may extend over several lines. A header ends with an horizontal line.

The orgtbl-join system tries to preserve as much of the master table as possible. Therefore, if the master table has a header, the joined table will have it verbatim, over as many lines as needed.

The reference table header (if any), will fill-in the header (if any) of the resulting table. But if there is no room in the resulting table header, the reference table header lines will be ignored, partly of fully.

Header are useful to refer to columns. If there is no header, then columns must be referred with =$= names: =$1= is the name of the first column, =$2= is the name of the second column, and so on. This is pretty much the same as in the Org Mode spreadsheet.

You may install key-bindings and menu entries by adding the following line to your =.emacs= file: : (orgtbl-join-setup-keybindings)

This will give the following bindings:

The /in-place mode/ is run through a small wizard which asks questions, with completion available.

A wizard is available for the /pull/ mode.

For all questions, completion is available. In the later case, you should answer =join= when asked for the kind of block.

Note: there many kinds of dynamic blocks can be inserted besides =join=.

Additional columns can be specified for the resulting table. With the previous example, we added a 7th column multiplying columns 2 and 3. This results in a line beginning with =#+TBLFM:= below the table, as usual in Org spreadsheet. This line will survive re-computations.

Moreover, we added a spreadsheet formula with a =:formula= parameter. This will fill-in the 7th column header. It is translated into a usual =#+TBLFM:= spreadsheet line.

+BEGIN_EXAMPLE

+BEGIN: join :mas-table recipe :mas-column type :ref-table nut :ref-column type :formula "@1$7=totfiber"

+name: richer

| type | quty | Fiber | Sugar | Protein | Carb | totfiber | |----------+------+-------+-------+---------+------+----------| | onion | 70 | 1.3 | 4.4 | 1.3 | 9.0 | 91. | | tomato | 120 | 0.6 | 2.1 | 0.8 | 3.4 | 72. | | eggplant | 300 | 2.5 | 3.2 | 0.8 | 8.6 | 750. | | tofu | 100 | 0.7 | 0.5 | 6.6 | 1.4 | 70. |

+TBLFM: $7=$2*$3::@1$7=totfiber

+END:

+END_EXAMPLE

The joined table can be post-processed with the =:post= parameter. It accepts a Lisp =lambda=, a Lisp function, or a Babel block.

The processing receives the joined table as parameter in the form of a Lisp expression. It can process it in any way it wants, provided it returns a valid Lisp table.

A Lisp table is a list of rows. Each row is either a list of cells, or the special symbol =hline=.

In this example, a =lambda= expression adds a =hline= and a row for /ginger/.

+begin_example

+BEGIN: join ... :post (lambda (table) (append table '(hline (ginger na na na na))))

| product | quty | Carb | Fiber | Sugar | Protein | |-----------+--------+------+-------+-------+---------| | onion | 70 | 9.0 | 1.3 | 4.4 | 1.3 | | unknown | 999 | | tomatoe | 120 | 3.4 | 0.6 | 2.1 | 0.8 | |-----------+--------+------+-------+-------+---------| | ginger | 33 | na | na | na | na |

+END:

+end_example

The =lambda= can be moved to a =defun=. The function is then passed to the =:post= parameter:

: #+begin_src elisp : (defun my-function (table) : (append table : '(hline (ginger na na na na)))) : #+end_src

: ... :post my-function

The =:post= parameter can also refer to a Babel Block. Example:

: #+BEGIN: join ... :post "my-babel-block(tbl=this)" : ... : #+END:

: #+name: my-babel-block : #+begin_src elisp :var tbl="" : (append tbl : '(hline (ginger na na na na))) : #+end_src

In the above example we gave a name to the resulting joined table: =#+name: richer=. Doing so the joined table may become an input for a further computation, for example in a babel block.

The name will survive re-computations. This happens only in /pull mode/.

Note that the =#+name: richer= line could appear above the =#+BEGIN:= line. But sometimes this is not taken into account by further babels blocks.

Emacs package on Melpa: add the following lines to your =.emacs= file, and reload it.

+BEGIN_EXAMPLE

(add-to-list 'package-archives '("melpa" . "http://melpa.org/packages/") t) (package-initialize)

+END_EXAMPLE

You may also customize this variable:

+BEGIN_EXAMPLE

M-x customize-variable package-archives

+END_EXAMPLE

Then browse the list of available packages and install =orgtbl-join=

+BEGIN_EXAMPLE

M-x package-list-packages

+END_EXAMPLE

Alternatively, you can download the lisp files, and load them:

+BEGIN_EXAMPLE

(load-file "orgtbl-join.el")

+END_EXAMPLE

You may want to add an entry in the ~Table~ menu, ~Column~ sub-menu. You may also want to call ~orgtbl-join~ with ~C-c j~. One way to do so is to use ~use-package~ in your ~.emacs~ init file:

+begin_src elisp

(use-package orgtbl-join :after (org) :bind ("C-c j" . orgtbl-join) :init (easy-menu-add-item org-tbl-menu '("Column") ["Join with another table" orgtbl-join (org-at-table-p)]))

+end_src

Note: there used to be a ~orgtbl-join-setup-keybindings~ function to do just what the above ~use-package~ does. In this way, key binding and menu binding are no longer hard-coded in the package.

Comments, enhancements, etc. welcome.

Author

Contributors

orgtbl-join is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.