handsontable / hyperformula

HyperFormula is an open-source headless spreadsheet for business web apps. It comes with over 400 formulas, CRUD operations, undo-redo, clipboard support, and sorting.
https://hyperformula.handsontable.com/
Other
1.95k stars 108 forks source link

Named Expressions and Structured References #126

Open wojciechczerniak opened 4 years ago

wojciechczerniak commented 4 years ago

TL;DR;

Sometimes columns (and rows) have customized names. Using the A1 or R1C1 addressing in this context is very confusing for the end-user. We need a way to translate columns names to addresses.

obraz

(note: I stole the image from our friends. I hope they don't mind, they need this feature as well.)

Stages

Description

The syntax here is based on Open Document Format Formula named expressions[3][4] and Excel structured references[5]. It should be 100% compatible with both of them and provides an extension to provide header names if the named expression/structured reference is called in the context of the worksheet (Handsontable instance). This is amazing how those things fit in together nicely.

When a Named Expression is a table (or worksheet) then we can query it deeper with structured references by using square brackets: [ ].

This spec is extended #27 and will close #5 as well.

Scopes

Both Named Expression and Structured References define two scopes: Worksheet (Local) and Workbook (Global). Each scope can have only unique names but if a name is in both there is a rule of precedence, which both specs define the same: If two variables have the same name, then the local scope takes the precedence over the global one.

Properties

Types

We could then add some built-in aliases (or added by Handsontable) and close #5 issue.

Methods

options = { comment, visibility }

CRUD

When CRUD operations are performed they should update named expressions references. Adjust references to single cells, expand or collapse ranges and tables.

Copy&Paste

@swistak35 noted that we need to check how copy&paste of named expressions is handled by other apps. Destination worksheet names may not exist in the target worksheet. Sometimes they will fall back to the global scope or were created with global scope and are now shadowed by local scope named expression.

Batch operations

It should be possible to add/remove multiple named expressions without triggering recalculation each time we call addNamedExpression method.

Syntax

There are three parts: =TableName[[RowName][ColumnName]] with some small differences. This is an absolute referenced table cell that is relative to the current worksheet.

To have totally absolute cell reference the table name can be prefixed with the worksheet name: =Sheet1!TableName[[RowName][ColumnName]]. To reference worksheet top header just omit the TableName ie: =Sheet1![[RowName][ColumnName]]

If table name is omitted: =[[RowName][ColumnName]] engine will check if we're already in defined table. This is an absolute cell in a relative table. If we're not in a table range, this will reference the worksheet headers.

If row name is omitted: =TableName[ColumnName] or =[ColumnName] the row is relative to the current row.

If the row is not provided the query should return the whole column. Same as: A:A.

If the column is not provided the query should return the whole row. Same as 1:1.

The external widget would need a worksheet reference: =Sheet1![[RowName][ColumnName]] to reference the cell of Handsontable instance. And since we're outside of any defined table this should look for a global column and row names.

If the name contains special characters it can be enclosed by additional square brackets: [[Column # Name]]. Probably we should keep this syntax for compatibility with other apps.

A row can also be one of those helpful keywords:

Reference operators can be applied (syntax depends on configuration #58). Use additional brackets around expressions with operators [ ].

Whitespace should be preserved:

The problem: where is Handsontable in all of that

When we're not in any named table, or Handsontable ID is used as a table name then the RowName, ColumnName, [#Headers] should refer to Handsontable column headers and row headers. For this to work, we have to provide the mapping between name and index.

How do we feed headers to HyperFormula from an external source?

  1. We can set them as arrays: rowHeaders, columnHeaders and mapping is done by HF internally both ways.
  2. We have to define callbacks to get rowHeader name by index but also to get index by name:
    • getRowHeaderByIndex(sheet, index) { return rowHeaders[sheet][index]; }
    • getRowHeaderByName(sheet, name) { return rowHeaders[sheet].indexOf(name); }

Naming rules

Other

Links

[1] https://support.airtable.com/hc/en-us/articles/203255215-Formula-Field-Reference [2] https://docs.microsoft.com/en-us/dax/dax-syntax-reference [3] https://docs.oasis-open.org/office/OpenDocument/v1.3/csprd02/part4-formula/OpenDocument-v1.3-csprd02-part4-formula.html#__RefHeading__1017964_715980110 [4] https://support.office.com/en-us/article/names-in-formulas-fc2935f9-115d-4bef-a370-3aa8bb4c91f1 [5] https://support.office.com/en-us/article/using-structured-references-with-excel-tables-f5ed2452-2337-4f71-bed3-c8ae6d2b276e

marek-baranowski commented 4 years ago

+100 🔥🔥🔥

Great analysis! And happy to see this feature added to the roadmap, for me it's essential ✌️ I was actually planning to give this task a try soon. Will be happy to help, I was dealing with a similar case at Reactivepad, with tables and structured references:

struct-refs

BTW. do you maybe have some data on how your users use Handontable in most cases, 1. as a spreadsheet; 2. as a table (like column names instead of A, B, C etc.)?

That's one of the toughest design problems in Reactivepad - to give people a familiar spreadsheet which doesn't fit very well within document (among many other issues) or to stay with table & calculated columns, something that many big players (Notion, Airtable, Coda) already successfully deployed.

wojciechczerniak commented 4 years ago

Thanks! And thank you for the real use case example.

We don't have such data. Handsontable can be used in both scenarios and it's up to the developer which one she will choose for her end users. Notion and Airtable have a database like tables where each row is a record, they have row relative notation. But there is also Quip which took another way and has A1 notation within the document and it doesn't feel like it does not fit well within the document. I think that both are valid approaches.

marek-baranowski commented 4 years ago

@wojciechczerniak just wanted to ask about the realistic launch date of this feature :) Of course no rush from my side, I know you have your schedules. Just wondering if I should start working on this, either as an alternative implementation or maybe as the contribution to the official code.

wojciechczerniak commented 4 years ago

@marek-baranowski We have a call scheduled next Monday where we will discuss tasks for February/roadmap. Until then I don't have any ETA, sorry.

marek-baranowski commented 4 years ago

sure, no problem.

marek-baranowski commented 4 years ago

@marek-baranowski We have a call scheduled next Monday where we will discuss tasks for February/roadmap. Until then I don't have any ETA, sorry.

@wojciechczerniak let me just remind myself with the question about ETA :) were you able to estimate this task maybe?

krzysztofspilka commented 4 years ago

@marek-baranowski we're still determining dates and scope :(

swistak35 commented 4 years ago

PRs: #167 #192

In these PRs part of this was done, meaning: formula expressions on workbook level, without ability to call them from the other formulas. But they can be registered and changes to them are in ContentChanges sets.

From the naming rules, these were applied:

Up to 255 characters length.

:heavy_check_mark:

Names can contain large and small letters but matching is case insensitive. The evaluator engine should normalize the name. The registered name should be used when stringifying AST.

:heavy_check_mark: but as I said above, there's no "AST part" yet

Names that look like a cell reference are not allowed. A1: $A$100, or R1C1.

:heavy_check_mark: for the first part, we don't have the R1C1 syntax so I didn't do that part

The first character should be a letter or an underscore. The rest can contain letters, numbers, periods and underscores.

:heavy_check_mark: and also unicode letters are accepted

Whitespace is not allowed in named expression. It is allowed inside structured references enclosed in square brackets [ ] and should be preserved to improve readability.

:heavy_check_mark: whitespace not allowed, but the rest doesn't apply for regular named expressions

Some characters have special meaning for syntax: [, ], # and '. They should be escaped with single quotation mark '.

Not done, because it doesn't matter in these case? These characters are forbidden in named expression name, so I assume this is only for structured references syntax.


From methods, for now there's option to add named formula expression on workbook level, change it's formula, list available ones and get the value of specific named expression.


Also, the -1 sheet workaround is hidden, but still "leaks" when someone would try to use non absolute references :) I don't know what we want to do about it because I see that different tools works in a little bit different way

wojciechczerniak commented 4 years ago

@swistak35

Also, the -1 sheet workaround is hidden, but still "leaks" when someone would try to use non absolute references :) I don't know what we want to do about it because I see that different tools works in a little bit different way

  • some of them treat A1 expression as self-references and work in whatever sheet you use it
  • some of them don't allow you to enter the formula without being absolute about the specific cell I didn't check all of the tools, someone else may do that and clarify what are the options 👀 But it's decision still to be done and the usage of relative references should be treated as undefined for now

I would go with the first one. If the reference is not defined as absolute it's relative to the sheet (context) it's called in. We even have this context provided in calculateFormula method so this should be a safe approach.


What about finishing the API? I've seen that we can get the expression value, which is the result of the calculation? But can we get the expression itself? IMO we're missing one method: