Open wojciechczerniak opened 4 years ago
This would be an extremely useful feature, especially for custom functions that need to load data from the backend like our FINANCIAL
API.
Currently we preload data on page load but later on will need async functions I think as we can't preload all financial data for everything.
Is this on the roadmap soon and is their an idea of how this will need to be done? Because I could try and do it in a few months when we will need it if there's no plans to implement it yet
@wojciechczerniak I'm going to try and implement this feature because we need it for our site on custom functions.
Could you give some rough guidance on the steps that need to be taken to do this please?
Unless this feature is a massive re-write or something.
Key Concepts will be a good start to understand how HyperFormula works with AST. I would start with Evaluator or Interpreter, who calls function and when. Then go from there through each caller to make it async as well. This would stop at public methods, I think. Maybe @bardek8 or @izulin have more detailed ideas. Sorry, I couldn't be more helpful here, not my expertise.
Some of my thoughts and design considerations:
Both function types should work. Preferably without changing everything to automatically resolved promises. We need to discover when function is async or not.
The naive implementation may traverse AST as synchronous code does. One function at a time. But while for synchronous code we cannot do anything about this, the asynchronous code is a different story. If asynchronous functions are used to make HTTP requests, we may parallelize them with Promise.all and continue resolving formulas after all input is ready.
As mentioned in Key Concepts guide, same functions and ranges are grouped to avoid calculating the same value twice. But does it apply to asynchronous functions? I bet they will not always return same value. ie =REMOTE_GUID()
to generate some unique ID, or =RANDOM_PRODUCT()
that may return random output for each use case.
What will happen if one of the promises is rejected? We need a new ErrorType, and ErrorCode. But most importantly, we need to make sure that we handle Promise.reject
well and gracefully.
Should we retry if Promise is rejected? Or what if a function author loop promises inside his code? Or there are more than one promise and HTTP request within a single function.
If we have retries, how many is too many? We should break the loop at some point and display an error (different one, than before?). Even if retries are not our code, we should have a timeout for each asynchronous function in case they take to long to finish
I totally have no idea how to handle those. An async function may want to make request based on results of some other functions that are also a result of the formula. Edge case, but may bite us 😵
They're more or less cells with different addressing. But async functions should be tested with scopes.
Does asynchronous functions are always volatile and should be re-requested each time a change to workbook is made? Probably yes if want to keep most of HF logic. Maybe responses should be read from cache when possible then? An external cache should be sufficient, ServiceWorker can be used for example. So maybe we don't have to worry about it within HF code.
What will happen on undo
? Should we restore previous value or make a new async request?
Each method has to be considered separately. But most of them return changes
array. Therefore they need to all calculations to end before code can be evaluated further. That means most of API methods has to return Promise<changes>
instead just changes
.
Before:
const changes = hfInstance.setCellContents({ col: 3, row: 0, sheet: 0 }, [['=B1']]);
After:
const changes = await hfInstance.setCellContents({ col: 3, row: 0, sheet: 0 }, [['=B1']]);
ie. calculateFormula
when called will always make a new request when async function was used or referenced ranges contain one. It also should become asynchronous: async calculateFormula(formula: string): Promise<result>
This combination of features blows my mind 🤯 Returned arraySize
should be async as well then. But can we wait until function is fully evaluated and all promises resolved? or should we predict a value first?
I don't see why it wouldn't work if all the above is done correctly. Just make sure we don't make any requests or resolve promises while in suspendEvaluation
state.
If you're going to make a PR, please consult library for creating network mocks. I believe we don't have one yet in this project as it wasn't needed. Any new tool should be convenient for @handsontable/devs, as they will maintain the functionality later on.
Thanks for the information. That definitely helps. Will start on it tomorrow and see how it goes.
@wojciechczerniak I've got initial async functions working in custom-functions.spec.ts. Ignore the test file changes. Evaluator.ts & Interpreter.ts are the main changes.
https://github.com/TrackTak/hyperformula/tree/async-functions
Async and non async Both function types should work. Preferably without changing everything to automatically resolved promises. We need to discover when function is async or not.
I'm having some trouble with understanding how this would be possible & specifically what you mean by this?
Currently in the branch I've changed any public function that internally calls recomputeIfDependencyGraphNeedsIt
to be async and return a promise.
If we have async functions such as ASYNC_SUM
which affect other cells then even functions like addRows()
will need to return promises I think. Because it will need to recalculate these cells in an async way.
If the async functions are just functions that hit backend API's and don't depend on other cells then I can see how addRows
could stay synchronous because it would never affect another cell and I could add seperate intenral methods that are sync & async versions.
Is this correct or am I wrong here?
I just wanted clarification before I end up changing all the test files if I am wrong.
Async and non async Both function types should work. Preferably without changing everything to automatically resolved promises. We need to discover when function is async or not.
I'm having some trouble with understanding how this would be possible & specifically what you mean by this?
I see two options, runFunctionSync
+ runFunctionAsync
or recognize functions that are returning Promise
with instanceof Promise
.
Currently in the branch I've changed any public function that internally calls
recomputeIfDependencyGraphNeedsIt
to be async and return a promise.If we have async functions such as
ASYNC_SUM
which affect other cells then even functions likeaddRows()
will need to return promises I think. Because it will need to recalculate these cells in an async way.
We're not mixing functions with api methods? All API methods will be async always.
If the async functions are just functions that hit backend API's and don't depend on other cells then I can see how
addRows
could stay synchronous because it would never affect another cell and I could add seperate intenral methods that are sync & async versions.
Unless we decide to drop requirement to return changes
. But IMO it's a very useful functionality.
runFunction
should always be async.volatile: true
should handle this automatically. Not async specific.volatile: true
Error handling What will happen if one of the promises is rejected? We need a new ErrorType, and ErrorCode. But most importantly, we need to make sure that we handle Promise.reject well and gracefully.
I've added a timeoutTime
property on the config & a new #TIMEOUT!
error. https://github.com/TrackTak/hyperformula/commit/af124a0a42a1d507c389dea40d5d62a6a01a6707
If this is fine I can add the translations.
Parallel evaluation The naive implementation may traverse AST as synchronous code does. One function at a time. But while for synchronous code we cannot do anything about this, the asynchronous code is a different story. If asynchronous functions are used to make HTTP requests, we may parallelize them with Promise.all and continue resolving formulas after all input is ready.
Not too sure how to do this. I think it needs to a change to the algorithm for the sorting of formulaVertexes, because what happens if the async formulas have cell references in it?
For example, the following code:
const engine = await HyperFormula.buildFromArray([
[1, '=ASYNC_FOO(A1)', '=ASYNC_FOO(A1+A4)', 6],
[2, '=ASYNC_FOO(B1)']
])
I think requires algorithm change to group the separate tree's so this will be possible:
Promise.all([A2,A3])
& Promise.all([B2])
Currently this.dependencyGraph.topSortWithScc()
returns a sorted flat array.
I see two options, runFunctionSync + runFunctionAsync or recognize functions that are returning Promise with instanceof Promise.
~This is not possible.~
~Cannot split out runFunction
& runFunctionAsync
. All of them must be runFunctionAsync
because non-async functions such as SUM()
could contain async AST's like =SUM(=PRICE("AMZN"), 2)
.~
~So the evaluateAst
in runFunction
must be done in an async manner always like so:~
Volatile Does asynchronous functions are always volatile and should be re-requested each time a change to workbook is made? Probably yes if want to keep most of HF logic. Maybe responses should be read from cache when possible then? An external cache should be sufficient, ServiceWorker can be used for example. So maybe we don't have to worry about it within HF code.
I do not beleive that all async functions should be automatically marked as volatile.
Consider this function: =PRICE("AMZN", "price", DATE(2020,12,31), 1)
. It needs to be async but it will always return the same value.
Users of the FunctonPlugin
should just mark which functions are volatile with the volatile: true
flag.
Also for the ServiceWorker cache, I read the docs and I'm not sure how this can work. Different function params could require different cache update times: i.e
=PRICE("AMZN", "price")
.
=PRICE("AMZN", "revenue")
The second functions network request should be cached for a long time whereas the first should probably be cached for a short period (perhaps every 15 minutes for example) because price updates a lot faster than revenue.
The ServiceWorker cannot know whether to refetch the data or not.
I took a look at the caching strategies here: https://developers.google.com/web/tools/workbox/modules/workbox-strategies
None of these seem to be what I think is needed. These seem to be suited more towards Hyperformula working in offline mode (which seems like another ticket).
Perhaps we should not implement a cache and instead just let users of Hyperformula implement their own caching for these functions on their backend?
This way the network request always fires but it returns a cached value or the updated one always so it's always up to date.
What will happen on undo? Should we restore previous value or make a new async request?
It should make a new network request as mentioned above with the backend cache this should be fine anyway. This means there's no stale data either.
Error handling What will happen if one of the promises is rejected? We need a new ErrorType, and ErrorCode. But most importantly, we need to make sure that we handle Promise.reject well and gracefully.
I've added a
timeoutTime
property on the config & a new#TIMEOUT!
error. TrackTak@af124a0If this is fine I can add the translations.
LGTM 👍 I wouldn't bother with translation until PR is accepted
Parallel evaluation The naive implementation may traverse AST as synchronous code does. One function at a time. But while for synchronous code we cannot do anything about this, the asynchronous code is a different story. If asynchronous functions are used to make HTTP requests, we may parallelize them with Promise.all and continue resolving formulas after all input is ready.
Not too sure how to do this. I think it needs to a change to the algorithm for the sorting of formulaVertexes, because what happens if the async formulas have cell references in it?
For example, the following code:
const engine = await HyperFormula.buildFromArray([ [1, '=ASYNC_FOO(A1)', '=ASYNC_FOO(A1+A4)', 6], [2, '=ASYNC_FOO(B1)'] ])
I think requires algorithm change to group the separate tree's so this will be possible:
Promise.all([A2,A3])
&Promise.all([B2])
Currently
this.dependencyGraph.topSortWithScc()
returns a sorted flat array.
I'm not sure either. Not my expertise, sorry. CC @bardek8 or @izulin
I see two options, runFunctionSync + runFunctionAsync or recognize functions that are returning Promise with instanceof Promise.
This is not possible.
Cannot split out
runFunction
&runFunctionAsync
. All of them must berunFunctionAsync
because non-async functions such asSUM()
could contain async AST's like=SUM(=PRICE("AMZN"), 2)
.So the
evaluateAst
inrunFunction
must be done in an async manner always like so:const promises = args.map((ast) => this.evaluateAst(ast, state)) const values = await Promise.all(promises) argValues = values.map((value) => [value, false])
I see. Well, that's unfortunate and another breaking change. Just async public methods are enough to make it a feature for HF 2.0, but still another pain to migrate. Unless @bardek8 @izulin @voodoo11 see some other way to do it.
Volatile Does asynchronous functions are always volatile and should be re-requested each time a change to workbook is made? Probably yes if want to keep most of HF logic. Maybe responses should be read from cache when possible then? An external cache should be sufficient, ServiceWorker can be used for example. So maybe we don't have to worry about it within HF code.
I do not beleive that all async functions should be automatically marked as volatile.
Consider this function:
=PRICE("AMZN", "price", DATE(2020,12,31), 1)
. It needs to be async but it will always return the same value.Users of the
FunctonPlugin
should just mark which functions are volatile with thevolatile: true
flag.
I agree. That's a good design decision 👍 Thank you.
Also for the ServiceWorker cache, (...)
Perhaps we should not implement a cache and instead just let users of Hyperformula implement their own caching for these functions on their backend?
Yeah, that's what I had in mind. ServiceWorker is application developer responsibility. If we have volatile and undo-redo sorted out, caching wont be necessary internally.
What will happen on undo? Should we restore previous value or make a new async request?
It should make a new network request as mentioned above with the backend cache this should be fine anyway. This means there's no stale data either.
Agree 👍
Retries - Need clarification if @handsontable/devs actually want this or not Circuit breaker - Need clarification if @handsontable/devs actually want this or not
IMO timeout
is enough of a circuit breaker. Function author may loop and retry inside function as long as it's within the time limit.
New draft PR: https://github.com/handsontable/hyperformula/pull/863
The PR #863 was abandoned after a sincere attempt by @MartinDawson. Maybe Martin would like to explain here what's his current approach to async functions, if there is any?
FWIW, right now, the easiest way at achieve the asynchronicity is in the user code outside of HyperFormula. I mean a two-step strategy, in which your custom function e.g. SHARE_PRICE("AMZN")
synchronously resolves to a value from cache (or gives an error if the cache is empty) while asynchronously fetching the current value. When fetching is done, update the cache and reevaluate getCellValue
/getSheetValues
. We will soon add such example to the docs #892.
Or, if you need it to run in one pass, simply resolve your async code before feeding the data to Hyperformula.
Implementing async functions as discussed here is not something that we are working on right now. The focus for this quarter is to improve what's available under the existing API (bug fixes, docs).
We might put async functions on the roadmap in the upcoming quarter, because it is obviously useful for some scenarios.
@warpech I'm using this (buggy) PR still: https://github.com/handsontable/hyperformula/pull/879
I couldn't really understand how to fully get your suggestion to work from trying it so I am not doing it that way.
Issues with my draft PR is that I should not have changed evaluator.ts to be async and instead calculated async values in a separate file like the ArraySize is done basically.
There's also issues with async dependencies, array formulas & parallelisation that don't properly work in it in more complex scenario's.
I'm working on it again though bit by bit (as I need it) as I know how to do it much better now I understand HF more so when async functions is picked up in this or a future quarter I can make another PR and should have majority of it done by then.
FWIW, right now, the easiest way at achieve the asynchronicity is in the user code outside of HyperFormula. I mean a two-step strategy, in which your custom function e.g. SHARE_PRICE("AMZN") synchronously resolves to a value from cache (or gives an error if the cache is empty) while asynchronously fetching the current value. When fetching is done, update the cache and reevaluate getCellValue/getSheetValues.
@warpech, I know this thread hasn't been active in a while but is there any update? Is the strategy quoted above still the recommended approach? Thanks!
@JohannCooper Sorry for (very late) reply, I got it working a while back with async cells at each level (i.e promise.all() at each level down the ayclic directed graph tree) but moved onto something else so never got back to this PR
You can see the repo here: https://github.com/TrackTak/hyperformula
Specifically: https://github.com/TrackTak/hyperformula/commit/ca9bbad10e34bff2d5ba50010673875512c80d6d https://github.com/TrackTak/hyperformula/commit/d8739857dc6853ddeb0cd717d24381bac1c8dadb https://github.com/TrackTak/hyperformula/commit/ee54baba545626a9c7a0d0a60c1553a2cdc2e60a
And (unfinished) chunking of async values (which is more performant): https://github.com/TrackTak/hyperformula/commits/chunk-async-formulas
IIRC @warpech suggestion didn't fully work for me for some reason or was not efficient enough, I can't 100% remember
Description
Common request for our previous formula engine. Let's keep it as a feature request.
Sample