vallettea / koala

Transpose your Excel calculations into python for better performances and scaling.
GNU General Public License v3.0
144 stars 59 forks source link

How to receive only value changed cells #227

Open eiso opened 5 years ago

eiso commented 5 years ago

First of all, thank you for such a great open-source project.

I am looking at using Koala to be the backend for a web based spreadsheet.

Currently it is clear to me how I can use the graph to set values, formulas & to evaluate specific cells with formulas. However when I add a new formula, or change the value in a cell, I'd like to run Koala and know which cells output values have changed (either because they are references to another cell, or because they are formulas) so I can update the web spreadsheet.

This is related to my second question, is there an easy way get the entire graph flattened as an object?

Any advice is much appreciated.

eiso commented 5 years ago

Regarding my second question, I found the answer in the code:

sp_scratch = Spreadsheet()
sp_scratch.asdict()
bradbase commented 5 years ago

Hi @eiso,

It's great to hear about the uses people are finding for Koala.

From what I've seen while working with Koala, it isn't built for an event driven use case. That may not such a problem, even Excel needs a re-calc trigger. But I'm yet to see a useful way to trigger an event based on a change in a cell/node after an evaluate ("re-calc").

Does the detect_alive method help you at all?

Up until koala version 0.0.31 it was possible to extract sub parts of the "spreadsheet"/graph and manage lists of input and output cells. For example;

excel_compiler.gen_graph(inputs= [], outputs= [])

That concept, used in conjunction with a method I've put into Koala -- ExcelParser.getOperandRanges() -- allows you the freedom to re-assess which input cells are associated with a particular output cell.

eg;

    inputs = parse_model(model)
    parser = ExcelParser()
    inputs = parser.getOperandRanges()
    agraph = excel_compiler.gen_graph(inputs= inputs, outputs= [NAMED_RANGE_OR_CELL_ADDRESS])
    # ...
    # Then you can evaluate the generated agraph

Might be fun to take a look at my project FlyingKoala for a contextual use of the above. It is very useful when used in conjunction with User Defined Functions. https://github.com/bradbase/flyingkoala

bradbase commented 5 years ago

I will draw your attention:

The use case of gen_graph with input and output arguments stopped working after version 0.0.31 due to a refactor which moved the functionality of ExcelCompiler into the Spreadsheet object.

There is a condition in Koala where an ExcelCompiler object will cause a "creation" pathway and a Spreadsheet object would cause an "update" pathway.

Having only the "update" pathway it's not possible to extract parts of the Excel Workbook as a graph unto itself, the Excel Workbook is now loaded in its entirety into a graph any time you generate a graph. You can still do all the same operations, it's just that the entire workbook is to be loaded anytime you generate a graph. This is less than ideal for my use case, but might be fine for yours.

I have been trying to fix the use case of using gen_graph with arguments but have discovered I can't find a way to both keep the use case and the refactor without re-architecting Koala or re-engineering it extensively. I'm happy to take advice if you've got any WRT once again supporting gen_graph with arguments.

bradbase commented 5 years ago

Actually... scratch that.

I might be able to use an isinstance in the right spot....

Watch for a PR

bradbase commented 5 years ago

From memory, I have also been able to print out the count of nodes in various circumstances and so can watch when things change in the graph.

eg; Graph loading done, 89 nodes, 173 edges, 89 cellmap entries

eiso commented 5 years ago

Thank you so much @bradbase for a very extensive answer. I am going to familiarize myself with the library this weekend, and will also check out FlyingKoala (looks really cool)

Does the detect_alive method help you at all?

Going to check this out and let you know

That concept, used in conjunction with a method I've put into Koala -- ExcelParser.getOperandRanges() -- allows you the freedom to re-assess which input cells are associated with a particular output cell.

This is definitely relevant. For being an interactive spreadsheet, I might be able to get away with only updating visible cells at any moment. However the reverse is what I really need, from a change in a cell, which are the cells affected. Since this is stored as a graph, I'll have to do some data viz. but I can imagine it shouldn't be too hard to code up the graph path from a one-to-many relationship.

You can still do all the same operations, it's just that the entire workbook is to be loaded anytime you generate a graph. This is less than ideal for my use case, but might be fine for yours.

My graph is likely to be much larger then what a sheet allows to show (100k-2mm rows in a sheet) so I can imagine this might take a lot of computation. So far on small graphs I have been getting very fast results, so will try to stress test it. If the computation is fast I don't mind generating the entire graph, but as you already mentioned, it is preferred to only update relevant cells.

bradbase commented 5 years ago

@eiso

Thank you so much @bradbase for a very extensive answer. I am going to familiarize myself with the library this weekend, and will also check out FlyingKoala (looks really cool)

No worries :D

This is definitely relevant. For being an interactive spreadsheet, I might be able to get away with only updating visible cells at any moment. However the reverse is what I really need, from a change in a cell, which are the cells affected. Since this is stored as a graph, I'll have to do some data viz. but I can imagine it shouldn't be too hard to code up the graph path from a one-to-many relationship.

I look forward to the infrastructure to do that. At the end of the day the dependencies are in a network graph so there is likely some existing methods in the network library which help traverse the DiGraph.

At one stage I was able to get something together which graphically depicted the DiGraph as generated by Koala. It was most interesting, but demonstrated the network of relationships.

My graph is likely to be much larger then what a sheet allows to show (100k-2mm rows in a sheet) so I can imagine this might take a lot of computation. So far on small graphs I have been getting very fast results, so will try to stress test it. If the computation is fast I don't mind generating the entire graph, but as you already mentioned, it is preferred to only update relevant cells.

Your megasheet application will be awesome!

Already -- I want to use it.

I am almost certain we will need to re-factor Koala, and potentially re-write parts, to be able to handle the demands of such an application. Koala is in dire need of a significant clean-up. And I have been torn between my project which has want for more development and Koala which needs attention.

eiso commented 5 years ago

@bradbase anywhere I can write you on chat. Would love to share some thoughts live instead of async? My email is contact [at] eiso - kant [dot] com