vaadin / spreadsheet

Spreadsheet for Vaadin Framework
Other
46 stars 55 forks source link

ConditionalFormatter hack breaks large workbooks and doesn't scale #358

Closed vaadin-bot closed 7 years ago

vaadin-bot commented 8 years ago

Originally by woozy


Issue #18705 points out the "create a new row way down the sheet for formula evaluation" hack.

That issue dealt only with the wire expense of the extra cells between the server and client.

However, that hack is awful for many reasons, and completely breaks many real-world spreadsheets I was testing, to see if Vaadin Spreadsheet could be a solution for us.

  1. We often see sheets with > 32,767 rows of valid data. Sticking a new row at that position just for calculating a transient formula wipes out a row of real data.
  2. This implementation does not take advantage of the caching in the POI XSSFEvaluation* classes.

1 is a showstopper, as it wipes out existing data for no purpose other than evaluating conditional formulas.

2 makes even a small (350K) Excel file with a few thousand formulas referencing one another take HOURS to display in the Vaadin Spreadsheet Demo application. I've attached a sample file, the same one I shared with the POI team.

That same sample file, after the POI fixes in POI bug 57840 (by me), and the following changes to Vaadin Spreadsheet, displays in about 90 seconds. Still poor, but close enough that I think I can make further improvements to get it down where I need it.

To that end, I offer the following changes to evaluate conditional format formulas without creating a temporary Cell. These are against POI 3.15 trunk, as my sample needed the performance improvements and formula syntax parsing enhancements in the aforementioned POI bug fix. They may work without modification against 3.12, but I haven't checked, as I can't use 3.12 for my project anyway.

Without these Vaadin Spreadsheet changes, upgrading to POI 3.15 with the above patch won't offer any performance improvement, as the caches it adds are thrown out by Vaadin with every cell evaluation.

First, we need a utility class in the org.apache.poi.ss.formula package, to make use of some package scoped methods:

/**
 - Helper to evaluate POI values that are package scoped, like formula Ptg[] arrays.
 -/
public class WorkbookEvaluatorUtil {

    /**
     - Evaluate formula Ptg[] tokens
     -/
    public static ValueEval evaluate(Spreadsheet spreadsheet, Ptg[] ptgs) {
        // allow for reuse of evaluation caches for performance - see POI #57840
        // for an example
        final WorkbookEvaluator workbookEvaluator = ((BaseXSSFFormulaEvaluator) spreadsheet.getFormulaEvaluator())._getWorkbookEvaluator();
        final OperationEvaluationContext ec = new OperationEvaluationContext(workbookEvaluator, workbookEvaluator.getWorkbook(), 0, 0, 0, new EvaluationTracker(new EvaluationCache(null)));
        return workbookEvaluator.evaluateFormula(ec, ptgs);
    }

    public static BaseXSSFEvaluationWorkbook getEvaluationWorkbook(Spreadsheet spreadsheet) {
        return (BaseXSSFEvaluationWorkbook) ((BaseXSSFFormulaEvaluator) spreadsheet.getFormulaEvaluator())._getWorkbookEvaluator().getWorkbook();
    }
}

Then, in ConditionalFormatter, change ,matchesFormula() to this:

    protected boolean matchesFormula(ConditionalFormattingRule rule, int deltaColumn, int deltaRow) {
        if ( ! (rule instanceof XSSFConditionalFormattingRule)) {
            // TODO Does not support HSSF files for now, since HSSF does not
            // read cell references in the file correctly.Since HSSF formulas
            // are read completely wrong, that boolean formula above is useless.
            return false;
        }
        String booleanFormula = rule.getFormula1();
        if (booleanFormula == null || booleanFormula.isEmpty()) {
            return false;
        }

        // Parse formula and use deltas to get relative cell references to work
        // (#18702)
        Ptg[] ptgs = FormulaParser.parse(booleanFormula, HpdWorkbookEvaluatorUtil.getEvaluationWorkbook(spreadsheet),
                FormulaType.CELL, spreadsheet.getActiveSheetIndex());

        for (Ptg ptg : ptgs) {
            // base class for cell reference "things"
            if (ptg instanceof RefPtgBase) {
                RefPtgBase ref = (RefPtgBase) ptg;
                // re-calculate cell references
                if (ref.isColRelative()) {
                    ref.setColumn(ref.getColumn() + deltaColumn);
                }
                if (ref.isRowRelative()) {
                    ref.setRow(ref.getRow() + deltaRow);
                }
            }
        }

        // booleanFormula has now its relative rows and cols updated
        booleanFormula = FormulaRenderer.toFormulaString(WorkbookEvaluatorUtil.getEvaluationWorkbook(spreadsheet), ptgs);

        final ValueEval eval = WorkbookEvaluatorUtil.evaluate(spreadsheet, ptgs);

        return eval == null ? false : ((BoolEval) eval).getBooleanValue();
    }

Imported from https://dev.vaadin.com/ issue #19952

vaadin-bot commented 8 years ago

Originally by woozy


Attachment added: StructuredRefs-lots-with-lookups.xlsx (317.8 KiB) Sample of lots of formulas using "Structured Reference" syntax (Excel Table lookups)

vaadin-bot commented 8 years ago

Originally by @alvarezguille


Hi woozy,

Thanks for the interest in Vaadin Spreadsheet and the patch, we will take a look at it as soon as possible.

vaadin-bot commented 8 years ago

Originally by Dmitrii Rogozin


Hello, woozy. Thank you for your contribution. We have to change few other things, related to POI update. You can take a look here https://dev.vaadin.com/review/#/c/13729/.

Unfortunately, we can not depend on beta version of POI. As soon as, there will be POI 3.15 final release, we will submit your patch and release it with new version of Vaadin Spreadsheet.

vaadin-bot commented 8 years ago

Originally by woozy


Thanks. I'm one of the contributors to 3.15, including support for "structured reference" formulas (Excel Table syntax), and formula evaluation performance improvements.

However, I just filed https://bz.apache.org/bugzilla/show_bug.cgi?id=59814, which I hope to get into 3.15 also. This is needed by Vaadin Spreadsheet in conjunction with the 3.15 performance improvements, or cell changes don't get picked up by formula evaluation properly when referencing tables.

My local code, which hacks things to work with 3.15 alpha trunk, automatically finds and adds tables to spreadsheets from the POI source, and implements support for query tables by translating ODBC to JDBC (with some restrictions and caveats), including cell reference parameters. That updates table cell contents and table ranges, so the formula evaluation reset is needed to pick up and display updated formula results.

vaadin-bot commented 8 years ago

Originally by woozy


POI 3.15 is out. Are there any dev builds using it I can try out yet? :)

I need 3.15 for performance and feature support in the Excel files I need to handle, but Vaadin Spreadsheet needs small updates to handle API changes.

On a larger note, I'm not happy with the static functions in SpreadsheetFactory - being static, I can't subclass and override when needed to fix compatibility issues on my end. It should be instance methods, with a factory creator class registered and able to be overridden.

vaadin-bot commented 8 years ago

Originally by @Peppe


Hi woozy. We have a patch in review for 3.15, https://dev.vaadin.com/review/#/c/14501/. It hasn't been merged yet so no official release yet.

vaadin-bot commented 8 years ago

Originally by woozy


Anything I can do to help? :) My project can't run due to a binary signature incompatibility in a SpreadsheetFactory static method, so I'm stuck and can't move forward at the moment.

Replying to Jansson:

Hi woozy. We have a patch in review for 3.15, https://dev.vaadin.com/review/#/c/14501/. It hasn't been merged yet so no official release yet.