vaadin / spreadsheet

Spreadsheet for Vaadin Framework
Other
46 stars 55 forks source link

Conditional Formatting flaws, proposed fix in conjunction with POI changes #461

Open WoozyG opened 7 years ago

WoozyG commented 7 years ago

I'm both a POI project committer/PMC member, and Vaadin ProTools user working on a commercial offering that uses Vaadin Spreadsheet.

I've found the following flaws with Vaadin Spreadsheet's ConditionalFormatter class:

  1. The tracking for top/left border styles is done by ConditionalFormatting instance, but the styling can differ between ConditionalFormattingRule instances within those, so that should be the key. If two rules in the same ConditionalFormatting define different border rules, only one is displayed, and used for cells matching either rule.
  2. The "Stop if True" flag field is checked only when building the rule CSS, not when evaluating rules per cell, meaning rules defined after the first rule with the flag set never even have their CSS generated, let alone applied to matching cells.
  3. If a rule containing a top or left border format applies to a cell in row 1 or column A respectively, it can trigger ArrayIndexOutOfBounds exceptions.
  4. BORDER_STYLE_DEFAULT is a private field - there should be a way to customize it
  5. Rules based on dynamic data, that is, cells that can be edited, are not re-evaluated when the data changes, or, if they are, the rules are completely rebuilt. One is wrong, the other inefficient.

I've got working code in a POI fork, that I'm planning to commit to POI, that solves the evaluation issues and improves performance by 50% or more in my tests. I have a working replacement for ConditionalFormatter that uses this code, which I can share if you want.

You can test much of these bugs with the sample workbook provided by Microsoft, which contains all the example formatting rules from their help documentation:

http://download.microsoft.com/download/1/6/F/16F701E9-63BA-48D3-8B48-096F9288F443/AF010235700_en-us_cfsamples_af010235700.xlsx

I'd love to work on adding better support to POI that can be used by this project and others, and maintained by the open-source community.

I'm the user who contributed the code for the current WorkbookEvaluatorUtil class, which is a hack, from before I joined the POI project. I'd like to enable the needed functionality at a higher level and allow that code to go away in favor of something safer and more efficient. Especially since that code has some bugs, it turns out, for things like shifting evaluations by offsets for cells in a range, formulas for sheets other than the active one, etc.

WoozyG commented 7 years ago

Further, my working POI code now supports proper conditional evaluation logic for all types of Excel 2007+ (XLSX) rule types. The only logic left is the display-side treatment of things like color bars and icon sets, where the rule itself defines discrete value range buckets for partitioning all values.

I can demonstrate a working Vaadin Spreadsheet app that supports "top-10", "average", "duplicate/unique" rules, as well as dynamic conditional rules that change in response to user input :)

Peppe commented 7 years ago

Hi Greg! Thank you! Yes we'd love to get your ConditionalFormatter. Could you share it as a Pull Request? We could hold it in a separate branch until we have a patched version of POI.

As a side note POI matter, as I now have a channel open with you: Do you know if or when POI will get support for SUBTOTAL "ignores hidden values" functions? Recently the "includes hidden values" was added to POI but it throws an exception on the "ignore" ones, and we have some users requesting it.

WoozyG commented 7 years ago

I'll ask the group about it. I'm fairly new as a committer, but I've been a power user for over a decade. I suspect the issue is figuring out which rows are filtered - that may not be in the file itself, or it likely would have been supported at the same time as the other ones. My fear is it may need a full implementation of filter evaluation logic.

However, since I've been deep into evaluation code, maybe it wouldn't be that hard to add. I think I could use it too, which means I may be able to spend day job time on it, like this stuff :)

I'll attach my ConditionalFormatter here after some clean-up and additional comments to explain the new POI APIs I'm working on.

WoozyG commented 7 years ago

Yep, I just dug into the source a bit. It will require similar work to evaluate filters dynamically. One big issue would be keeping the row "hidden" attributes in sync with filter evaluations as condition inputs change due to user actions. That's a big job, integrating with FormulaEvaluator to notice changes.

It will take some design discussion on the POI dev list, and will likely have quite a few impacts. Not sure I have the time for it, but I could help and offer coordination and code review if Vaadin wanted to donate some code.

One less efficient but simpler method would be to add another external evaluator class for AutoFilters like I've done for conditional formatting and data validation. It could then reevaluate all rows when affected cells are changed. Knowing what to pay attention to is easier than conditional formats, as it seems AutoFilter settings don't include any formulas, only references to values in the current table column, so only those values need monitoring.

Peppe commented 7 years ago

Thanks for looking into it. It was worth the shot.

WoozyG commented 7 years ago

Actually, it looks pretty straightforward to calculate if the function assumes everything is current and updated properly. This will be the case for a saved workbook. Anything modified after reading would need to manage filter evaluation/row hiding outside the function, and appropriately clear the FormulaEvaluator cache. Since that's the current expectation for everything else, I don't see a problem with it.

I think all that is needed is to skip rows with the hidden attribute = true when gathering the inputs to the inner function call.

WoozyG commented 7 years ago

I can't attach a *.java file directly in GitHub, and changing it to .txt would mean no pretty formatting colors, so here's the full listing of my new ConditionalFormatter using the POI changes I'm working on.

Note that I've elevated all the required properties to the appropriate interfaces, so you no longer have to use reflection to get at the protected CT* fields.

The primary additions to POI will be

This evaluation object is the key to evaluation and hiding the complexity of the whole system. It now supports all rule types. If Vaadin were so inclined, they could use this code eventually to support icon sets and other "partitioning" rule types, as they are returned now. I haven't spec'd out an API for reading those rule definition details yet, but I'm open to suggestions.

I'm still debugging and adding more unit tests to the POI side, but it is looking good so far.

package com.vaadin.addon.spreadsheet;

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.poi.ss.formula.BaseFormulaEvaluator;
import org.apache.poi.ss.formula.ConditionalFormattingEvaluator;
import org.apache.poi.ss.formula.EvaluationConditionalFormatRule;
import org.apache.poi.ss.usermodel.BorderFormatting;
import org.apache.poi.ss.usermodel.BorderStyle;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.FontFormatting;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.PatternFormatting;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.usermodel.extensions.XSSFCellBorder.BorderSide;

/**
 * This class doesn't actually use any code from {@link ConditionalFormatter},
 * it just overrides it's public methods.  Could/should probably be an interface,
 * but with some changes, since there are undocumented side effects such as
 * adding style CSS to the state object.
 * 
 * @author GregWoolsey
 */
public class ConditionalFormatterPOIDelegate extends ConditionalFormatter {
    private static final long serialVersionUID = 1;

    /*
     * TODO: private in parent class!
     * There should be a way to access and modify this.  Had to copy it here for now.
     */
    protected static final String BORDER_STYLE_DEFAULT = "1pt solid #d6d6d6;";

    /**
     * parent instance is private without an accessor
     */
    private Spreadsheet spreadsheet;

    /**
     * Evaluator to cache formats and conditional evaluations
     */
    private ConditionalFormattingEvaluator cfEvaluator;

    /**
     * Stored here as a convenience, and because we need to notice changes to this
     * from reloading a spreadsheet or any other path that could change it out from
     * under the cfEvaluator.
     */
    private FormulaEvaluator formulaEvaluator;

    /**
     * cache cell CSS style ID lists
     */
    private Map<CellReference, Set<Integer>> cellToCssIndex = new HashMap<CellReference, Set<Integer>>();

    /**
     * @param spreadsheet
     */
    public ConditionalFormatterPOIDelegate(Spreadsheet spreadsheet) {
        super(spreadsheet);
        this.spreadsheet = spreadsheet;
        spreadsheet.addCellValueChangeListener(e -> { if (cfEvaluator != null) cfEvaluator.clearAllCachedValues(); });
    }

    /**
     * this may be called after the component workbook has been swapped out from under us.
     * When that happens, the formula evaluator is recreated, so we can check for object equivalence.
     * 
     * @return true if the formula evaluator changed, and we have to reinitialize everything
     */
    protected boolean init() {
        if (formulaEvaluator != spreadsheet.getFormulaEvaluator()) {
            formulaEvaluator = spreadsheet.getFormulaEvaluator();
            cfEvaluator = new ConditionalFormattingEvaluator(spreadsheet.getWorkbook(), (BaseFormulaEvaluator) formulaEvaluator);
            return true;
        }
        return false;
    }

    /**
     * this may be called after the component workbook has been swapped out from under us.
     * When that happens, the formula evaluator is recreated, so we can check for object equivalence.
     * <p/>
     * The original class has a bug where it confuses/overwrites border styles when a single conditional formatting
     * has multiple rules defining different border styles.  The internal maps for
     * topBorders and leftBorders are keyed by ConditionalFormatting, when it should be by ConditionalFormattingRule.
     * 
     * @see com.vaadin.addon.spreadsheet.ConditionalFormatter#createConditionalFormatterRules()
     */
    @Override
    public void createConditionalFormatterRules() {
        // make sure old styles are cleared
        if (cellToCssIndex != null) {
            for (CellReference key : cellToCssIndex.keySet()) {
                int col = key.getCol();
                int row = key.getRow();

                // note: this is for the active sheet!  If that has changed, it's the right coordinates
                // for the UI, but not the cell specified by the key!
                Cell cell = spreadsheet.getCell(row, col);
                if (cell != null) {
                    spreadsheet.markCellAsUpdated(cell, true);
                }
            }
        }

        cellToCssIndex.clear();

        if (! init()) {
            // rules already in place, but data may have changed, re-evaluate cells
            cfEvaluator.clearAllCachedValues();
            return;
        }

        // build rules properly for all sheets, but don't evaluate cells yet

        // remove previous styles, if any(should not be needed now that we do all sheets at once)
        spreadsheet.getState().conditionalFormattingStyles = new HashMap<Integer, String>();

        for (int s=0; s < spreadsheet.getWorkbook().getNumberOfSheets(); s++) {
            buildStylesForSheet(spreadsheet.getWorkbook().getSheetAt(s));
        }
    }

    /**
     * Why use Integer CSS IDs?  Looking at uses, there is no reason they can't be String instead.
     * Or even an array of ints, so we can use sheet/row/column/border index arrays and return Set<int[]>
     * @see com.vaadin.addon.spreadsheet.ConditionalFormatter#getCellFormattingIndex(org.apache.poi.ss.usermodel.Cell)
     */
    @Override
    public Set<Integer> getCellFormattingIndex(Cell cell) {
        if (cell == null) return Collections.emptySet();
        CellReference ref = new CellReference(cell.getSheet().getSheetName(), cell.getRowIndex(), cell.getColumnIndex(), false, false);
        Set<Integer> styles = cellToCssIndex.get(ref);

        if (styles == null) {
            List<EvaluationConditionalFormatRule> rules = cfEvaluator.getConditionalFormattingForCell(cell);
            if (rules == null) rules = Collections.emptyList();
            styles = new HashSet<Integer>(rules.size());
            cellToCssIndex.put(ref, styles);
            for (EvaluationConditionalFormatRule rule : rules) {
                styles.add(Integer.valueOf(getCssIndex(rule, StyleType.BASE)));

                // LEFT border CSS goes on the cell to the left!
                if (rule.getRule().getBorderFormatting().getBorderLeftEnum() != BorderStyle.NONE) {
                    addBorderStyle(rule, cell.getRowIndex(), cell.getColumnIndex() - 1, StyleType.LEFT);
                }
                // TOP border CSS goes on the cell above!
                if (rule.getRule().getBorderFormatting().getBorderTopEnum() != BorderStyle.NONE) {
                    addBorderStyle(rule, cell.getRowIndex() - 1, cell.getColumnIndex(), StyleType.TOP);
                }
            }
        }

        return styles;
    }

    /**
     * Adds the proper border style ID to the given cell, creating it if needed.
     * @param rule
     * @param row
     * @param col
     * @param type
     */
    protected void addBorderStyle(EvaluationConditionalFormatRule rule, int row, int col, StyleType type) {
        if (row < 0 || col < 0) return; // out of bounds
        Cell cell = spreadsheet.getCell(row, col);
        if (cell == null) {
            cell = spreadsheet.createCell(row, col, "");
        }
        Set<Integer> styles = getCellFormattingIndex(cell);
        styles.add(new Integer(getCssIndex(rule, type)));
    }

    /**
     * @param sheet
     */
    protected void buildStylesForSheet(Sheet sheet) {
        for (EvaluationConditionalFormatRule rule : cfEvaluator.getFormatRulesForSheet(sheet)) {
            buildStyleForRule(rule);
        }
    }

    /**
     * @param rule
     */
    protected void buildStyleForRule(EvaluationConditionalFormatRule rule) {
        StringBuilder css = new StringBuilder();

        FontFormatting fontFormatting = rule.getRule().getFontFormatting();

        if (fontFormatting != null) {
            String fontColorCSS = colorConverter.getFontColorCSS(rule.getRule());
            if (fontColorCSS != null) {
                css.append("color:" + fontColorCSS);
            }

            // we can't have both underline and line-through in the same
            // DIV element, so use the first one that matches.

            // HSSF might return 255 for 'none'...
            if (fontFormatting.getUnderlineType() != FontFormatting.U_NONE && fontFormatting.getUnderlineType() != 255) {
                css.append("text-decoration: underline;");
            }
            if (fontFormatting.isStruckout()) {
                css.append("text-decoration: line-through;");
            }

            if (fontFormatting.getFontHeight() != -1) {
                // Excel stores height in 1/20th points, convert
                int fontHeight = fontFormatting.getFontHeight() / 20;
                css.append("font-size:" + fontHeight + "pt;");
            }

            // excel has a setting for bold italic, otherwise bold
            // overrides
            // italic and vice versa
            if (fontFormatting.isItalic() && fontFormatting.isBold()) {
                css.append("font-style: italic;");
                css.append("font-weight: bold;");
            } else if (fontFormatting.isItalic()) {
                css.append("font-style: italic;");
                css.append("font-weight: initial;");
            } else if (fontFormatting.isBold()) {
                css.append("font-style: normal;");
                css.append("font-weight: bold;");
            }
        }

        PatternFormatting patternFormatting = rule.getRule().getPatternFormatting();
        if (patternFormatting != null) {
            String colorCSS = colorConverter.getBackgroundColorCSS(rule.getRule());

            if (colorCSS != null) {
                css.append("background-color:" + colorCSS);
            }
        }

        addBorderFormatting(rule, css);

        addCssToComponentState(rule, StyleType.BASE, css);

    }

    /**
     * @param rule
     * @param css
     */
    protected void addBorderFormatting(EvaluationConditionalFormatRule rule, StringBuilder css) {
        // TODO Auto-generated method stub
        BorderFormatting borderFormatting = rule.getRule().getBorderFormatting();

        if (borderFormatting == null) return;

        BorderStyle borderLeft = borderFormatting.getBorderLeftEnum();
        BorderStyle borderRight = borderFormatting.getBorderRightEnum();
        BorderStyle borderTop = borderFormatting.getBorderTopEnum();
        BorderStyle borderBottom = borderFormatting.getBorderBottomEnum();

        // In Excel, we can set a border to 'none', which overrides previous
        // rules. Default is 'not set', in which case we add no CSS.
        boolean isLeftSet = borderLeft != BorderStyle.NONE;
        boolean isTopSet = borderTop != BorderStyle.NONE;
        boolean isRightSet = borderRight != BorderStyle.NONE;
        boolean isBottomSet = borderBottom != BorderStyle.NONE;

        if (isRightSet) {
            css.append("border-right:");
            if (borderRight != BorderStyle.NONE) {
                css.append(getBorderStyleCss(borderRight));
                css.append(colorConverter.getBorderColorCSS(BorderSide.RIGHT, "border-right-color", borderFormatting));
            } else {
                css.append(BORDER_STYLE_DEFAULT);
            }
        }
        if (isBottomSet) {
            css.append("border-bottom:");
            if (borderBottom != BorderStyle.NONE) {
                css.append(getBorderStyleCss(borderBottom));
                css.append(colorConverter.getBorderColorCSS(BorderSide.BOTTOM, "border-bottom-color", borderFormatting));
            } else {
                css.append(BORDER_STYLE_DEFAULT);
            }
        }

        // top and left borders might be applied to another cell, so store
        // them with a different index
        if (isTopSet) {
            // bottom border for cell above
            final StringBuilder sb2 = new StringBuilder("border-bottom:");
            if (borderTop != BorderStyle.NONE) {
                sb2.append(getBorderStyleCss(borderTop));
                sb2.append(colorConverter.getBorderColorCSS(BorderSide.TOP, "border-bottom-color", borderFormatting));

                addCssToComponentState(rule, StyleType.TOP, sb2);
            } else {
                css.append(BORDER_STYLE_DEFAULT);
            }
        }

        if (isLeftSet) {
            // right border for cell to the left
            final StringBuilder sb2 = new StringBuilder("border-right:");
            if (borderLeft != BorderStyle.NONE) {
                sb2.append(getBorderStyleCss(borderLeft));
                sb2.append(colorConverter.getBorderColorCSS(BorderSide.LEFT, "border-right-color", borderFormatting));

                addCssToComponentState(rule, StyleType.LEFT, sb2);
            } else {
                css.append(BORDER_STYLE_DEFAULT);
            }
        }

    }

    /**
     * @param border
     * @return
     */
    protected String getBorderStyleCss(BorderStyle border) {
        return SpreadsheetStyleFactory.BORDER.get(new Short((short)border.ordinal())).getBorderAttributeValue();
    }

    protected void addCssToComponentState(EvaluationConditionalFormatRule rule, StyleType type, StringBuilder css) {
        spreadsheet.getState().conditionalFormattingStyles.put(new Integer(getCssIndex(rule, type)), css.toString());
    }

    /**
     * CSS index value is deterministic, so just calculate it as needed rather than keeping extra maps around
     * @param rule
     * @param type
     * @return index for the given formatting, rule, and type
     */
    protected int getCssIndex(EvaluationConditionalFormatRule rule, StyleType type) {
        // each rule has 3 possible styles (StyleType).  Start at 9M just in case.
        return 9000000
                + spreadsheet.getWorkbook().getSheetIndex(rule.getSheet().getSheetName()) * 100000 // 899 sheets
                + rule.getFormattingIndex() * 10000 // room for 99 formatting indexes per sheet (mostly 1:1 with regions)
                + rule.getRuleIndex() * 10 // room for 999 rules per formatting (HSSF max 3, XSSF unlimited)
                + type.ordinal(); // 0-2
    }

    /**
     * used to generate unique CSS style IDs for the different types of styles needed
     */
    static enum StyleType {
        TOP,
        LEFT,
        BASE,
        ;
    }
}
WoozyG commented 7 years ago

I also found a bug in the current SUBTOTAL implementations. It looks to me like if that gets fixed, the functionality you want would come along with it.

https://bz.apache.org/bugzilla/show_bug.cgi?id=60724

I don't have time currently to track it down, but if we end up using SUBTOTAL for work, I'll need to fix it eventually.

WoozyG commented 7 years ago

I've committed the supporting code to POI/trunk. It will be available in the next nightly, future betas, and eventually 3.16-FINAL. Unfortunately there was a build infrastructure update yesterday that broke the official POI nightly builds (incompatible Ant update). Once we resolve that, the next nightly build will have this code if anyone wants to help test it and offer suggestions.

sblommers commented 7 years ago

Nice Woozy !! Since you seem to have some leverage in the group, completely unrelated bug 56822 (https://bz.apache.org/bugzilla/show_bug.cgi?format=multiple&id=56822) is still open for quite some time now. I use COUNTIFS patch successfully with many many sheets (thousands) with Vaadin Spreadsheet ( I also added a tiny unit test that shows the bug). Is it possible to bump the group and perhaps take it/fix into 3.16? Current Countifs function is completely incorrect and it is quite easy to fix. Sorry for hijacking, just say no and I'll keep patching poi ;-)

Nice job!

WoozyG commented 7 years ago

I've confirmed this updated class code, along with the latest POI nightly build, work as-is with:

Vaadin 8 Charts 4 Spreadsheet 2

Spreadsheet 2.0 didn't change the class interface or expected side effects, so it works as a drop-in replacement.

If you need a pull request, I can fork Vaadin-Spreadsheet and do that, but it is a stand-alone class and easy enough to copy/paste from here if you want.

sblommers commented 7 years ago

Thanks Greg, I can also confirm. This looks very good. And thumbs up for wrapping up 56822

WoozyG commented 7 years ago

POI 3.16 is now published, and includes the changes used by the above proposed replacement class implementation.

Let's have a discussion :)

I'd love to see this in a future Vaadin build, both to fix bugs and improve performance and feature coverage.

I think the POI evaluator functionality will enable Vaadin Spreadsheet to support range-based conditional formats too - could even do gradients, mapping the Excel definitions to CSS, and use standard Vaadin icons for icon based range conditions. Allowing those icons to be customizable would be a nice bonus.

ianscriven commented 7 years ago

Hi Greg, I've tried using your ConditionalFormatterPOIDelegate listed above, but it's not working - cells that have conditional formatting rules show "#VALUE" when the rule gets triggered. Is there an updated version? I'm using POI 3.16 and the latest Vaadin Spreadsheet (2.0-SNAPSHOT, built from source)

edit: tracked the issue down to a NPE in getCellFormattingIndex(Cell) if rule.getRule().getBorderFormatting() is null

WoozyG commented 7 years ago

I haven't tried with the latest snapshot, I'm still running against the 2.0 release for now, as I'm finalizing a product build. I do plan to work on this in the next month or so to combine it with #529 to add support for table styles and better theme color integration.

WoozyG commented 7 years ago

595 is my PR for this work now done against master, but requiring POI 3.17-beta1.