spreadsheetlab / XLParser

A C# parser for Microsoft Excel formulas with a 99.9% compatibility rate

Failed parsing named range with ? #40

Closed by aivaloglou 8 years ago

aivaloglou commented 8 years ago

Formula:

SUM(XDO_?Amount?)

where XDO_?Amount? is a named range

dhoepelman commented 8 years ago

Spec:

name = name-start-character [ name-characters ]
name-start-character = underscore / backslash / letter / name-base-character
name-character = name-start-character / decimal-digit / full-stop / questionmark
name-base-character = (any code points which are characters as defined by the Unicode character properties, [UNICODE5.1] chapter 4 ; MUST NOT be 0x0-0x7F)
name-characters = 1*name-character
;A name MUST NOT have any of the following forms:
;TRUE or FALSE
;cell-reference
;function-list
;command-list
;future-function-list
;R1C1-cell-reference

Our current:

    private const string NamedRangeRegex = @"[A-Za-z\\_][\w\.]*";

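Seems to have 3 problems:

  1. Unicode characters 0x80 and up can be used as the start of names, but the regex only accepts ASCII letters there
  2. Question marks are allowed in names, but the regex rejects them
  3. Underscores and backslashes are allowed everywhere, but the regex only accepts a backslash as the first character (underscores are already covered by \w)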

This should fix them (except possibly 1, which it may not cover fully, since \p{L} matches only letters and not every code point from 0x80 up):

    // Start with a letter, backslash or underscore; continue with word characters (letters, digits and underscore), backslash, dot or question mark
    private const string NamedRangeRegex = @"[\p{L}\\_][\w\\_\.\?]*";
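
A quick way to check the two patterns against each other is sketched below. The class `NamedRangeRegexDemo` and the helper `MatchesWhole` are made up for this comment and are not part of XLParser. The `\A...\z` anchoring is an assumption so that the whole name has to match, which is not necessarily how the grammar applies the regex. Neither pattern tries to enforce the spec's exclusions (TRUE/FALSE, cell references, function names).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for comparing the two patterns; not XLParser code.
public static class NamedRangeRegexDemo
{
    // The pattern currently in use, copied from above.
    public const string OldNamedRangeRegex = @"[A-Za-z\\_][\w\.]*";

    // The proposed pattern, copied from above.
    public const string NewNamedRangeRegex = @"[\p{L}\\_][\w\\_\.\?]*";

    // True only if the entire candidate name matches the pattern.
    // Anchoring with \A and \z is an assumption made for this check.
    public static bool MatchesWhole(string pattern, string name)
    {
        return Regex.IsMatch(name, @"\A(?:" + pattern + @")\z");
    }
}
```

With this helper, `MatchesWhole(OldNamedRangeRegex, "XDO_?Amount?")` should be false and `MatchesWhole(NewNamedRangeRegex, "XDO_?Amount?")` should be true. The same goes for a name starting with a non-ASCII letter such as `Übersicht`, and for a name with a backslash after the first character.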