Closed neffets closed 8 years ago
Steffen, will look into this later this week when I look at merging your pull request. Thank you.
Steffen, I've now had a chance to look at this issue and, unfortunately, the fix will not be as simple as applying word boundary matches to the regex as this causes existing test case failures.
I can think of several additional cases where not applying the dimension glyph may come in useful and these are all in encodings...
I'll experiment some more with ways of possibly matching these before the dimension glyph is applied and report back later.
Thanks for a wonderful library! We use it extensively!
This issue affects us too with precisely the circumstances described by neffets. Maybe a better solution would be to allow the user to choose what glpyhs to replace:
add a method to manipulate the $symbols member
->replaceGlyph('dimension', true | false) ??
check if the glpyh needs to be replaced
if (!empty($this->symbols['dimension'])) { // Dimension sign $this->glyph_search[] = '/([0-9]+[])]?[\'"]? ?)[xX]( ?[[(]?)(?=[+-]?'.$cur.'[0-9]*.?[0-9]+)/'. $this->regex_snippets['mod']; $this->glyph_replace[] = '$1'.$this->symbols['dimension'].'$2';
}
Thanks for the feedback @lauri-elevant. Could you and @neffets please try out the issue-158 branch and let me know if this solves the issue for you? The new branch allows you to solve the problem by callingsetStrictDimensionSymbolParsing(true)
before parsing your text.
The new strict dimension parsing mode will only perform dimension sign replacements if there are spaces around the x or X in the source text. In other words, 1 x 2
will lead to a replacement (as there are spaces before and after the x) but 1x2
, 1 x2
and 1x 2
will not trigger a replacement. This should allow ISBNs, hexadecimal notation and other encodings with embedded 'x' chars to go through unaltered.
Hope that helps!
No. Now in branch issue-158 with setStrictDimensionSymbolParsing(true) it replaces nothing.
here is my text example
DE000A13R6X4 -> the X should remain an uppercase letter 34x25 12X2 -> here its allowed to become twice an dimension sign x |DE000A13R6X4|in tables too| not allowed | | 34X45 |Text 12x3 | here both are allowed | | 34XAnton | Text 34XAnton | not allowed |
It does not replace the patterns " 34X45 " or " 12x3 "
@neffets, please try out the issue-158-take-2 branch. You shouldn't need to do any special setup this time.
Yep, the branch "issue-158-take-2" is working OK.
Merged into master in ed249d3. Thanks for the report.
7X2ULJWT3TI ..still is converted to: 7×2ULJWT3TI
(DE000A13R6X4 is okay)
Would it maybe be still better to be able to turn off certain glyph substitutions?
Its fine to parse "4X5" to "4×5".
But its a bug for parsing codes like "DE000A13R6X4" to "DE000A13R6×4".
Please use stricter parsing like "\b\d+[xX]\d+\b" or "\b\d+[xX]\d+\b