wizards-of-lua / rembulan

Rembulan, an implementation of Lua 5.3 for the Java Virtual Machine
Apache License 2.0

Cannot replace dashes with `gsub` #9

Closed fehnomenal closed 5 years ago

fehnomenal commented 5 years ago
local s = "string-with-dashes"
s:gsub("-", "_")

fails with:

Caused by: java.lang.IllegalArgumentException: error at character 1: unexpected character '-'
    at net.sandius.rembulan.lib.StringPattern$PatternBuilder.parseError(StringPattern.java:941)
    at net.sandius.rembulan.lib.StringPattern$PatternBuilder.CC_lit(StringPattern.java:1087)
    at net.sandius.rembulan.lib.StringPattern$PatternBuilder.cclass(StringPattern.java:1157)
    at net.sandius.rembulan.lib.StringPattern$PatternBuilder.PI_cc(StringPattern.java:1163)
    at net.sandius.rembulan.lib.StringPattern$PatternBuilder.PI(StringPattern.java:1186)
    at net.sandius.rembulan.lib.StringPattern$PatternBuilder.parse(StringPattern.java:1218)
    at net.sandius.rembulan.lib.StringPattern$PatternBuilder.access$900(StringPattern.java:913)
    at net.sandius.rembulan.lib.StringPattern.fromString(StringPattern.java:1228)
    at net.sandius.rembulan.lib.StringPattern.fromString(StringPattern.java:1232)
    at net.sandius.rembulan.lib.StringLib$GSub.invoke(StringLib.java:1476)
    at net.sandius.rembulan.lib.AbstractLibFunction.invoke(AbstractLibFunction.java:39)
    at net.sandius.rembulan.runtime.AbstractFunctionAnyArg.invoke(AbstractFunctionAnyArg.java:41)
    at net.sandius.rembulan.runtime.Dispatch.mt_invoke(Dispatch.java:76)
    at net.sandius.rembulan.runtime.Dispatch.call(Dispatch.java:275)

Although magic characters should be escaped, the demo accepts them when they appear on their own.

Adrodoc commented 5 years ago

According to the Lua 5.3 specification, the string `-` is not a valid pattern, so the behaviour of your code is unspecified. You should escape it as `%-`.

A character class is used to represent a set of characters. The following combinations are allowed in describing a character class:

  • x: (where x is not one of the magic characters ^$()%.[]*+-?) represents the character x itself. ...
  • %x: (where x is any non-alphanumeric character) represents the character x. This is the standard way to escape the magic characters. Any non-alphanumeric character (including all punctuation characters, even the non-magical) can be preceded by a '%' when used to represent itself in a pattern.
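
Applied to the snippet from the report, the escaped form might look like the sketch below. It uses only the standard `string.gsub` and relies on the `%x` rule quoted above; the variable names are illustrative.

```lua
-- Escaping the dash with '%' makes the pattern well-defined under the
-- Lua 5.3 manual, so it should behave the same on any conforming runtime.
local s = "string-with-dashes"

-- gsub returns the new string and the number of substitutions made.
local replaced, count = s:gsub("%-", "_")

print(replaced, count)
```

Note that `gsub` does not modify `s` in place; the result has to be captured, as `replaced` is here.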

fehnomenal commented 5 years ago

Alright, no worries. It just seemed strange to me that other implementations accept single magic characters and do the expected thing.

Adrodoc commented 5 years ago

Well, unspecified means the implementation is free to choose a behaviour. That of course includes the behaviour you expect. You just can't be sure that your code will work across implementations or versions.
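
For code that has to run on more than one implementation, one option is to escape every punctuation character before using a string as a literal pattern. The following is a minimal sketch, not something from Rembulan or the Lua standard library: `escape_pattern` and `replace_literal` are hypothetical helper names. It assumes the runtime implements `%p` and the `%0` replacement capture as described in the Lua 5.3 manual, and it has not been checked against Rembulan specifically.

```lua
-- Hypothetical helper: prefix every punctuation character with '%'.
-- The manual allows any non-alphanumeric character to be escaped this way,
-- so escaping non-magic punctuation as well is harmless.
local function escape_pattern(text)
  -- The extra parentheses drop gsub's second return value (the match count).
  return (text:gsub("%p", "%%%0"))
end

-- Hypothetical helper: replace every literal occurrence of `needle` in `s`.
local function replace_literal(s, needle, replacement)
  -- '%' is also special in the replacement string, so double it there.
  local safe_replacement = replacement:gsub("%%", "%%%%")
  return (s:gsub(escape_pattern(needle), safe_replacement))
end

print(replace_literal("string-with-dashes", "-", "_"))
```

With this approach the caller never writes a bare magic character into a pattern, so the result no longer depends on how a given implementation treats the unspecified cases.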