tlaplus / CommunityModules

TLA+ snippets, operators, and modules contributed and curated by the TLA+ community
MIT License
273 stars 36 forks source link

Problem using ReplaceFirstSubSeq and ReplaceAllSubSeqs with strings #57

Closed afonsonf closed 2 years ago

afonsonf commented 2 years ago

Hello, the replace operator with strings is not correctly replacing the following chars: "\n", "\t", "\f", and"\r".

For example, in the following assume expressions, the first two are true and the last four false.

ASSUME(ReplaceAllSubSeqs("\\", "%%", "Properly escape the %% quotes") = "Properly escape the \\ quotes")
ASSUME(ReplaceAllSubSeqs("\"", "%%", "Properly escape the %% quotes") = "Properly escape the \" quotes")
ASSUME(ReplaceAllSubSeqs("\n", "%%", "Properly escape the %% quotes") = "Properly escape the \n quotes")
ASSUME(ReplaceAllSubSeqs("\t", "%%", "Properly escape the %% quotes") = "Properly escape the \t quotes")
ASSUME(ReplaceAllSubSeqs("\f", "%%", "Properly escape the %% quotes") = "Properly escape the \f quotes")
ASSUME(ReplaceAllSubSeqs("\r", "%%", "Properly escape the %% quotes") = "Properly escape the \r quotes")

Some options that i found that can fix it are:

  1. Not using unquote and not using Pattern.quote.

Example: ((StringValue) t).toUnquotedString(); -> ((StringValue) t).getVal().toString(); st.replaceFirst(Pattern.quote(ss), sr) -> st.replaceAll(ss, sr) Problem: Slash and dollar sign have to be special escaped, for example the first assume expression would have to change to: ASSUME(ReplaceAllSubSeqs("\\\\", "%%", "Properly escape the %% quotes") = "Properly escape the \\ quotes") However, the remain ones would work as they are.

  1. Change to use functions without regex.

For replaceAll there is replace. For replaceFirst I did not find one in the standard library, however there is replaceOnce in org.apache.commons.lang3.StringUtils that can be used. There is also the option to create a custom one using find, substr and replace.

I can work on a pull request for this issue, what would be the best approach to modify ReplaceFirstSubSeq and ReplaceAllSubSeqs ?

REF

https://github.com/tlaplus/CommunityModules/blob/master/modules/tlc2/overrides/SequencesExt.java#L243 https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence) https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replaceAll(java.lang.String,%20java.lang.String) https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replaceFirst(java.lang.String,%20java.lang.String) https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#replaceOnce-java.lang.String-java.lang.String-java.lang.String-

lemmy commented 2 years ago

Does org.apache.commons.lang3.StringUtils address all issues out of the box?

afonsonf commented 2 years ago

Yes