
Issue Tracker for Humio

Breaking change: non-ASCII (smart) quotes are reserved characters #108

Closed: ahe-humio closed this issue 4 years ago.

ahe-humio commented 4 years ago

TL;DR: Reject non-ASCII (smart) quotes when they are used in unquoted strings (such as free-text search terms or field names), and do not interpret non-ASCII quotes as quotes at all. These quote characters can still be used inside quoted strings without any escaping.
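The rule above can be sketched as a small tokenizer. This is not Humio's parser: `tokenize`, `QueryError` and `RESERVED_QUOTES` are hypothetical names, the reserved set is an assumed illustrative subset rather than the full list from this issue, and backslash escaping inside quoted strings is also an assumption. Only the ASCII quote delimits strings; a reserved quote inside a quoted string is kept as a literal character, and one in an unquoted word is rejected.

```python
# Hypothetical sketch of the TL;DR rule; not Humio's actual parser.
# ASSUMPTION: an illustrative subset of reserved quotes, not the issue's full list.
RESERVED_QUOTES = {"\u201c", "\u201d", "\u201e", "\u00ab", "\u00bb"}


class QueryError(ValueError):
    """Raised for a reserved character outside a quoted string or an unclosed string."""


def tokenize(query: str) -> list[str]:
    """Split a query into quoted strings and unquoted words."""
    tokens: list[str] = []
    i = 0
    while i < len(query):
        ch = query[i]
        if ch.isspace():
            i += 1
        elif ch == '"':
            # Only the ASCII quote (U+0022) opens and closes a quoted string.
            j = i + 1
            buf: list[str] = []
            while j < len(query) and query[j] != '"':
                if query[j] == "\\" and j + 1 < len(query):
                    j += 1  # ASSUMPTION: a backslash escapes the next character
                buf.append(query[j])
                j += 1
            if j >= len(query):
                raise QueryError(f"unterminated quoted string starting at index {i}")
            # Non-ASCII quotes inside the string are kept as ordinary characters.
            tokens.append("".join(buf))
            i = j + 1
        else:
            # Unquoted word: runs until whitespace or an ASCII quote.
            j = i
            while j < len(query) and not query[j].isspace() and query[j] != '"':
                if query[j] in RESERVED_QUOTES:
                    raise QueryError(
                        f"reserved character U+{ord(query[j]):04X} at index {j}; "
                        'use the ASCII quote (") instead'
                    )
                j += 1
            tokens.append(query[i:j])
            i = j
    return tokens


print(tokenize('"say “hi”" OR foo'))  # ['say “hi”', 'OR', 'foo']
```

With this sketch, both `“foo” OR “bar”` and `“foo” OR "bar"` raise `QueryError` at index 0, because each begins with a reserved quote outside any ASCII-quoted string.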

Contrast this with #4056.

Previously, the parser handled smart quotes in surprising ways, as shown in these examples:

Query              Interpretation
---------------    ------------------
“foo” OR “bar”     "foo” OR “bar"
“foo” OR "bar"     "foo” OR " ERROR

Note that the examples use slightly different quote characters: “, the Unicode character "left double quotation mark" (U+201C), and ", the ASCII character "quotation mark" (U+0022).
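Because these characters look nearly identical, it can help to list the code point and Unicode name of every quote in a query. The sketch below uses only the standard `unicodedata` module; `describe_quotes` is a hypothetical helper, and treating any character whose name contains "QUOTATION MARK" as a quote is a heuristic assumption, not Humio's definition.

```python
import unicodedata


def describe_quotes(text: str) -> list[str]:
    """Return 'index: U+XXXX NAME' for each quote-like character in text."""
    found = []
    for i, ch in enumerate(text):
        # Heuristic: any character whose Unicode name contains "QUOTATION MARK"
        # counts as quote-like (covers ", “, ”, „, « and », among others).
        name = unicodedata.name(ch, "")
        if "QUOTATION MARK" in name:
            found.append(f"{i}: U+{ord(ch):04X} {name}")
    return found


for line in describe_quotes('“foo” OR "bar"'):
    print(line)
# 0: U+201C LEFT DOUBLE QUOTATION MARK
# 4: U+201D RIGHT DOUBLE QUOTATION MARK
# 9: U+0022 QUOTATION MARK
# 13: U+0022 QUOTATION MARK
```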

The intent was to allow non-ASCII quotes to delimit quoted strings; however, we didn't get it right.

After this change, the parser rejects both examples above. We are continuing to work on this to make sure that our error messages are helpful and that the UI helps users change their queries to use ASCII quotes.
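One way a UI could help is to propose a rewritten query for the user to confirm, rather than silently accepting smart quotes. This is a hypothetical sketch, not Humio's UI code: `suggest_ascii_quotes` and `REPLACEMENTS` are illustrative names, and the set of replaced characters is an assumption. It replaces reserved quotes only outside ASCII-quoted strings, so smart quotes that a user deliberately put inside a quoted string are left alone.

```python
# Hypothetical "did you mean" helper; not Humio's actual UI code.
# ASSUMPTION: an illustrative subset of reserved quotes, each mapped to U+0022.
REPLACEMENTS = {"\u201c": '"', "\u201d": '"', "\u201e": '"', "\u00ab": '"', "\u00bb": '"'}


def suggest_ascii_quotes(query: str) -> str:
    """Replace reserved quotes that appear outside ASCII-quoted strings."""
    out = []
    in_string = False  # are we inside a string opened by an ASCII quote?
    escaped = False  # ASSUMPTION: backslash escapes the next character in a string
    for ch in query:
        if in_string:
            # Inside "...", non-ASCII quotes are ordinary characters: keep them.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            out.append(ch)
        elif ch == '"':
            in_string = True
            out.append(ch)
        else:
            # Outside strings, swap a reserved quote for the ASCII quote.
            out.append(REPLACEMENTS.get(ch, ch))
    return "".join(out)


print(suggest_ascii_quotes('“foo” OR "bar"'))  # "foo" OR "bar"
```

Applied to either example in the table, the suggestion is `"foo" OR "bar"`; the user still decides whether to accept it.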

Origin of non-ASCII Quotes

Based on when we see non-ASCII quotes in our logs, it appears that they came from our own examples. This was caused by a bug in how examples were rendered, and that bug has been fixed. However, the problem can also arise when a user copies a query from the Humio UI into one of many common programs, such as Microsoft Word, Google Docs, or non-programmer editors on macOS, and then back into Humio.

Non-ASCII Quotes in Use

We have identified the following non-ASCII quotes as likely to cause problems for users who copy queries between Humio and programs such as those mentioned above.

The quote characters marked with (*) were not previously recognized as delimiting quoted strings; however, they all appear in the macOS system setting that controls smart quotes.

The quote character marked with (**) was not previously recognized as delimiting quoted strings; however, it has been included because its name and its proximity to U+201E suggest that it may be used for smart quotes on some systems.

Alternatives

We have considered accepting all these quotes as equivalent to the ASCII quote character, but that can also lead to surprising behavior. For example, suppose a Humio user is a developer of Yoyodyne App and has noticed a problem with how that app handles smart quotes. If they use Humio to search for something like "“", the smart quote would close the string early, leaving the final ASCII quote unterminated, and the search would fail with an error.
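The Yoyodyne example can be reproduced by lexing the same query under two policies. This is a hypothetical sketch, not Humio code: `quoted_strings`, `ascii_only`, `all_equivalent` and `SMART_QUOTES` are illustrative names, and the smart-quote set is an assumed subset. It shows that treating every quote as equivalent turns a valid search for a literal “ into an unterminated-string error.

```python
from collections.abc import Callable

# ASSUMPTION: an illustrative subset of non-ASCII quotes.
SMART_QUOTES = {"\u201c", "\u201d", "\u201e", "\u00ab", "\u00bb"}


def ascii_only(ch: str) -> bool:
    """Adopted policy: only U+0022 delimits a quoted string."""
    return ch == '"'


def all_equivalent(ch: str) -> bool:
    """Rejected alternative: every listed quote acts like U+0022."""
    return ch == '"' or ch in SMART_QUOTES


def quoted_strings(query: str, is_quote: Callable[[str], bool]) -> list[str]:
    """Split a query made only of quoted strings, using is_quote for delimiters."""
    strings = []
    i = 0
    while i < len(query):
        if query[i].isspace():
            i += 1
            continue
        if not is_quote(query[i]):
            raise ValueError(f"expected a quote at index {i}")
        j = i + 1
        while j < len(query) and not is_quote(query[j]):
            j += 1
        if j == len(query):
            raise ValueError(f"unterminated quoted string starting at index {i}")
        strings.append(query[i + 1 : j])
        i = j + 1
    return strings


query = '"“"'  # search for a literal left double quotation mark
print(quoted_strings(query, ascii_only))  # ['“']
try:
    quoted_strings(query, all_equivalent)
except ValueError as err:
    print("error:", err)  # error: unterminated quoted string starting at index 2
```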

Another issue is deciding which quote characters should pair up. For example, both »abc« and «abc» are used in different parts of the world, so » could be either an opening or a closing quote. Finally, many people find it hard to tell “, the Unicode character "left double quotation mark" (U+201C), and ", the ASCII character "quotation mark" (U+0022), apart.