jmespath-community / jmespath.spec

JMESPath Specification

JEP-015 String Slices #103

Closed springcomp closed 1 year ago

springcomp commented 1 year ago

String Slices

JEP 15
Author Maxime Labelle
Created 24-July-2022
SemVer MINOR
Status Draft
Discussion #26

Abstract

The original JEP 5 introduced slice-expression in the grammar to slice specific portions of an array. While the syntax was specifically designed to operate on arrays, nothing in the grammar prevents extending this behaviour to strings.

This JEP introduces changes to allow slice-expression to operate on string types and act like a more powerful substring() function.

Motivation

String manipulation functions are a frequently requested addition to JMESPath. While a whole host of string manipulation functions will certainly be proposed at some point, slicing strings is an easy extension to JMESPath that does not require any grammar change and is fully backwards compatible.

Slices

_This section outlines the wording changes to the Slices documentation of the grammar, shown in bold._

slice-expression = [number] ":" [number] [ ":" [number] ]

A slice expression allows you to select a subset of an array or string. A slice has a start, stop, and step value. The general form of a slice is [start:stop:step], but each component is optional and can be omitted.

Slices in JMESPath have the same semantics as Python slices. If you're familiar with Python slices, you're familiar with JMESPath slices.
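Because the text defers to Python slice semantics, a short Python snippet can illustrate how start, stop, and step interact. This only shows Python's own behaviour, which the JEP says JMESPath mirrors; the sample data and variable names are made up for the illustration.

```python
# Illustrative data; not taken from the JEP.
items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

first_five = items[0:5]       # start=0, stop=5 (stop is exclusive)
every_other = items[::2]      # start and stop omitted, step=2
last_three = items[-3:]       # a negative start counts from the end
reversed_items = items[::-1]  # a negative step walks backwards

print(first_five)      # [0, 1, 2, 3, 4]
print(every_other)     # [0, 2, 4, 6, 8]
print(last_three)      # [7, 8, 9]
print(reversed_items)  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```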

Given a start, stop, and step value, the sub elements in an array or characters in a string are extracted as follows:

Slice expressions adhere to the following rules:

Examples

Slicing operates on strings exactly as if a string were an array of characters.
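This equivalence can be sketched in Python, whose slices the JEP takes as its model. The helper name `slice_value` and the sample string are hypothetical, and returning `None` for other types is an assumption made for this sketch, not something stated in this text.

```python
from typing import Optional, Union

Sliceable = Union[str, list]


def slice_value(value: object, start: Optional[int] = None,
                stop: Optional[int] = None,
                step: Optional[int] = None) -> Optional[Sliceable]:
    """Apply [start:stop:step] to a list or a string (hypothetical helper)."""
    if isinstance(value, list):
        return value[start:stop:step]
    if isinstance(value, str):
        # Treat the string as an array of characters, slice, then rejoin.
        chars = list(value)
        return "".join(chars[start:stop:step])
    # Assumption for this sketch: other types yield None (JSON null).
    return None


text = "foobar"
print(slice_value(text, 0, 3))            # foo
print(slice_value(text, -3))              # bar
print(slice_value(text, None, None, -1))  # raboof
print(slice_value(text, None, None, 2))   # foa
```

Slicing the string directly (`text[0:3]`) gives the same results in Python; the detour through a list of characters is there only to make the "array of characters" reading explicit.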

Compliance tests

A new string_slices.json file will be added to the compliance test suite.

gibson042 commented 1 year ago

Given a start, stop, and step value, the sub elements in an array or characters in a string are extracted as follows

What is the definition of "character"? Grammar and following sections imply that a string is a sequence of code points, a definition which is affirmed by the description of length/reverse/sort/etc. (even though the in-page jmespath.js implementation of those functions incorrectly treats each supplementary plane code point [i.e., U+10000 through U+10FFFF] as if it were a sequence of two surrogate code points [i.e., a code point from U+D800 through U+DBFF followed by a code point from U+DC00 through U+DFFF]—https://github.com/jmespath-community/jmespath.test/issues/2 ).
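To make the distinction concrete, here is a small Python sketch (Python is used only for illustration, and the sample string is invented) that contrasts counting code points with counting UTF-16 code units:

```python
text = "a\U0001F600b"  # 'a', U+1F600 (a supplementary plane code point), 'b'

code_points = len(text)                           # Python str counts code points
utf16_units = len(text.encode("utf-16-le")) // 2  # two bytes per UTF-16 code unit

print(code_points)  # 3
print(utf16_units)  # 4

# Slicing by code point keeps U+1F600 intact.
middle = text[1:2]
print(middle == "\U0001F600")  # True
```

An implementation that indexes by UTF-16 code units would see four positions in this string, so a slice such as `[1:2]` could return a lone surrogate instead of the whole code point.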

springcomp commented 1 year ago

@gibson042 see my answer.