qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

[XPath] Functions symmetric to `head()` and `tail()` for sequences and arrays #97

Closed dnovatchev closed 1 year ago

dnovatchev commented 2 years ago

In XPath 3.1 we already have head(), tail(), and last().

But there is no function that produces the subsequence of all items of a sequence except the last one. There exists such a function in other programming languages. For example, in Haskell this is the init function.

And the last() function isn't the symmetric opposite of head() -- it doesn't give us the last item in a sequence, just its position. So we need another function: fn:heel() for this.

fn:init($sequence as item()*) as item()*

fn:heel($sequence as item()*) as item()?

init($seq) is a convenient shorthand for subsequence($seq, 1, count($seq) - 1)

heel($seq) is a convenient shorthand for slice($seq, -1)

Examples:

fn:init(('a', 'b', 'c')) returns 'a', 'b'

fn:init(('a', 'b')) returns 'a'

fn:init('a') returns ()

fn:init(()) returns ()

fn:heel(('a', 'b', 'c')) returns 'c'

('a', 'b', 'c') => init() => heel() returns 'b'
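The sequence functions proposed above can be approximated in XQuery 3.1 today. The following is a minimal sketch, not the proposal's definition: local:init and local:heel are hypothetical user-defined stand-ins, and their empty-sequence behavior follows the examples listed above.

```xquery
xquery version "3.1";

(: Hypothetical stand-ins for the proposed fn:init and fn:heel;
   the local: names are illustrative and not part of any spec. :)
declare function local:init($seq as item()*) as item()* {
  (: all items except the last; () for an empty or singleton sequence :)
  subsequence($seq, 1, count($seq) - 1)
};

declare function local:heel($seq as item()*) as item()? {
  (: the last item, or () if the sequence is empty :)
  $seq[last()]
};

(
  local:init(('a', 'b', 'c')),                      (: 'a', 'b' :)
  local:init('a'),                                  (: ()       :)
  local:heel(('a', 'b', 'c')),                      (: 'c'      :)
  ('a', 'b', 'c') => local:init() => local:heel()   (: 'b'      :)
)
```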

It makes sense to have fn:init() and fn:heel() defined on arrays, too.

array:init($array as array(*)) as array(*)

array:heel($array as array(*)) as item()*

Examples:

array:init([1, 2, 3, 4, 5]) returns [1, 2, 3, 4]

array:init([1]) returns []

array:heel([1, 2, 3, (4, 5)]) returns (4, 5)

array:heel([()]) returns () (the empty sequence)

array:init([]) produces an error

array:heel([]) produces an error

[1, 2, 3, (4, 5)] => array:heel() => heel() returns 5

I would challenge anyone to rewrite the last example in an understandable way using fn:slice() 💯
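For comparison, the array variants can be sketched with XQuery 3.1 built-ins. local:array-init and local:array-heel are hypothetical names. Building them on array:remove and array:get is an assumption, chosen so that an empty array raises an error as in the examples above.

```xquery
xquery version "3.1";

(: Hypothetical stand-ins for the proposed array:init and array:heel. :)
declare function local:array-init($array as array(*)) as array(*) {
  (: drop the last member; for [] the position is 0, which is out of range :)
  array:remove($array, array:size($array))
};

declare function local:array-heel($array as array(*)) as item()* {
  (: the last member, which may itself be any sequence :)
  array:get($array, array:size($array))
};

(
  local:array-init([1, 2, 3, 4, 5]),                 (: [1, 2, 3, 4] :)
  local:array-heel([1, 2, 3, (4, 5)]),               (: 4, 5 :)
  local:array-heel([1, 2, 3, (4, 5)])[last()],       (: 5 :)
  try { local:array-heel([]) }
  catch * { local-name-from-QName($err:code) }       (: 'FOAY0001' :)
)
```

This mirrors how array:tail is specified in terms of array:remove($array, 1), so the error for an empty array falls out of the existing out-of-bounds rule rather than being a special case.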

ChristianGruen commented 2 years ago

+1. I proposed this function a while ago (I cannot find out where, though, apart from some remarks on Slack), and I remember that @michaelhkay preferred to name it fn:truncate.

ChristianGruen commented 2 years ago

PS: Analogous to fn:head and other sequence functions, fn:init(()) should return ().

ChristianGruen commented 2 years ago

And PPS (sorry, I’ll be more disciplined next time again) fn:truncate still seems to be mentioned in an example in the current draft of the spec: https://qt4cg.org/branch/master/xpath-functions-40/Overview-diff.html#func-range-to.

I agree with Dimitre, and I’d love to see fn:init (or fn:truncate) readded, as well as e.g. fn:foot to return the last item.

dnovatchev commented 2 years ago

I agree completely with Christian, and indeed, we need a fn:foot, because the existing fn:last doesn't do the symmetric of fn:head, as it is supposed to in other languages, such as Haskell.

But the name foot is... E w e . . .

My preferred names are: final, rear or ending.

ChristianGruen commented 2 years ago

Related (quoting Michael Kay, https://app.slack.com/client/T011VK9115Z/C011NLXE4DU/thread/C011NLXE4DU-1605590306.224400):

I proposed foot($s) to get the last item in a sequence, but I'm now leaning towards slice($s, -1) as that packs a lot more power into one function. The second argument can be any sequence of integers, with negative integers counting from the end, so for example slice($s, -$n to -1) gives you the last $n.
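As a rough XPath 3.1 counterpart to the slice() calls in the quote, positional predicates give the same selections when $n does not exceed the sequence length. local:last-n is a hypothetical helper, and the correspondence to slice() is assumed from the description in the quote, not from a published signature.

```xquery
xquery version "3.1";

(: Hypothetical helper: the last $n items of $s, using only 3.1 predicates.
   Assumed to correspond to slice($s, -$n to -1) as described in the quote. :)
declare function local:last-n($s as item()*, $n as xs:integer) as item()* {
  $s[position() > last() - $n]
};

(
  (1 to 10)[last()],          (: 10, assumed counterpart of slice($s, -1) :)
  local:last-n(1 to 10, 3)    (: 8, 9, 10 :)
)
```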

dnovatchev commented 2 years ago

Related (quoting Michael Kay, https://app.slack.com/client/T011VK9115Z/C011NLXE4DU/thread/C011NLXE4DU-1605590306.224400):

I proposed foot($s) to get the last item in a sequence, but I'm now leaning towards slice($s, -1) as that packs a lot more power into one function. The second argument can be any sequence of integers, with negative integers counting from the end, so for example slice($s, -$n to -1) gives you the last $n.

. . . Still, what matters is convenience, brevity and being easy to understand. Thus it would be good to have a function with intuitive name for that.

fn:rear($sequence as item()*) as item()

Is a shorthand for: slice($sequence, -1)

Examples:

('a', 'b', 'c') => rear() produces 'c'

It makes sense to have fn:init() and fn:rear() defined on arrays, too.

array:init($array as array(*)) as array(*)

array:rear($array as array(*)) as item()*

Examples:

array:init([1, 2, 3, 4, 5]) returns [1, 2, 3, 4]

array:rear([1, 2, 3, (4, 5)]) returns (4, 5)

[1, 2, 3, (4, 5)] => array:rear() => rear() returns 5

I would challenge anyone to rewrite the last example in an understandable way using fn:slice() 💯

ChristianGruen commented 2 years ago

My impression is that fn:rear and fn:butt are too similar ;) Maybe it should be up to native speakers to make the decision. I’d also prefer a 4-letter-term.

I agree that the syntax of fn:slice is not that intuitive (but it's definitely powerful).

dnovatchev commented 2 years ago

My impression is that fn:rear and fn:butt are too similar ;)

:) A little humor isn't bad

Maybe it should be up to native speakers to make the decision. I’d also prefer a 4-letter-term.

I agree that the syntax of fn:slice is not that intuitive (but it's definitely powerful).

Yes, and I updated the previous comment with examples of rear() on arrays, that are understandable, but would be difficult to assimilate if slice() were used

dnovatchev commented 2 years ago

I updated the issue to include definition for fn:rear() and the array equivalents: array:init() and array:rear()

adamretter commented 2 years ago

Is fn:rear($seq) just a single function for fn:head(fn:reverse($seq)) ?

martin-honnen commented 2 years ago

I understand the proposals, but "fn:init(()) produces error" astonishes me; that (an error) wouldn't happen with subsequence, or would it?

ChristianGruen commented 2 years ago

True. It's only the array functions that should trigger errors (and actually I never understood why array operations and functions were designed to raise range errors).

michaelhkay commented 2 years ago

I'm afraid there's always been tension between the error and no-error philosophies. Personally I tend to the "raise an error" camp - errors are easier to debug than wrong answers, and with XPath the hardest thing of all to debug is the expression that returns an empty sequence and you can't work out why. But the worst problem when you design by committee is that it's very hard to achieve a consistent policy on such things.

ChristianGruen commented 2 years ago

My experience is mostly that it's confusing that sequence and array lookups behave differently. Maps are (fortunately?) similar to sequences: You won’t get errors if you request the value of a non-existing key. As a consequence, it's often easier to refactor existing sequence-based code into map-based code, although arrays would be the more natural choice.
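The difference in lookup behavior can be observed directly in XQuery 3.1. This is a small sketch with made-up sample data. Because try/catch is an XQuery feature, it will not run as plain XPath.

```xquery
xquery version "3.1";

let $seq := ('a', 'b', 'c')
let $arr := ['a', 'b', 'c']
let $map := map { 1: 'a', 2: 'b', 3: 'c' }
return (
  count($seq[5]),                                  (: 0: out-of-range position yields () :)
  count($map(5)),                                  (: 0: missing key yields () :)
  try { $arr(5) }
  catch * { local-name-from-QName($err:code) }     (: 'FOAY0001': index out of bounds :)
)
```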

But, yes, it's difficult to evolve a language that has been very lax in the beginning, and I’m glad that the semantics of XQuery are stricter than the ones of XPath 1.0.

ChristianGruen commented 2 years ago

Is fn:rear($seq) just a single function for fn:head(fn:reverse($seq)) ?

@adamretter Exactly

joewiz commented 2 years ago

As an alternative to the name fn:foot or fn:rear, how about fn:toe? This matches the common phrase, "head to toe," https://en.wiktionary.org/wiki/head_to_toe, and the 1987 Lisa Lisa and Cult Jam classic.

ChristianGruen commented 2 years ago

This matches the common phrase, "head to toe,"

Interesting! In German, it’s »von Kopf bis Fuß« (from head to foot).

dnovatchev commented 2 years ago

This matches the common phrase, "head to toe,"

I thought about "toe". But one has many toes, and just one head. Even for "foot", one has two feet, but just one head.

Certainly, if there is such a saying, people may accept a function name as "toe" naturally. This is for the native English speakers to decide.

michaelhkay commented 2 years ago

But then, we have to rename the parent axis, because as well as having two feet, we also have two parents.

gimsieke commented 2 years ago

head and tail obviously derive from horizontal animals, not from vertical humans, so the last item in a sequence must be something like the tail tip or a cow’s switch. Switch according to Merriam-Webster: “a tuft of long hairs at the end of the tail of an animal (such as a cow).” But the term “switch” is already taken, and it doesn’t meet Christian’s four-letter requirement. “Tuft” meets this requirement, but no-one associates “tuft” with “last item in a sequence.” Naming things…

dnovatchev commented 2 years ago

image

gimsieke commented 2 years ago

Are you advocating in favor of any of the terms presented in the image, @dnovatchev?

It just came to me that even though it doesn’t meet Christian’s 4-letter criterion, end() is a function name that is not taken yet by anything else in our “namespace.”

And a string-length() of 3 is only off by one, an excusable error that may happen to any of us.

So end() is what I’m cautiously proposing if we really need to have a function that gives the last item of a sequence. (I don’t think we need such a function urgently though.)

dnovatchev commented 2 years ago

Are you advocating in favor of any of the terms presented in the image, @dnovatchev?

@gimsieke, Gerrit, I am not a native speaker of English, but final and furthest seem best to me.

gimsieke commented 2 years ago

Ok, final() is also off by one, so it’s a valid contender (say I in German word order as another non-native English speaker)

dnovatchev commented 2 years ago

@gimsieke If "foreign" words were allowed, the French apriori and dernier also seem fine 😊

gimsieke commented 2 years ago

In German, we can make the distinction by letzte() (≘ tail()), letztes() (≘ last item of a sequence), allerletztes() (≘ really last item of a sequence), allerallerletztes() (≘ really really last item of a sequence), vorletztes() (≘ penultimate item of a sequence), übernächstes() (≘ next after next item of a sequence, or rather, as an übernächst:: axis), etc.

dnovatchev commented 2 years ago

This matches the common phrase, "head to toe,"

I thought about "toe". But one has many toes, and just one head. Even for "foot", one has two feet, but just one head.

Certainly, if there is such a saying, people may accept a function name as "toe" naturally. This is for the native English speakers to decide.

I think I found the definitive correct name for this function.

The heel is the rearmost part of the foot, which is the farthest from the head

Thus:

fn:heel($sequence as item()*) as item()

So, head() and heel() , that's it! And both start with an H and this is easier to remember.

martin-honnen commented 2 years ago

But it should be fn:heel($sequence as item()*) as item()?, to indicate a single item or the empty sequence (if the argument is the empty sequence) is returned.

dnovatchev commented 2 years ago

But it should be fn:heel($sequence as item()*) as item()?, to indicate a single item or the empty sequence (if the argument is the empty sequence) is returned.

@martin-honnen I actually think that in this case an error should be raised.

ChristianGruen commented 2 years ago

fn:heel($sequence as item()*) as item()

Nice one! I agree with Martin that the function shouldn't raise an error (fn:head doesn’t either).

dnovatchev commented 2 years ago

fn:heel($sequence as item()*) as item()

Nice one! I agree with Martin that the function shouldn't raise an error (fn:head doesn’t either).

OK, under pressure I updated the initial proposal so that it does not produce an error.

However, as @michaelhkay agreed, producing an error when the arguments do not have meaningful values for the function is the right design decision.

There are examples when new versions of other programming languages have demonstrated responsibility by reverting previous erroneous decisions even though this affected compatibility (in a limited way).

I am wondering, can we take the conscientious responsibility to do the right thing?

ChristianGruen commented 2 years ago

Note that array:heel must raise an error if we want to be consistent (see my comment further above).

martin-honnen commented 2 years ago

I am not sure having fn:head or fn:heel throw an error if the argument is empty is the right thing. First of all, in XPath itself we have no try/catch way, we only have that in host languages like XQuery or XSLT.

And it would break the way people write recursion using head/tail. For instance, if head gave an error for the empty sequence, the examples at https://www.w3.org/TR/xpath-functions/#highest-lowest, which use e.g. fn:fold-left(fn:tail($seq), fn:head($seq), function($highestSoFar as item()*, $this as item()*) as item()* {, would break, it seems.

That is just one example but it even appears in the (admittedly explanatory/optional) part of the functions and operators spec.
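The pattern in question can be illustrated with a simplified sketch. local:max-or-empty is a hypothetical function written for this illustration, not the spec's highest/lowest example. It relies on head(()) and tail(()) both returning the empty sequence.

```xquery
xquery version "3.1";

(: Hypothetical example: seed the fold with head($seq) and fold over tail($seq).
   With an empty $seq both are (), so the fold returns () without any error. :)
declare function local:max-or-empty($seq as xs:integer*) as xs:integer? {
  fold-left(tail($seq), head($seq),
    function($best as xs:integer?, $this as xs:integer) as xs:integer? {
      if ($this gt $best) then $this else $best
    })
};

(
  local:max-or-empty((3, 9, 4)),      (: 9 :)
  count(local:max-or-empty(()))       (: 0 :)
)
```

If head raised an error on an empty argument, the second call would fail unless the caller added an explicit emptiness check first.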

michaelhkay commented 2 years ago

Consistency should be the guiding rule here. The function should be symmetric with fn:head(). No errors. It also follows the principle in many standard functions that if the argument is an empty sequence, the result is an empty sequence.

There's another principle here, which is that a function should be designed to work over the largest domain for which it is meaningful; errors should be reserved for cases where there's no conceivable use case for supplying a particular argument value. It would have made sense for subsequence($x, 3.14159) to be an error because there's no way anyone would deliberately do that to achieve the specified effect. But having head($x) (or tail($x)) return empty when $x is empty is useful and reasonable: it makes head-tail recursive functions easier to express.

dnovatchev commented 2 years ago

Note that array:heel must raise an error if we want to be consistent (see my comment further above).

Thanks @ChristianGruen. Updated the signature of array:heel() and provided examples when both array:heel() and array:init() throw an error.

dnovatchev commented 2 years ago

But having head($x) (or tail($x)) return empty when $x is empty is useful and reasonable: it makes head-tail recursive functions easier to express.

@michaelhkay, Obviously the people (and editor 😊) who put array:head() and array:tail() in the XPath 3.1 F&O Specification thought differently ?

ChristianGruen commented 2 years ago

One last remark: array:heel() may indeed return zero or several items. Both the return type and the error handling should be identical to array:head() (https://www.w3.org/TR/xpath-functions-31/#func-array-head).

dnovatchev commented 2 years ago

One last remark: array:heel() may indeed return zero or several items. Both the return type and the error handling should be identical to array:head() (https://www.w3.org/TR/xpath-functions-31/#func-array-head).

Done!

Also provided examples for this case.

ChristianGruen commented 2 years ago

In https://github.com/qt4cg/qtspecs/issues/80#issuecomment-1253466179, some more use cases are given for fn:init and fn:foot/fn:heel.

PieterLamers commented 2 years ago

I don't know if there is still discussion about whether to use foot or heel. In my personal experience, foot refers more generically to something at the bottom (cf. table foot, page footer, etc.), whereas heel refers to a human body part. So if I were to vote, I'd vote in favor of fn:foot and against fn:heel. BTW some Dutch inspiration: we say "aan de voet van de berg", meaning the bottom end of the mountain. The Dutch counterpart of "from head to toe" is "van top tot teen": from top to toe.

michaelhkay commented 2 years ago

foot(), I think, will be readily accepted as referring to the last item in a sequence. heel() is a joke.

For selecting all items in a sequence except the last, I would vote for truncate().

As regards the difference in error handling between arrays and sequences when a subscript is out of range, I think we have to live with it, and new functions should remain consistent. There are advantages and disadvantages to both designs, and it's a classic example of design-by-committee that different solutions have been adopted for the two cases. It's very difficult to change now for compatibility reasons.

dnovatchev commented 2 years ago

For selecting all items in a sequence except the last, I would vote for truncate()

truncate() is a verb, but all other names are nouns.

So I would prefer either the function name already in use in Haskell: init(), or, if foot() is accepted, then why not follow up with another joke: footless() ?

michaelhkay commented 2 years ago

Yes, truncate() is a different part of speech, but we already have a heady mix of nouns, verbs, and adjectives for very similar operations: filter, subsequence, insert, remove, empty, exists. We even have "for-each" - I have no idea what part of speech that is. Until we had tail(), using remove($x, 1) was a common way of extracting all items except the first. So I don't think using a verb is a problem.
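The remove() idiom mentioned here also extends to the other end of a sequence. The following is a small sketch with made-up sample data, using only XPath 3.1 functions.

```xquery
xquery version "3.1";

let $x := ('a', 'b', 'c')
return (
  deep-equal(remove($x, 1), tail($x)),   (: true: the pre-tail() idiom :)
  remove($x, count($x)),                 (: 'a', 'b': all items except the last :)
  count(remove((), 0))                   (: 0: an out-of-range position is not an error :)
)
```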

Looking at init() I would assume it referred to the initial item in a sequence; the word "initial" usually means the first, not everything except the last. Only a tiny minority of our users will be familiar with a Haskell function of the same name.

ndw commented 2 years ago

I take your point about truncate() being a verb, but init() is just too generally applied to mean "initialize" and we've largely stayed away from abbreviations. I suppose you could persuade me to accept initial-items() or all-but-the-last(). Maybe.

dnovatchev commented 2 years ago

I take your point about truncate() being a verb, but init() is just too generally applied to mean "initialize" and we've largely stayed away from abbreviations. I suppose you could persuade me to accept initial-items() or all-but-the-last(). Maybe.

That's an excellent idea Norm:

sans-last()

dnovatchev commented 2 years ago

Yes, truncate() is a different part of speech, but we already have a heady mix of nouns, verbs, and adjectives for very similar operations: filter, subsequence, insert, remove, empty, exists. We even have "for-each" - I have no idea what part of speech that is. Until we had tail(), using remove($x, 1) was a common way of extracting all items except the first. So I don't think using a verb is a problem.

Looking at init() I would assume it referred to the initial item in a sequence; the word "initial" usually means the first, not everything except the last. Only a tiny minority of our users will be familiar with a Haskell function of the same name.

Still, all 3 other names used here (head, tail, foot) are non-verbs. Let us try to have them all be non-verbs.

Maybe truncated() ?

ChristianGruen commented 2 years ago

foot and truncate sound good and intuitive.

As far as I know, Haskell is the only language that uses init, but I never found out why they chose that name.

dnovatchev commented 2 years ago

foot and truncate sound good and intuitive.

As far as I know, Haskell is the only language that uses init, but I never found out why they chose that name.

I am against truncate() . It is not intuitive, to me truncate means "perform the action of truncating", not "get the thing that is truncated".

So, if this should have anything closer to "truncate", I definitely would prefer "truncated()" or "truncation()", or even "starting-segment()" or "left-segment()"

ChristianGruen commented 2 years ago

I am against truncate() . It is not intuitive, to me truncate means "perform the action of truncating", not "get the thing that is truncated".

Isn't that similar with fn:remove, and all other functions with verbs included?

If we wanted, we could check if other programming languages provide an even better term.