Seq(1).indexOfSlice(Nil, -2)==-2

noresttherein commented 1 year ago

Reproduction steps

Scala version: 2.13.10

Expected: 0, same as in Vector.

What's worse, Vector(1).indexOfSlice(List(1), -2) == -2, too (substitute any Seq for Vector).

SethTisue commented 1 year ago

@scala/collections

som-snytt commented 1 year ago

That is actually consistent with the precise or precious Scaladoc.

the first index >= from such that the elements of this sequence starting at this index match the elements of sequence that, or -1 if no such subsequence exists.

som-snytt commented 1 year ago

The PR trivially adjusts the malicious from. It would be nice if we had some sort of tool to verify that the collections satisfy a set of "laws" of some kind. In lieu of that, I augmented the unit test to exercise both paths (sizes known and unknown).

noresttherein commented 1 year ago

The PR trivially adjusts the malicious from. It would be nice if we had some sort of tool to verify that the collections satisfy a set of "laws" of some kind. In lieu of that, I augmented the unit test to exercise both paths (sizes known and unknown).

Will it work for Seq(1).indexOfSlice(Nil, 1), which currently returns -1 (but, correctly, 1 for Vector)?

som-snytt commented 1 year ago

@noresttherein see the PR comment, I went the other way, but I'm not religious about it.

SethTisue commented 1 year ago

It would be nice if we had some sort of tool to verify that the collections satisfy a set of "laws" of some kind

we do, scala-collection-laws: https://github.com/scala/scala-collection-laws

though the fact that they aren't in-repo leads to neglect

som-snytt commented 1 year ago

Thanks, Seth, for not calling out my sarcasm. I have a fork and will try again to make headway with it. The sarcasm was really my choice between sarcasm and self-deprecation, much like choosing whether to allow indexOfSlice past the last index, there is a kind of arbitrary equivalence that merely shifts the burden of awareness.

noresttherein commented 1 year ago

I think it is still wrong: Seq(1).indexOfSlice(Nil, 1) should probably return 1. It does for List.

     if (from0 > l) -1 //non strict
     else if (tl < 1) from0

som-snytt commented 1 year ago

List is the default Seq. Did you mean Vector currently returns 1? My comment was that I changed that to work the same as List, but I'm open to whatever the community consensus is. I spent a few minutes with collections-laws, which I now refer to as claws; whatever is decided should be encoded/enshrined there.

scala> Seq(1).indexOfSlice(Nil, 1)
val res0: Int = -1

scala> List(1).indexOfSlice(Nil, 1)
val res1: Int = -1

Edit: an argument in favor of returning the last index plus one is that it is consistent with patch, which is permissive. Also lastIndexOfSlice and startsWith.

I thought indexOfSlice should return legal indices (like indexOf), something I can xs(i), but that's only true for existing elements. So probably I'll flip it to your way of thinking and see how it flies.

Welcome to Scala 2.13.10 (OpenJDK 64-Bit Server VM, Java 19).
Type in expressions for evaluation. Or try :help.

scala> val xs = (1 to 10).toList
val xs: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> xs.indexOfSlice(Nil, 10)
val res0: Int = -1

scala> xs.patch(10, List(42), 100)
val res1: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 42)

scala> xs.indices
val res2: scala.collection.immutable.Range = Range 0 until 10

scala> xs.lastIndexOfSlice(Nil, 10)
val res3: Int = 10

scala> xs.startsWith(Nil, 10)
val res4: Boolean = true

sjrd commented 1 year ago

I expect xs.indexOfSlice(Nil, i) to return i for 0 <= i <= xs.size. In particular, definitely i and not -1 when i == xs.size.

More generally, I guess a good "law" would be that xs.indexOfSlice(ys, from) returns some idx >= 0 iff idx is the smallest value such that idx >= from and xs.slice(idx, idx + ys.size) == ys (and -1 if no such idx exists).

lrytz commented 1 year ago

I find it confusing for an indexOfSomething method to return an invalid positive index, but there can be arguments either way. Ultimately it's a matter of agreeing on a specification. Related to https://github.com/scala/bug/issues/12795#issuecomment-1567878466.

sjrd commented 1 year ago

But for slice, xs.size is not an invalid index. xs.slice(xs.size, xs.size) is perfectly valid. (and making it invalid for the sake of it is actively harmful in practice).

Another way to look at it is that indices never point to elements. Only ranges of indices point to elements. When we say apply(i), we are truly asking for the one element in the range [i, i+1). And that is invalid because i+1 is invalid when i == xs.size. But the range [i, i) is a valid empty range as long as 0 <= i <= xs.size.

lrytz commented 1 year ago

One could argue it's one thing for an operation to accept indices outside the defined range, if it can produce a valid result, like xs.slice(xs.size, xs.size) (or xs.slice(xs.size + N, xs.size +N) for that matter - is there anything special about i == xs.size?), but it's another thing for an operation to return an index that's outside the defined range.

But I see your point, Nil.indexOfSlice(Nil) is an example that can probably convince me, we'd have to make that reuturn -1.

sjrd commented 1 year ago

It's common practice in other languages to accept ranges that stop right at the length of the collection, but not go further. Java and C++ have very consistent APIs like that, for example.

som-snytt commented 1 year ago

I am persuaded by sjrd's argument about open ranges.

Because I am weak-minded, I am also persuaded by lrytz's question about what makes size + 1 different from size.

My answer to that is:

scala> val xs = 1.to(10).toList
val xs: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> xs.indexOfSlice(Nil, 10)
val res0: Int = 10

scala> xs.indexOfSlice(Nil, 11)
val res1: Int = -1

scala> xs.lastIndexOfSlice(Nil)
val res2: Int = 10

scala> xs.slice(11, 11)
val res3: List[Int] = List()

The permissive inputs at res3 do not imply that res1 should be 11, because res1 must be consistent with res2.

Another permissive input, xs.patch(11, List(42), 100), is likewise not relevant.

As a footnote, there is not a method for xs.slice(5, 6).headOption but there is xs.lift(5).

noresttherein commented 1 year ago

I think that it is one thing is to be lax about the arguments you accept, and quite another to be lax about the values you return. At least it seemed to me like something universally accepted as the best practice (doesn't mean it's 'right', just that it seems most people would agree). On the other hand, size as indexOfSlice(Nil) feels ok, because it still 'borders the element': in Scala, and all C-derived languages, the 'end' or 'until' index is always exclusive: no matter how restrictive we would want to be, we have to allow it as an end of a range. And so, [size,size) is a valid subrange of [0, size); [size + 1, size +1) isn't. Anyway, indices, pointers and iterators don't really 'point' to elements, they point between them - at least that way of thinking makes sense when you do pointer arithmetic (including iterator.drop).

The problem is, Scala is very inconsistent with it. Especially mutable vs immutable collections: all mutable methods (at least as I recollect) are not permissive, for example Buffer.remove (and you could expect that the same logic applies to it as with slice/drop). Array.copyOf(a, -1) throws a NegativeArraySizeException. copyToArray cannot even decide one way or the other. On the other hand, if patch is permissive, why not updated? Index falls out of range, so you ignore it; a valid result can be produced. It is not, because probably it felt like a bad idea (and I agree). Now, as I stated before, I am not super confident that permissive indices are a good idea in the first place (most languages I had contact with were strict, especially statically typed), or that the way they are implemented cough patch cough is the best, but it's not a surprising decision. However, I do not report these issues simply because I disagree, but because I was truly surprised, despite knowing Scala quite well.

And I haven't even brought up yet how disturbing to me is the complete disregard towards consistency in naming of the arguments ;)

som-snytt commented 1 year ago

@noresttherein I think the PR follows what you say.

I can't find my comment, but somewhere I distinguished the mutating API as requiring an "existing element", i.e., a good index. That explains why a mutator treats indices differently (from sane immutable API, of course).

I also conjecture that patch should be called patched and patchInPlace should be patch, but the other question is whether they may behave differently (with respect to inputs) based on that distinction.

"I changed patch to patchInPlace and now it throws."

scala / bug

Seq(1).indexOfSlice(Nil, -2)==-2 #12780

Reproduction steps