janet-lang / janet

A dynamic language and bytecode vm
https://janet-lang.org
MIT License
3.45k stars 223 forks source link

`array/remove`: update final array index to be -1 #1224

Closed primo-ppcg closed 1 year ago

primo-ppcg commented 1 year ago

Related issue: #1219

The problem with array index -1: Because array/slice does not include its end index in the result, it is not possible to slice to the end of the array. Also, because array/insert inserts before the given index, it is not possible to insert to the end of the array.

There are two possible solutions to this problem:

  1. (Current Solution) Redefine the final index of an array to be -2.
  2. (Proposed Solution) Redefine the behavior of array/slice and array/insert for negative indices.

Notably, the behavior of these two functions is identical for either solution.

I submit for consideration that the proposed solution is more correct and desirable. When indexing from the end of the array, the array should be viewed reversed, as if its end were its front. In other words, any action that would have occurred before the index should occur after the index, and vice versa.

Furthermore, the current solution breaks the symmetry between array/insert and array/remove:

Current solution:

(def a @[1 2 3 4 5])
(array/insert a -1 0) # -> @[1 2 3 4 5 0]
(array/remove a -1)   # -> @[1 2 3 4 5 0]

(def b @[1 2 3 4 5])
(array/insert b -2 0) # -> @[1 2 3 4 0 5]
(array/remove b -2)   # -> @[1 2 3 4 0]

Proposed solution:

(def a @[1 2 3 4 5])
(array/insert a -1 0) # -> @[1 2 3 4 5 0]
(array/remove a -1)   # -> @[1 2 3 4 5]

(def b @[1 2 3 4 5])
(array/insert b -2 0) # -> @[1 2 3 4 0 5]
(array/remove b -2)   # -> @[1 2 3 4 5]

Inserting at an index, and then immediately removing at the same index should certainly be a no-op.

zevv commented 1 year ago

@narimiran: say what you want, but the Janet folks are pretty reasonable at times, eh? :)

narimiran commented 1 year ago

@narimiran: say what you want, but the Janet folks are pretty reasonable at times, eh? :)

Hey there, long time no see :) Believe it or not: when I saw your name in the contributors' list, that's what finally convinced me I should give Janet a try.


Hopefully this change doesn't break too much code in the wild, but IMO it is well worth changing this.

sogaiu commented 1 year ago

While I don't personally mind rewriting my own code (if I have anything that is affected), if there is rewritten code, is there not an issue of how to handle running with older versions of Janet vs newer versions?

I suppose one might provide multiple versions of one's program that are tailored to different versions of Janet. Alternatively, one might also have code that examines janet/version and acts accordingly.

Another option might be to avoid using negative indeces completely for a while.

zevv commented 1 year ago

Hey there, long time no see :) Believe it or not: when I saw your name in the contributors' list, that's what finally convinced me I should give Janet a try.

Yeah, as if I have a good taste with these kind of things. The last language I invested in ended up a total dumpster fire, remember.

zevv commented 1 year ago

Another option might be to avoid using negative indeces completely for a while.

I don't hate this. I expect this feature is not used often, as it seems there were no complaints about it earler, so my guess is that this would not affect too many code bases. I know I'm a bit of a whiner myself, and I would have spoken out if I would have needed to use -2 for the last index.

I agree that -2 is just a bit too weird, and I'd rather see this "fixed" sooner than later.

sogaiu commented 1 year ago

I'm fuzzy on whether plain slice would be affected.

I use -2 and other negative indeces in some of my pegs with slice, like here and here.

I don't disagree about it seeming weird, but we still haven't heard from the creator :)

narimiran commented 1 year ago

I use -2 and other negative indeces in some of my pegs with slice, like here and here.

But this will not have to be changed. You can still use (slice $& -4 -2) and it will do the same thing as before, capturing the same two elements. The only thing changed in slice is its documentation: when using negative indices, the range is now explained as open-closed, making the index of the last element -1 (not closed-open, as before, with the last element as -2).

Before:

 [10 20 30 40 50]
  -6 -5 -4 -3 -2
       [     ]

After:

 [10 20 30 40 50]
  -5 -4 -3 -2 -1
       [     ]  
bakpakin commented 1 year ago

The inability to remove stuff from the end of the array and the lack of symmetry is definitely a problem of a bad interface, I support this change, at least for array/insert.

As far as making slices using -1 as the end of the array instead of the -2 issues, I agree that it is confusing and unfortunate (along with os/date changing indices to be 0-indexed), but I also believe that constantly changing interfaces that really don't have strong mathematical reasons behind and are more about "hygine" or "clean code" just causes churn.

On the other hand, if one were to make this change, sooner is always better than later to just "rip the bandaid off."

tionis commented 1 year ago

Perhaps such improvements could be collected and bundled together into one large backwards-incompatible patch?

CosmicToast commented 1 year ago

I would recommend having a "2.0" branch for janet where all such changes can be accumulated until it's ready for release. It should be possible to rebase it on every point release, meaning people could "opt-in" early as well.

bakpakin commented 1 year ago

I would recommend having a "2.0" branch for janet where all such changes can be accumulated until it's ready for release. It should be possible to rebase it on every point release, meaning people could "opt-in" early as well.

Not a bad idea, but I think the more interesting question is how much should we consider changing in the first place. Lots of small changes could be made, but there are also larger changes that might be overall benefical but cause even more churn.

My gut tells me that changing slice semantics will break a lot more code that array/insert and array/remove, since for the usecase of inserting at the end of the array, array/push and array/pop is more natural. The indexing scheme where -1 refers to imaginary "element after the last one" also affects buffers and strings as well.

primo-ppcg commented 1 year ago

My gut tells me that changing slice semantics will break a lot more code that array/insert and array/remove, since for the usecase of inserting at the end of the array, array/push and array/pop is more natural. The indexing scheme where -1 refers to imaginary "element after the last one" also affects buffers and strings as well.

I think there may be a bit of confusion. I haven't proposed changing the behaviors of array/slice or array/insert. I believe that they were already correct, but perhaps for the wrong reason. The current behavior has been explained differently, in terms of -1 as the final array index.

As an example, the following code snippets should be equivalent:

(var a @[:a :b :c :d])
(set a (reverse a))
(array/insert a 0 :e)
(set a (reverse a))
(var a @[:a :b :c :d])
(array/insert a -1 :e)

And they are, currently. However the reason given is that -1 is an index "outside" the array, and array/insert inserts before that index. What I'm proposing is that this is not as correct as saying that negative indices view the array in reverse, such that inserting at a negative index would actually insert after that index, precisely as the first snippet does. I have likewise changed the description of array/slice with that understanding.

The only function that I have proposed changing is array/remove. Had array/insert and array/remove been consistent, I would not have recommended a change, despite the final index of -2 being somewhat weird, and I do believe unprecedented.

bakpakin commented 1 year ago

Negative indexing is a weird python-ism that can seem ubiquitous in scripting languages, but it comes with it's own can of worms, and a lot of languages just don't bother. I don't know if it comes from Python originally but it really is more of a hack than any sort of elegant construction, regardless of the particulars.

Also, the proposed explanation of array/insert and array/remove actually makes less sense to me now. After a call to array/insert, the array is one longer, the same negative index would return a different value. In that sense, the current implementation is more consistent.

EDIT: not to say that change is bad, just to say that the particular argument I don't think is very convincing.

nstgc commented 1 year ago

I would like to point out that -1 mod n = n-1, making it the last element of an array. Like like, on a ring, -2 is the second to last, and so on. So, from a mathematical standpoint, at least to me, it makes sense. Also, adding my two cents to something related, one of my issues with 1-indexed arrays is how they interact with modular arithmetic. (Looking at you Julia!)

bakpakin commented 1 year ago

Getting slightly off topic here, I did a little experiment with redoing negative indexing for slices here: https://github.com/janet-lang/janet/compare/master...negative-indexing-redo

It basically breaks almost all reasonably sized projects I can find written in Janet (not surprisingly).

primo-ppcg commented 1 year ago

I did a little experiment with redoing negative indexing for slices here: master...negative-indexing-redo

Of the functions that refer to negative indexing, which includes:

array/insert
array/remove
array/slice
buffer/blit
buffer/slice
keyword/slice
slice
string/slice
symbol/slice
tuple/slice

the implementations of all of these are already correct and do not need to be changed, with exception to array/remove. And surely if they hadn't been correct, it would have been mentioned long before now. This PR is Ready to Merge; no further changes are required. If it hadn't been, I wouldn't have marked it as such.

Also, the proposed explanation of array/insert and array/remove actually makes less sense to me now.

I would like you to imagine a bijective one-dimensional coordinate space translation, xx', such that 0-1, 1-2, 2-3, etc., and that parallel functions are defined which reference array indices according to this translation. When using these functions, arrays would be indexed in the following way:

#x' -1 -2 -3 -4 -5
 @[ :a :b :c :d :e ]

When inserting an item at index -2, the item would be inserted at index -2, shifting the contents and everything after by one cell. The position of -2 will not have changed. If the contents at -2 are examined, it would be precisely what was inserted. Removing it again would result in the original array.

When indexing from the end of an array, this same bijection is used, except that the items of the array are enumerated in reverse order:

#    0  1  2  3  4
 @[ :a :b :c :d :e ]

#   -1 -2 -3 -4 -5
 @[ :e :d :c :b :a ]

such that inserting at -2 would insert between :e and :d, such that slicing from -2 to -4 would contain :d and :c, but not :b, and such that removing at -2 would remove :d.

Note that when referencing a range of cells, as slicing does, or a gap between cells, as inserting does, the behavior is precisely the same as always viewing the array from the front, and pretending that the final index is -2. Janet already has a correct implementation for these; nothing needs to change.

It only makes a difference when referring to a cell location directly, which currently only applies to array/remove, although fetching from a negative index for indexed or bytes types could theoretically be allowed.

narimiran commented 1 year ago

Getting slightly off topic here, I did a little experiment with redoing negative indexing for slices here: https://github.com/janet-lang/janet/compare/master...negative-indexing-redo

You changed things that don't need change and shouldn't be changed, i.e. you've changed their behavior and introduced several off-by-one bugs.

bakpakin commented 1 year ago

@primo-ppcg This makes sense.

Just to be clear, I will not redo negative indexing for slicing, I just wanted to have a more concrete understanding of how much it was actually used

sogaiu commented 1 year ago

So may be the following code in mendoza is affected?

(defn relative-url
  "Express a url relative to (dyn :url). The given url should
  be absolute from the site root, like /index.html."
  [url]
  (def b-parts (string/split "/" (dyn :url)))
  (array/remove b-parts -2)                      # the form on this line
  (def a-parts (string/split "/" url))
  (while (and (not= 0 (length a-parts))
              (not= 0 (length b-parts))
              (= (a-parts 0) (b-parts 0)))
    (array/remove a-parts 0)
    (array/remove b-parts 0))
  (string (string/repeat "../" (length b-parts)) (string/join a-parts "/")))

May be this can be re-expressed in a way to not use a negative index?

I guess array/pop might do.


I'm posting this here rather than in the mendoza repository issues because I think more interested parties (issue participants) will see it.