janet-lang / janet

A dynamic language and bytecode vm
https://janet-lang.org
MIT License
3.43k stars 221 forks source link

Should `reverse` on bytes return a string? #1248

Closed primo-ppcg closed 1 year ago

primo-ppcg commented 1 year ago

I'm looking mainly at the asymmetry between these two forms:

> (reverse! @"abc")
@"cba"
> (reverse @"abc")
@[99 98 97]

Perhaps it would be more appropriate for reverse on a bytes to return a string? As precedent, slice also distinguishes between indexed and bytes similarly: https://github.com/janet-lang/janet/blob/3a4d56afca2eee0e4dbca9a6400757d198e6d6f3/src/core/corelib.c#L412-L427

> (slice "abc" 1)
"bc"
> (slice [1 2 3] 1)
(2 3)
sogaiu commented 1 year ago

As a factor for consideration, please note the existence of string/reverse:

$ janet
Janet 1.30.0-ecc4d80a linux/x64/gcc - '(doc)' for help
repl:1:> (string/reverse @"hello")
"olleh"
repl:2:> (string/reverse "hello")
"olleh"
repl:3:> (string/reverse :hello)
"olleh"
repl:4:> (string/reverse 'hello)
"olleh"
primo-ppcg commented 1 year ago

As a factor for consideration, please note the existence of string/reverse:

string/slice does as well, yet slice on bytes still does what you'd expect. I don't know that it makes a big difference, I just thought I'd mention it.

Another thing I find a bit inelegant is the asymmetry between these:

Python style:

sort
sorted

Ruby style:

reverse!
reverse

If it were up to me, I would prefer sort! with sort rather than reverse with reversed. I think it's also dangerous to fall into the mindset that no mistakes can be ever be corrected, because it may affect existing code. It's far better to make such changes when there are 3 thousand users, rather than 3 million.

As mentioned by others previously, perhaps such changes could be collected and aggregated into a single major release.

CosmicToast commented 1 year ago

I think I've mentioned this before, but probably a Janet 2.0.0 would be desirable eventually. That would allow a redesign of the stdlib (which has many more inconsistencies than just the above naming), and potentially introduce some features that are currently rather unfortunate (such as jpm being unable to perform executable builds optimally, major version compatibility checks, etc).

sogaiu commented 1 year ago

For reference, some past discussion: https://github.com/janet-lang/janet/issues/626

I think collecting candidate things could be useful for more than one reason. One being that it might give a better sense of the size / impact.


Partly in jest, I recently realized that Janet's versioning is much more similar to Lua's than other things :)

primo-ppcg commented 1 year ago

For reference, some past discussion: #626

Thanks for that, a lot of very sensible opinions. I've also concluded that no action is necessary on that point. What functions do is more relevant than what they're called.

sogaiu commented 1 year ago

What functions do is more relevant than what they're called.

I agree [1].

Perhaps we're better off starting with not-so-nice names and only once the functions / callables have proven their worth, good names / aliases can be assigned :P


[1] On a tangential topic of naming, IMO, the continued use of very misleading names can have cumulative undesirable effects (like "oxidation" and "reduction" in chemistry).

CosmicToast commented 1 year ago

What functions do is more relevant than what they're called.

I actually disagree, these are equally important. A function that is poorly named is a function that doesn't exist.

Very often, users will come in, ask for stuff (sometimes even propose VM changes), only to have their use-case instantly solved by a function they've never heard of. It isn't feasible to read and retain the entirety of the standard library documentation and definitions (and examples, since those are often necessary to really understand what the function does). As such, function names are how people actually realize that what they're trying to do is an option. Function names are the very first line of documentation, and documentation is a cornerstone of any API, let alone a PL.

Having consistent behaviors, semantics, and having all of those align with the naming scheme has additional benefits beyond this. Let's say there's a ruby-style foo and foo! with the latter being in-place, and only accepting mutable data structures for the latter.

Imagine, for instance, an immutable variant of put is added and called assoc, to comply with historical naming. The user would need to know this history, or to have read and grasped the meaning of the assoc documentation and examples to use it, rather than doing something like |(freeze (put (thaw $) $1 $2)). In the meanwhile, if the above naming scheme is in place, and instead these are named put! and put, the user can instantly infer the above. It also increases documentation locality - they will appear next to one another in automatically generated docs (which otherwise need to be supplemented by handwritten docs[^1].


To clarify a bit further: while naming does matter, what matters isn't "the perfect name", but the consistency in the naming scheme. If naming didn't matter at all, it would be fine to simply name functions as UUIDs, after all. On the other hand, it's unreasonable to expect to get something as complicated as that right "the first time". As long as the naming system is consistent, stylistically and functionally (as per the above example, naming is also functional), it's fine.

[^1]: I may consider making these for Janet 2 regardless. Hopefully, my prior work (e.g in the alpine user handbook) qualifies me.

primo-ppcg commented 1 year ago

Very often, users will come in, ask for stuff (sometimes even propose VM changes), only to have their use-case instantly solved by a function they've never heard of.

I think that's a valid point. My very first issue was requesting the behavior of or to be changed to match that of any?. It's not that I hadn't seen it, but I had assumed the trailing question mark meant it would always return a boolean. Perhaps a discussion or wiki could be opened regarding proposed breaking changes, that could later be formalized.

sogaiu commented 1 year ago

Looks like there is now a related discussion item.


Interesting that although the discussion item mentioned this issue, the discussion item doesn't automatically get shown here (like issues or prs seem to).

bakpakin commented 1 year ago

I think it is a possible change - it may be worth returning a string if the input to the function is (bytes? x). But a buffer would actually be more consistent - the return value is mutable in the case of arrays and tuples.

Another strange behavior as a result of the reverse implementation is calling reverse on dictionaries. This doesn't error but does result in strange arrays full of holes in many cases:

repl:1:> (reverse {1 2 3 4 5 6})
@[nil 2 nil]
primo-ppcg commented 1 year ago

Another strange behavior as a result of the reverse implementation is calling reverse on dictionaries.

I went back to confirm that this behavior is indeed present in 1.29.1 and 1.30.0 :sweat_smile: I agree that if no error is produced, it should at least do something reasonable.

But a buffer would actually be more consistent - the return value is mutable in the case of arrays and tuples.

Makes sense to me. I believe this is the minimum required change:

@@ /src/boot/boot.janet, 1449 @@
   (var n (length t))
-  (def ret (array/new-filled n))
-  (forv i 0 n
-    (put ret i (in t (-- n))))
+  (def ret (if (bytes? t)
+             (buffer/new-filled n)
+             (array/new-filled n)))
+  (each v t
+    (put ret (-- n) v))
   ret)
> (reverse [:a :b :c])
@[:c :b :a]
> (reverse "abc")
@"cba"
> (reverse {1 :a 2 :b 3 :c})
@[:c :b :a]