nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
https://nim-lang.org
Other
16.6k stars 1.47k forks source link

strutils.rfind start parameter is unecessarily unusual #11430

Closed c-blake closed 5 years ago

c-blake commented 5 years ago

The current Nim stdlib rfind uses start to refer to the high index of the slice searched backwards (the "start of search activity" not the "start of data being searched"), and does not support any low index search constraint - it always searches to index 0. In the forward (find) cases, both meanings (activity&data) of "start" coincide. So, there's no issue.

The reverse cases, well..It's unusual to say the least. C++ takes only a lower index bound/fence post. They call it pos: http://www.cplusplus.com/reference/string/string/rfind/ . Meanwhile Python takes both slice bounds and calls one start and the other end. (except for end, same names, same basic operations, different meaning..at odds with (simplified) tables like this https://github.com/nim-lang/Nim/issues/4218). C++/Python cover a lot of programmers.

Impact. I just did a search of every single package in nimble (or at least 833 of them!). None uses that third rfind parameter except https://github.com/Yardanico/nimpylib which incorrectly assumes start means the same thing in strutils as in Python (and he also passes a last parameter that doesn't exist - so clearly that's "never used code").

4 of those nimble packages even define their own proc called rfind which also take a "start-ish-word=start of data convention".

In short, I feel like there are strong "programming in the large" expectations about what extra parameters to rfind do, when they exist. Nim's rfind seems just different enough now to cause headaches porting code and changing it will probably bug no one.

The Nim strutils.rfinds are also limited in that you have to slice the string to do a bounded below search, but not bounded above.

Example

import strutils
let path = "/a/b/c/foo.tar.gz"
let slash = path.rfind('/')       # gives 6
echo path.rfind('.', start=slash) # gives -1 not 14
echo slash + path[slash..^1].rfind('.')  # 14 as expected

Possible Solutions

Since from my search of nimble packages no one uses the third argument (well, 1 incorrect usage), you can probably call it whatever you want or even..

1) Delete the parameter. Make people slice the string if they want to bound the search. It's probably not quite as fast, but at least doesn't create any knee-jerk, false expectations. Not all Nim code is nimble, obviously. Not one use there that would be impacted though strongly suggests deletion would not cause much trouble.

2) Be like every other string library I could find and add the other index bound/fencepost for good measure and better similarity with strutils.find which all take a start and last:

proc rfind*(s: string, sub: char, start=0, last=0): int =
  result = -1
  let last = if last==0: s.high else: last
  for i in countdown(last, start):
    if sub == s[i]: return i

or if you want to change the name "just in case":

proc rfind*(s: string, sub: char, lowIndex=0, highIndex=0): int =

The set[char] and string rfind would need similar easy fixes.

3) Leave as-is but change the name of the parameter to be more suggestive/more different from forward find:

proc rfind*(s: string, sub: char, highIndex=-1): int =

or if you are strangely attatched to "activity speak":

proc rfind*(s: string, sub: char, loopBeginsAt=-1): int =

4) Really hammer the reader over the head in the doc comment as to what is going on since it is very likely to not be what they expect at a quick glance.

Personally, I lean toward 2 to minimize effort porting code from other languages. 1) is only first for textual flow/being the least work.

Araq commented 5 years ago

Meh, I guess we should do what the other languages do.

c-blake commented 5 years ago

Ok. I'll do a PR today.

Quick question - we could actually maintain positional parameter passing backward compatibility if we make the signature the reverse of the start/last order of strutils.find:

proc rfind*(s: string, sub: char, last = -1, start: Natural = 0): int

Then it's just renaming start->last and adding a new start. Like I said, though, use of the parameter at all is very rare. So, that aspect of backward compatibility is of limited utility. It also (to me) seems incoherent with being "like strutils.find" which would suggest:

proc rfind*(s: string, sub: char, start: Natural = 0, last = -1): int

It's debatable whether the reverse slice parameter order acts as a "usefully redundant visual cue" to indicate the "reverseness of it all" or if it seems more like a "booby trap" from moving back/forth between slice notation and parameter passing notation.

I'd go with (start, last), myself and put in sharply worded note in changelog.md for the 0.1% of users who might have been using it outside of nimble.