Closed metagn closed 4 years ago
I like `sequtils.count`. Another example is linkedlists, but I would rather that collection simply offered a type that hid all the machinery and stored the size of the list. Still, `count` should be a thing.
for js unicode there's https://nim-lang.org/docs/unicode.html#runeLen%2Cstring
Why do we need `count`? If it's allowed to be expensive in O(N) terms, generic code should avoid it anyway.
The point here is to make sure generic code doesn't use it. If we have a concept like `Indexable` that uses `[]` and `len`, then it should warn or error when used with a type like cstring. Since we can't just delete `len` for cstring (we also need it for JS), we have to deprecate it on the C backend, so a proc that uses `Indexable` will create a warning for now. `count` is merely a last resort in case you really need the length of something; you should be inclined to do `let L = x.count` first.
Also, this doesn't close https://github.com/nim-lang/Nim/pull/14162. I miscommunicated: `count` is separate from `isEmpty`. @disruptek you can reopen if you want with Araq's suggestion.
Well, so `len` for `cstring` needs a deprecation period and `count` instead, but why do we need `sequtils.count` for everything that defines `items`?
More importantly, the premise is wrong: In practice length information in a generic context can be valuable and speed up computations even if len is O(N):
var result = newSeq[type(iter)](iter2.len) # ok, need to iterate over the C string once
var i = 0
for x in iter2: # ok, need to iterate over it once again, but now it's in the cache
  result[i] = x
  inc i
result
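A minimal runnable sketch of the preallocation pattern in the snippet above, specialized to a cstring. The helper name `toCharSeq` is hypothetical and is not part of the standard library; it only illustrates the two passes the comments describe.

```nim
# Hypothetical helper (not in the stdlib): copy a cstring into a
# preallocated seq, paying for one extra traversal to learn the size.
proc toCharSeq(s: cstring): seq[char] =
  # First pass: `len` on a cstring walks to the terminating NUL, so it is O(n).
  result = newSeq[char](s.len)
  var i = 0
  # Second pass: write each character into its preallocated slot.
  for c in s:
    result[i] = c
    inc i

let copied = toCharSeq(cstring"hello")
echo copied.len
```

The trade-off being argued is visible here: the data is traversed twice, but the seq is allocated exactly once and never grows.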
> why do we need sequtils.count for everything that defines items

The point here is we separate everything that has `len` vs everything that can have `len` calculated. We could do:
template count(x: Indexable): int = x.len
when `Indexable` is implemented, or for now just `compiles(x.len)`. If a proc, say, in strutils, needs the length of a parameter (`rfind`), they would use `count`, but everything else should check for `len` for the most efficiency (if you believe cstring is efficient with `len` then we can simply not deprecate it) then fall back to a generic solution that uses a cap or something.
I'll add a `when compiles(x.len)` branch in my PR if you want, but it's going to be messy like `toSeq` is, since it has to account for iterators.
As I said, the issue is unconvincing for today's computers. Preallocated sizes are often more important to have than avoiding O(N) traversals, and if `findIt` is bad for type T even though T has a `len`, `findIt` could be specialized for `T`. Currently with our `untyped` design that's unfeasible to do, so maybe we should put our focus on this "untyped it templates" issue instead.
The issue has less to do with the hardware and more to do with the programmers being able to communicate with one another as to the nature of the operation. You admit that O(n) won't always be avoided, so you can't also argue that `count` shouldn't be exposed to users that need it. With `count`, they know what they are getting when they need to ask for it.
For ensembles it can be `card` (for cardinality); otherwise a slightly longer `countItems` might better convey the O(n) nature?
In any case, trees, graphs and other linked data structures might not store counts of a subgraph/subtree, and a `count` convention might be useful there.
Weird RFC, closing since there's no real point here
Originally discussed in https://github.com/nim-lang/Nim/pull/14162. Writing so many RFCs is taxing but this needs its own place to be discussed.
Problem:
For some data structures (lists, cstrings), computing the length is not a constant-time operation. This can cause some algorithms that use the length but don't need it, like generating `newSeq(x.len)` for efficiency instead of an empty seq and adding to it, to be needlessly slow.
- `cstring` is a victim of improper use by templates/generic procs that check for a `len` overload on a type.
- `cstring`'s `len` is O(n) with n pointer dereferences, which is needlessly complex for anything named `len`.

Proposal:
- `len` should be conventionally defined for types that have constant/fast access to their length, and there should be a generic unary `count` (maybe go in sequtils? can debate name) that iterates over the type.
- `len` for cstring should be deprecated outside of JS.

This works on top of `sequtils.count` since that one takes 2 arguments, same with `countIt`. It's also in line with how other languages implement and apply meaning to `count`. An alternative implementation could be:

After defining this original count, we can extend it to better support types that have `len` defined, like so:

This implementation would break the use of `count` for iterators, but the point of `count` is to write better algorithms for data structures; you can just use `toSeq` at that point. `countIt` works otherwise.

Flaws & backwards compatibility:
- The name `count` should be backwards compatible; other options include `getLen` and `countLen`.
- Deprecating cstring.len is the crux of this issue.
  - I have an idea as to how you can deprecate it for JS too. JS cstring actually counts unicode characters as 1 character (https://github.com/nim-lang/Nim/issues/10911), so we might use a new overload instead of `len` here, but I don't know what the name would be.
  - If not, and JS cstring len should be kept, then that's also doable, but it would be hard to document. Declaration branching between different backends is never fun, and when it's for a deprecated symbol that's even worse.
  - JS cstring is a topic for another day. Not deprecating `len` for cstrings is fine; `c_strlen` should stay in system anyway.
- The main backwards compatibility issue is porting standard library procs to `count`. This would only happen when we start using concepts though, and that's going to be backwards incompatible in and of itself.
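The code blocks from the original post did not survive in this copy, so the following is only a guess at the shape of the proposed unary `count`, not the author's implementation. It assumes a plain generic proc that always iterates, and shows that it can coexist with the two-argument `sequtils.count` because the arities differ.

```nim
import std/sequtils

# Hypothetical unary `count`: always iterates, so the O(n) cost is explicit
# in the name. This is an illustrative sketch, not the RFC's actual code.
proc count[T](x: T): int =
  for _ in items(x):
    inc result

let s = cstring"hello"
let n = count(s)                  # resolves to the unary proc defined above
let ones = count(@[1, 2, 1], 1)   # resolves to the two-argument sequtils.count

echo n
echo ones
```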