Open zah opened 6 years ago
Is it not possible for you to instead use the UniqueId directly as the key for a Table, or to store it in a set?
In my particular case, the UniqueId value is rather large and I don't want to have multiple copies of it.
I think C++ solved this by using perfect forwarding, which allows duck typing to operate at the level of the minimum necessary set of procedures (hash and ==). If you have ==(alternative, key), for example, perfect forwarding is enough. So that's: proc contains[T, U](s: HashSet[T], key: U): bool
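A minimal sketch of what such a signature could look like in practice. The standard HashSet does not offer this lookup, so the example uses a hypothetical toy container (ToySet); the layouts of UniqueId and ComplexObject are assumptions as well. The only requirements on the key type U are a hash that agrees with the element's hash and a mixed-type ==.

```nim
import std/hashes

type
  UniqueId = array[4, int]          # assumption: stand-in for a large id
  ComplexObject = object
    id: UniqueId
    payload: string
  ToySet[T] = object                # hypothetical toy container, not std/sets
    buckets: array[16, seq[T]]

# Elements hash by their id, so hash(obj) == hash(obj.id).
proc hash(o: ComplexObject): Hash = hash(o.id)

# The mixed-type operator this alternative depends on.
proc `==`(o: ComplexObject, id: UniqueId): bool = o.id == id

proc incl[T](s: var ToySet[T], item: T) =
  # No duplicate check: this is only an illustration.
  s.buckets[hash(item) and 15].add item

proc contains[T, U](s: ToySet[T], key: U): bool =
  # `key` may be any type that provides hash(key) and T == U.
  result = false
  for item in s.buckets[hash(key) and 15]:
    if item == key: return true

var s: ToySet[ComplexObject]
s.incl ComplexObject(id: [1, 2, 3, 4], payload: "a")
doAssert s.contains([1, 2, 3, 4])   # queried by id, no ComplexObject built
```

The cost of this approach is visible in the sketch: every alternative key type needs its own == overload against the element type.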
I have more supporting cases for this proposal:
In our codebase, it's quite common to create a table using a seq or a string as key and then to create a large number of intermediate procs that work with openarrays until they get to the point of querying the table.
At the moment, you have to pay the high price of converting the openarray back to a sequence just to query the table, but with the suggested functionality here, it would be possible to say that the keyOf for sequences and strings is defined as the openarray obtained from them.
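To make the cost concrete, here is a small sketch of the situation described above, using today's std/tables. The names lookup and packet are made up for the example. The table is keyed by seq[byte], so a proc that only holds an openArray view has to materialize a new seq for every query.

```nim
import std/[hashes, tables]

proc lookup(t: Table[seq[byte], int], key: openArray[byte]): int =
  # `@key` allocates a new seq and copies the bytes, only to run the query.
  t.getOrDefault(@key, -1)

var t = initTable[seq[byte], int]()
t[@[1'u8, 2, 3]] = 42

# Hypothetical network packet; the key is a slice in the middle of it.
let packet = @[0'u8, 1, 2, 3, 9]
doAssert lookup(t, packet.toOpenArray(1, 3)) == 42
```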
Seems a rather contrived use case
We are heavy users of openarray and for good reasons. Keys are sometimes embedded in network packets (which we can slice), in serialized on-disk formats (again, supporting slicing) and sometimes by just treating the bytes of a cryptographic hash result as a key. The example is not that contrived at all.
The "perfect forwarding" alternative suggested above could also work, but it has some downsides, such as the fact that you have to define operators such as ComplexObject == UniqueId, which may not always be appropriate.
Please explain why you can't use Table[UniqueId, X] instead.
I've already explained that.
My original problem was that UniqueId was a rather large value and I didn't want to have multiple copies of it lying around in memory (one stored inside the table key slot and one stored inside the object).
But even if I accept the copies, the second problem I brought up today is still there: openarray, seq and string keys are not compatible, and you must allocate a copy of the key just so you can make a query.
My proposal solves both problems in a simple way.
No, you didn't explain it well enough. If you have a mapping from UniqueId to ComplexObject, there is no need to also have UniqueId be part of ComplexObject, and that's why this problem comes up rarely. And just fyi, I actually like your proposal quite a bit, but it needs to be justified well.
The pattern of storing an object identifier as a field inside a larger object is very frequent. In many problem domains, entities carrying IDs are passed around to functions that need to access the ID (in order to send it over the network, store it in a file format, etc.). There are separate, less frequently used lookup tables that are used to locate objects.
Think about it, isn't the compiler itself full of such entities? We are just lucky to use cheap numeric IDs that reduce the cost of the problem. In my particular original case, the size of the ID was exceeding the size of the additional payload and there was a very large number of entities turning this into a real practical problem.
Alright, assuming a keyOf template, why can't keyOf be used to eliminate these:
proc hash*(o: ComplexObject): int = hash(o.id)
proc `==`(lhs, rhs: ComplexObject): bool = lhs.id == rhs.id
After all, these are directly derived from keyOf.
Well, that's exactly the point. I've mentioned the definitions above only in the description of the current situation and its problems. keyOf will eliminate them.
Aha, ok, please make it happen.
OK. How should we mark accepted RFCs, btw? Can we have a tag for them like ready-for-implementation?
I think there are a lot of people in the community who would enjoy making a contribution to Nim, but may be anxious to submit a pull request that might be rejected. And there are quite a lot of RFCs that are good, but appear stalled, so having this tag and using it may drive more contributions.
I propose to add the labels Rejected and Accepted.
I still don't understand this proposal :/
But in any case, I'll create an RFC: Accepted label. Rejections can be signalled by closing the issue.
@dom96
> Rejections can be signalled by closing the issue.
I think RFC: Rejected is useful to filter existing RFCs (will need to review Issues and assign this label).
You can easily filter on closed issues, @data-man
A common scenario this proposal doesn't take into account is that one might want to have different lookup strategies per table instance - in C++ this is dealt with by passing the comparison / hash operators to the table at construction time - this is very common when working with complex data types that might be indexed in multiple ways - in one module I may want to index by guid and in another by something else. This is generally not a decision tied to the type, but rather to a specific table that indexes a specific type.
boost::multi_index expands on this idea further.
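A rough sketch of the per-instance idea in Nim, under several assumptions: Index, initIndex, put and has are hypothetical names, Entity is an invented type, and a key-extractor proc stands in for the hash and comparison functors that C++ containers accept at construction time. The same element type is indexed two ways by two separate index instances.

```nim
import std/tables

type
  Entity = object                   # invented example type
    guid: string
    name: string
  # Hypothetical index: the key strategy belongs to the instance,
  # not to the element type.
  Index[K, T] = object
    keyFn: proc (x: T): K
    slots: Table[K, T]

proc initIndex[K, T](keyFn: proc (x: T): K): Index[K, T] =
  result.keyFn = keyFn
  result.slots = initTable[K, T]()

proc put[K, T](ix: var Index[K, T], item: T) =
  let f = ix.keyFn
  # Note: this toy stores a copy of the key next to the value.
  ix.slots[f(item)] = item

proc has[K, T](ix: Index[K, T], key: K): bool =
  ix.slots.hasKey(key)

proc byGuid(e: Entity): string = e.guid
proc byName(e: Entity): string = e.name

var guids = initIndex[string, Entity](byGuid)
var names = initIndex[string, Entity](byName)
let e = Entity(guid: "g-1", name: "alice")
guids.put e
names.put e
doAssert guids.has("g-1") and names.has("alice")
```

Unlike keyOf, which is resolved once per type, the extractor here is chosen when each index is created; the price in this toy version is that the key is duplicated in the underlying table.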
Consider the existence of types like the following:
Instances of such types are often added to sets (or they can be used as keys in tables). To achieve this, you just need to implement hashing and equality comparison in the following way:
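A minimal sketch of that setup with std/sets. The field layout is an assumption: the thread only names the types, so UniqueId is taken to be a 32-byte array and ComplexObject gets one invented payload field. The hash and == procs follow the definitions quoted elsewhere in the thread.

```nim
import std/[hashes, sets]

type
  UniqueId = array[32, byte]        # assumption: a large, hash-sized id
  ComplexObject = object
    id: UniqueId
    payload: string                 # assumption: invented extra field

proc hash(o: ComplexObject): Hash = hash(o.id)
proc `==`(lhs, rhs: ComplexObject): bool = lhs.id == rhs.id

var objects = initHashSet[ComplexObject]()
var id: UniqueId
id[0] = 7
objects.incl ComplexObject(id: id, payload: "first")

# There is no way to query by id alone: a throwaway ComplexObject
# has to be constructed around the id just to call `contains`.
doAssert ComplexObject(id: id) in objects
```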
But this leaves one problem. When you search the set or the table for an existing key, you must allocate and construct a full instance of ComplexObject, which may be expensive or non-trivial. It would be nice if one were able to search the set directly for an object with a specific id without allocating any memory.

A possible solution:

1. Add a new mixed-in proc called keyOf that would be implemented as identity by default, but would allow the user to provide the following override:

The implementation in tables.nim and sets.nim will then use this call prior to hashing and checking the equality of the elements in the hash table.

2. Change the signature of procs such as get, contains, etc., to the following:

.. or just provide alternative names for the lookups using the alternative key type.
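One way the two steps could fit together, sketched with a hypothetical toy container (KeyedSet) rather than the real tables.nim or sets.nim. The type layouts and the lookup name containsKey are assumptions, and keyOf is written as a pair of procs for simplicity, whereas the thread speaks of a template; a real implementation would likely avoid the copy that a proc return implies.

```nim
import std/hashes

type
  UniqueId = array[32, byte]        # assumption: a large id
  ComplexObject = object
    id: UniqueId
    payload: string
  KeyedSet[T] = object              # hypothetical toy, not std/sets
    buckets: array[16, seq[T]]

# Default: every value is its own key.
proc keyOf[T](x: T): T = x
# User override: a ComplexObject is identified by its id.
proc keyOf(o: ComplexObject): UniqueId = o.id

proc incl[T](s: var KeyedSet[T], item: T) =
  # Hashing goes through keyOf, so ComplexObject needs no hash of its own.
  s.buckets[hash(keyOf(item)) and 15].add item

# Hypothetical alternative-name lookup that takes the key type directly.
proc containsKey[T, K](s: KeyedSet[T], key: K): bool =
  result = false
  for item in s.buckets[hash(key) and 15]:
    # Equality is checked on keys, so no ComplexObject == is needed either.
    if keyOf(item) == key: return true

var id: UniqueId
id[0] = 7
var s: KeyedSet[ComplexObject]
s.incl ComplexObject(id: id, payload: "first")
doAssert s.containsKey(id)          # no ComplexObject constructed for the query
```

With the identity default, a KeyedSet[int] behaves like an ordinary set, which is what lets the same container code serve both cases.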