modularml / mojo

The Mojo Programming Language
https://docs.modular.com/mojo
Other
22.09k stars 2.54k forks source link

[Feature Request] Use Tuple[K, V] instead of DictEntry[K, V] #2554

Open gabrieldemarmiesse opened 2 months ago

gabrieldemarmiesse commented 2 months ago

Review Mojo's priorities

What is your request?

As title

What is your motivation for this change?

DictEntry was there because Tuple only worked with AnyRegType, now that Tuple works with AnyType, there is no need for DictEntry anymore. We can just return Tuple when working with Dict.items()

Reference:

Any other details?

No response

VMois commented 1 month ago

I would like to work on this feature.

jayzhan211 commented 1 month ago

@gabrieldemarmiesse If we replace with Tuple[K, V], how do we deal with the key hash?

@value
struct DictEntry[K: KeyElement, V: CollectionElement](CollectionElement):
    """Store a key-value pair entry inside a dictionary.

    Parameters:
        K: The key type of the dict. Must be Hashable+EqualityComparable.
        V: The value type of the dict.
    """

    var hash: Int
    """`key.__hash__()`, stored so hashing isn't re-computed during dict lookup."""
    var key: K
    """The unique key for the entry."""
    var value: V
    """The value associated with the key."""

    fn __init__(inout self, owned key: K, owned value: V):
        """Create an entry from a key and value, computing the hash.

        Args:
            key: The key of the entry.
            value: The value of the entry.
        """
        self.hash = hash(key)
        self.key = key^
        self.value = value^
VMois commented 1 month ago

@gabrieldemarmiesse If we replace with Tuple[K, V], how do we deal with the key hash?

@value
struct DictEntry[K: KeyElement, V: CollectionElement](CollectionElement):
    """Store a key-value pair entry inside a dictionary.

    Parameters:
        K: The key type of the dict. Must be Hashable+EqualityComparable.
        V: The value type of the dict.
    """

    var hash: Int
    """`key.__hash__()`, stored so hashing isn't re-computed during dict lookup."""
    var key: K
    """The unique key for the entry."""
    var value: V
    """The value associated with the key."""

    fn __init__(inout self, owned key: K, owned value: V):
        """Create an entry from a key and value, computing the hash.

        Args:
            key: The key of the entry.
            value: The value of the entry.
        """
        self.hash = hash(key)
        self.key = key^
        self.value = value^

We can make DictEntry an internal structure and only return Tuple[K, V] to the user.

VMois commented 1 month ago

@jayzhan211 are you working on this issue? If yes, please, continue. If not, I will start working on it.

jayzhan211 commented 1 month ago

@jayzhan211 are you working on this issue? If yes, please, continue. If not, I will start working on it.

not working on it.

VMois commented 1 month ago

I started working on it. A few questions:

  1. Should Tuple return key and value directly (Tuple[K, V]) or references to them (Tuple[Reference[K], Reference[V]])? I am a bit worried that if Tuple[K, V] is used, unnecessary object copies will be made.

  2. Do we have a way to benchmark a Dict? I would like to test Dict performance before and after I made changes in case of any regressions.

Thank you.

gabrieldemarmiesse commented 1 month ago

Thanks for working on this! 1) Prefer using references instead of copies. That will allow the iterator to work with non-copyable types. 2) we don't have benchmarks yet, but if you avoid copying the data around and use references in the tuple, this should be pretty fast. We can always make it faster later on.

The Modular staff is currently working on a benchmarking infrastructure. It's not ready yet. But in general avoiding copies is always a good way to have good performance out of the box.

VMois commented 1 month ago

I have a very unusual issue.

I tried Tuple[K, V], and the tests pass, but when I try Tuple[Reference[K,...], Reference[V, ...]], it throws a weird error during the test run (nothing when building the library).

The error is here:

/Users/vmois/Projects/mojo_stuff/mojo/stdlib/src/collections/dict.mojo:628:29: error: symbol use argument #1 has type 

'!kgen.pointer<struct<(!kgen.pack<[[pointer<T>, {"__moveinit__" : (!kgen.pointer<pointer<T>> init_self, !kgen.pointer<pointer<T>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__moveinit__(stdlib::memory::reference::Reference[$0, $1, $2, $3]=&,stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<T>, {"__del__" : (!kgen.pointer<T> owned_in_mem) -> !kgen.none = get_type_method(T, "__del__")}], :i1 0, :struct<()> *"self`2x", 0>, "__del__" : (!kgen.pointer<pointer<T>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__del__(stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<T>, {"__del__" : (!kgen.pointer<T> owned_in_mem) -> !kgen.none = get_type_method(T, "__del__")}], :i1 0, :struct<()> *"self`2x", 0>}], [pointer<U>, {"__moveinit__" : (!kgen.pointer<pointer<U>> init_self, !kgen.pointer<pointer<U>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__moveinit__(stdlib::memory::reference::Reference[$0, $1, $2, $3]=&,stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<U>, {"__del__" : (!kgen.pointer<U> owned_in_mem) -> !kgen.none = get_type_method(U, "__del__")}], :i1 0, :struct<()> *"self`2x", 0>, "__del__" : (!kgen.pointer<pointer<U>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__del__(stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<U>, {"__del__" : (!kgen.pointer<U> owned_in_mem) -> !kgen.none = get_type_method(U, "__del__")}], :i1 0, :struct<()> *"self`2x", 0>}]]>) memoryOnly>>'

but @stdlib::collections::dict::_DictEntryToTupleIter::__next__(stdlib::collections::dict::_DictEntryToTupleIter[$0, $1, $2, $3]&) expected type 

'!kgen.pointer<struct<(!kgen.pack<[[pointer<T>, {"__moveinit__" : (!kgen.pointer<pointer<T>> init_self, !kgen.pointer<pointer<T>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__moveinit__(stdlib::memory::reference::Reference[$0, $1, $2, $3]=&,stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<T>, {"__del__" : (!kgen.pointer<T> owned_in_mem) -> !kgen.none = get_type_method(T, "__del__")}], :i1 0, :struct<()> {  }, 0>, "__del__" : (!kgen.pointer<pointer<T>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__del__(stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<T>, {"__del__" : (!kgen.pointer<T> owned_in_mem) -> !kgen.none = get_type_method(T, "__del__")}], :i1 0, :struct<()> {  }, 0>}], [pointer<U>, {"__moveinit__" : (!kgen.pointer<pointer<U>> init_self, !kgen.pointer<pointer<U>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__moveinit__(stdlib::memory::reference::Reference[$0, $1, $2, $3]=&,stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<U>, {"__del__" : (!kgen.pointer<U> owned_in_mem) -> !kgen.none = get_type_method(U, "__del__")}], :i1 0, :struct<()> *"self`2x", 0>, "__del__" : (!kgen.pointer<pointer<U>> owned_in_mem) -> !kgen.none = @"stdlib::memory::reference::Reference::__del__(stdlib::memory::reference::Reference[$0, $1, $2, $3])_thunk"<:type [!kgen.paramref<U>, {"__del__" : (!kgen.pointer<U> owned_in_mem) -> !kgen.none = get_type_method(U, "__del__")}], :i1 0, :struct<()> *"self`2x", 0>}]]>) memoryOnly>>'
        for kv in self.items()

And here is the code - https://github.com/VMois/mojo/pull/1/files (previous commits have Tuple[K, V] changes).

The only difference between the current and expected types is *"self'2x" (present in current) for pointer T but not in expected ({ }).

Could someone help? Maybe my approach is wrong. Thank you.

jayzhan211 commented 1 month ago

Should we replace DictEntry with Tuple in _DictEntryIter instead of introducing _DictEntryToTupleIter

Upd: I rethink about the issue here. Given that we cant remove DictEntry because we need keyhash, I think replacing DictEntry with Tuple has no benefit at all. _DictEntryIter already returns Reference[DictEntry], if we want to replace it with Tuple[Reference[K], Reference[V]] we then need to get the referenced key and value from DictEntry and combine them into Tuple, it is quite challenging without benefit.