[Feature Request] Add a deque struct to the stdlib

gabrieldemarmiesse commented 6 months ago

Review Mojo's priorities

[X] I have read the roadmap and priorities and I believe this request falls within the priorities.

What is your request?

Add a Deque struct to the stdlib, similar to python's deque: https://docs.python.org/3/library/collections.html#collections.deque

What is your motivation for this change?

While it's a very useful collection to have in general, in the context of the standard library it offers some new possibilities. There are multiple places in the stdlib where we want to "consume" objects in a list in order. Doing so currently means either 1) calling List.pop(0), while safe, this is very inneficient as all elements must be moved at every pop() 2) Using a pointer and move_pointee. While this is efficient, this is quite unsafe.

Moving a list pointer into a deque would allow us to call deque.popleft() at very little cost. Thus consuming the elements in order.

Any other details?

The two places where we currently have this issue in the stdlib are List.extends() and Dict.fromkeys()

bgreni commented 6 months ago

@gabrieldemarmiesse I was thinking about this a couple days ago so I'd like to take a crack at it if you haven't already started

gabrieldemarmiesse commented 6 months ago

Let's wait for the go of the maintainers before working on a new struct.

gryznar commented 6 months ago

To be aligned with current naming convention, it should be Deque, not deque ;)

dimitrilw commented 6 months ago

If you want a space to make a Deque struct before core Mojo maintainers give the green-light for std-lib, you are more than welcome to submit PRs to toybox. Honestly, I'm disappointed with myself for overlooking putting deque, ring-queue, and the like to my "hit list" of desired data structures.

JoeLoser commented 5 months ago

I'm +1 for a deque collection. Do you want to write up a one pager on an API design proposal for consideration and we can take a look to discuss before starting down an implementation? You may want to check out https://doc.rust-lang.org/std/collections/struct.VecDeque.html as something to consider too.

avitkauskas commented 1 month ago

I implemented mojo Deque for myself, but tried to make it properly documented and tested so that it could be a candidate for a mojo stdlib if approved. You are welcome to check it here if mojo development priorities allow that: https://github.com/avitkauskas/mojo-datastructs/blob/main/datastructs/deque.mojo

It's implemented on a growing and shrinking circular buffer via UnsafePointer. I implemented all the python deque API and a bit more. On the trivial benchmark of 4 billion appends/pops, it's about 1.5x faster than C++ stdlib implementation on the default construction and can be easily configured to minimize memory allocations if speed is more important than memory, providing 2.5x speed-up compared to C++.

-63.7% 2.76x faster 2507.1ms ± σ:   1.4ms, 2505.1ms…2510.4ms `./deque-mojo-no-shrink`
-33.3% 1.50x faster 4609.3ms ± σ:   6.0ms, 4603.2ms…4625.9ms `./deque-mojo`
              base  6915.2ms ± σ:   3.7ms, 6910.1ms…6922.2ms `./deque-cpp`

Copy/move/iter on non-trivial elements have yet to be properly tested and, most probably, refined. Please let me know if you'd want me to make a proper PR to mojo repo where we could work on the further improvements.

avitkauskas commented 1 month ago

I think I've put enough now into the code, docstrings and tests to make it worth a few minutes of time of @JoeLoser. I would be glad to hear any comments, and if there is any interest of me making a PR to include this Deque implementation into the Mojo standard library. I know it's big. But I hope we'd be able to move piece by piece with comments and improvements. I'd be glad to hear anything back on this. Thanks! Repo: https://github.com/avitkauskas/mojo-datastructs/tree/main Code: https://github.com/avitkauskas/mojo-datastructs/blob/main/datastructs/deque.mojo

JoeLoser commented 1 month ago

I think I've put enough now into the code, docstrings and tests to make it worth a few minutes of time of @JoeLoser. I would be glad to hear any comments, and if there is any interest of me making a PR to include this Deque implementation into the Mojo standard library. I know it's big. But I hope we'd be able to move piece by piece with comments and improvements. I'd be glad to hear anything back on this. Thanks! Repo: https://github.com/avitkauskas/mojo-datastructs/tree/main Code: https://github.com/avitkauskas/mojo-datastructs/blob/main/datastructs/deque.mojo

Nice! I'll take a look at this in the next few days and provide feedback here and we can talk about the best tactical way to split up things incrementally to adopt it into the stdlib 👍

JoeLoser commented 3 weeks ago

I think I've put enough now into the code, docstrings and tests to make it worth a few minutes of time of @JoeLoser. I would be glad to hear any comments, and if there is any interest of me making a PR to include this Deque implementation into the Mojo standard library. I know it's big. But I hope we'd be able to move piece by piece with comments and improvements. I'd be glad to hear anything back on this. Thanks! Repo: https://github.com/avitkauskas/mojo-datastructs/tree/main Code: https://github.com/avitkauskas/mojo-datastructs/blob/main/datastructs/deque.mojo

This is a great implementation! I'd happily welcome this into the standard library. I'd suggest thinking about how to split up the changes into a few smaller PRs (e.g. constructors + few APIs along with tests, and keep rinse/repeating) to make it a bit easier for review. It's helpful seeing what the big end state would look like roughly without the splitting up of work. To help make it easier for me and the @modularml/mojo-standard-library team to review, smaller PRs would be helpful though.

Some high-level thoughts in no specific order:

Have you considered using UInt instead of Int for some of these data members which are known to be non-negative (e.g. size, capacity, etc.)? UInt is relatively new, so a lot of the existing data structures haven't used it yet internally, but may be nice to explore.
add dunder should specify the capacity of the one deque it's getting its elements from to avoid reallocations and not rely on the default capacity of 64
mul and imul dunder could use UInt
eq could use getitem to simplify some of the code, or did you avoid that to avoid the overhead with index normalization done in the getitem implementation since it's not needed in eq and friends?

avitkauskas commented 3 weeks ago

I'm glad you liked it, Joe! I will clone the nightly mojo later this week, and will make the PR with the first part of the implementation and related tests. I hope chunks of code of about 200-300 LOC + tests would be acceptable for reviewers.

Answering the questions:

Have you considered using UInt instead of Int for some of these data members which are known to be non-negative (e.g. size, capacity, etc.)? UInt is relatively new, so a lot of the existing data structures haven't used it yet internally, but may be nice to explore.

I'm afraid that using UInt for capacity etc. could be asking for trouble. Checking in the mojo REPL, Int is implicitly casted to UInt:

var a: UInt
var b = -1
a = b
print(a)  #  18446744073709551615

Imagine I'd accidentaly do q = Deque[Int](capacity=b)... So, in my opinion, taking Int and explicitly deciding how to handle negative numbers gives less surprises.

mul and imul dunder could use UInt

Same with mul: what would happen when you do q * b when b is negative Int? Pyhon returns an empty deque when multiplied by the negative number. I followed their API.

add dunder should specify the capacity of the one deque it's getting its elements from to avoid reallocations and not rely on the default capacity of 64

The thing is that Deque is a totally different animal compared, say, to List: it can have capacity set to some high number by the user initially and have just a few elements yet, and (in my implementation) also be restricted from shrinking, it could also have maxlen property set. And in python, the result of add inherits the properties of the first addend (that is self). Therefore, it would not be right to reset capacity to some number like len(q1) + len(q2) as sometimes it could be smaller that the initial capacity of q1. And it's not good to set it to q1.capacity + q2.capacity because q1.capacity could be big enough already. Overall, it could be made a bit more efficient with allocations if we precalculate the capacity of the result to be not less than q1.capacity and big enough to accomodate both deques and allocate it before the addition. Could be discussed if it's worth the trouble. And also, now a = a + b results in the deque with the same characteristics as a += b that makes sense to me.

eq could use getitem to simplify some of the code, or did you avoid that to avoid the overhead with index normalization done in the getitem implementation since it's not needed in eq and friends?

Yes, exactly, I do not use getitem in the implementation of other methods for efficiency.

JoeLoser commented 3 weeks ago

@avitkauskas thanks for the discussion - all of your responses make sense and SGTM. Those PR-sizes of a few hundred LoC is totally reasonable 👍

modularml / mojo