rubyworks / pqueue

Priority Queue in pure Ruby
Other
69 stars 7 forks source link

New Maintainer Needed #1

Open trans opened 11 years ago

trans commented 11 years ago

I'm have my doubts anyone is using the gem. So unless someone would like to take it over, I am going to deprecate it at the start of the new year.

mdunsmuir commented 10 years ago

I happened across this gem while messing around with some pathfinding algorithms, and it fit the bill... but then I got curious about the implementation and was able to improve on its speed significantly (specifically, by over an order of magnitude when pushing and then popping 100k random integers) by using a true binary heap rather than the binary search method the gem appears to be based on (which is what I was doing before I found the gem in the first place).

My implementation (still pretty rough, but fully functional) is here: https://gist.github.com/mdunsmuir/10608477

I have no experience maintaining open source software, but I have years of experience writing ruby professionally and I intend to clean up and publish my implementation (despite the apparent lack of interest)... if you're looking for a maintainer, maybe you'd consider handing it off to me? Let me know if you're interested.

trans commented 10 years ago

Would be happy to do so! I assume the new implementation's API will be compatible with the previous?

I'll do some admin clean up of the project, and write a little overview of typical maintenance tasks to help you out. Really isn't much too it.

mdunsmuir commented 10 years ago

My interface isn't fully compatible yet, but there aren't really any big issues to getting there... I just have to add a bit more code and think about some optimizations, and I won't release anything until I get to full (tested) compatibility.

I did do a bit more side-by-side testing last night, and I determined that I'm probably not quite going to be able to match PQueue's speed for adding large numbers of values at once (either at initialization or afterwards) since in those cases it's able to take advantage of a small number of calls to Array#sort, which is fast. However, my implementation is much faster for large volumes of individual mixed enqueue/dequeue operations, which happened to suit my particular application.

I feel like this might be a good trade-off to make, since concatenating and sorting Arrays is easy enough to do without a dedicated priority queue class if your application lets you get away with adding large chunks of data at once, and the functionality provided by a true heap is a little more difficult to come by in ruby. But, I'm interested to know what you think about it... I obviously don't want to replace a perfectly good implementation with my own, relatively untested one unless there is a good justification for doing so.

trans commented 10 years ago

How much slower is it at adding large number of values? If it's a small multiple, I don't think it's a big deal. If it's an order of magnitude, then maybe some consideration needs to be given to this --possibly the ability to choose the internal data structure.

mdunsmuir commented 10 years ago

Here are the times for adding 100k random values at once (at initialization) and then popping them all at once:

PriorityQueue time: 2.79624 seconds, PQueue time: 0.740536 seconds

Here's the same result for 1M values:

PriorityQueue time: 97.728366 seconds, PQueue time: 10.081724 seconds

And here are the times, using 100k values again, for iterating over insert 10, delete (minimum) 5 until no values remain, then popping until the queue is empty:

PriorityQueue time: 3.080428 seconds, PQueue time: 30.878116 seconds

For 1M values, PriorityQueue did the same thing in 74.780751 seconds, but I cut PQueue off after around 15-20 minutes.

I think this is because in practice the heap insert rarely ends up needing to traverse the whole height of the tree, while the binary search insert always does, and additionally it has to insert values into the middle of the Array while the heap implementation is Array-based but only has to either swap indices or push/pop the last value (I have no idea how ruby implements these operations, though). However, the sorted array implementation (PQueue) handily wins for the delete (minimum) operation, since all it has to do is pop off the front of the list, while the heap implementation has to also maintain the heap property.

I'm still personally torn on whether it's worth including both data structures... honestly if my application had called for easily separable, large blocks of inserting and deleting, I would have just used sorted Arrays and not thought to look for a gem in the first place. However, I'm not sure what people have been using this gem for, so I'll abide by whatever you think is best.

trans commented 10 years ago

On Tue, Apr 15, 2014 at 5:06 PM, Michael Dunsmuir notifications@github.comwrote:

Here are the times for adding 100k random values at once (at initialization) and then popping them all at once:

PriorityQueue time: 2.79624 seconds, PQueue time: 0.740536 seconds

Here's the same result for 1M values:

PriorityQueue time: 97.728366 seconds, PQueue time: 10.081724 seconds

And here are the times, using 100k values again, for iterating over insert 10, delete (minimum) 5 until no values remain, then popping until the queue is empty:

PriorityQueue time: 3.080428 seconds, PQueue time: 30.878116 seconds

For 1M values, PriorityQueue did the same thing in 74.780751 seconds, but I cut PQueue off after around 15-20 minutes.

I think that although the (theoretical) worst-case times for insert are the same for both data structures, O(log(n)), in practice the heap insert rarely ends up needing to traverse the whole height of the tree, while the binary search insert always does. However, the sorted array implementation (PQueue) handily wins for the delete (minimum) operation, since all it has to do is pop off the front of the list, while the heap implementation has to also maintain the heap property.

I'm still personally torn on whether it's worth including both data structures... honestly if my application had called for easily separable, large blocks of inserting and deleting, I would have just used sorted Arrays and not thought to look for a gem in the first place. However, I'm not sure what people have been using this gem for, so I'll abide by whatever you think is best.

I am inclined to agree with you. So let's do it your way. If any complaints arise we'll consider what to do about it then.

One thought, when initializing, might there be a way to optimize insertion rather than just looping over push? I'm not sure how, I just have a gut feeling that it should be possible since we have all the entries at once and there is nothing it the heap yet.

mdunsmuir commented 10 years ago

I did mess around with some other options, and eventually realized (duh) that you can just initialize it with a sorted array and that satisfies the heap property. There is a potentially faster heap-based algorithm, but in practice the sort is faster since it is presumably running native code rather than something implemented in ruby.

I think the difference mainly comes down to the fact that popping off the heap is an O(log(n)) operation in the worst case (to down heap the value you replace the root with), while for the sorted array implementation you just have to pop off the front of the array. Although... if it remembered if the array was sorted or not, and just did simple pops until an element was added, at which point it would begin to worry about the heap property... maybe we could have the best of both worlds. I will investigate that possibility.

One other addition I need to make is a concat/merge operation that concats new values and then sorts the heap array. That could reset the "cheat" described above, which would decrease the performance gap even more.

mdunsmuir commented 10 years ago

Of course, at that point, it comes down to the user to understand where it becomes more efficient to concat than push individually... we could also cache pushes until the next pop comes in, and then decide how to proceed (concat vs. individual pushes) based on benchmark data. :)

mdunsmuir commented 10 years ago

Here's some results with that optimization; first, initialized with 100k values then everything popped:

PriorityQueue time: 0.095463 seconds, PQueue time: 0.72315 seconds

And now, with 100k values, the same iteration over pushing 10 values, popping 5, then popping the rest:

PriorityQueue time: 2.8102 seconds, PQueue time: 29.511129 seconds

I'm not sure why PQueue comes up so much slower in the first test...

trans commented 10 years ago

Awesome.

So I created a team for the rubyworks/pqueue project and added you as a maintainer. So you can maintain the project via the rubyworks repository, if you'd like to do it that way --in which case I can still lend a hand if/as needed. But if you you prefer to move the project completely over to a repository under your account/organization, that's fine too. Just let me know.

mdunsmuir commented 10 years ago

Hey, just wanted to let you know that I haven't forgotten about this! Things just got very busy in my life all of a sudden; I should have some bandwidth to think about priority queues soon, maybe even this weekend.

ldonnet commented 10 years ago

Hi,

I use your great gem for shortest_path and I realize some tests with a lot of data. The code is a bit slow with a big graph compared to PriorityQueue for example. I'm very interested to improve performance of this gem and help to test new versions.

Thanks for your work Luc

mdunsmuir commented 10 years ago

I recently happened across this gem much in the same way you seem to have (by working on pathfinding algorithms) and I wrote an implementation of PQueue that is still pure ruby but uses a simple binary heap structure in place of the current binary-search-and-insert technique and in doing so achieves a significant speedup over the current implementation. I've nominally taken over maintaining this gem and it is my intention to release a new version using the improved heap-based implementation, but I haven't had time to get around to it yet.

Any day now, hopefully I'll be able to start putting something together, and I'll absolutely let you know when I do. It's great to hear from someone who is using the gem, and I'd much appreciate any testing you could do on the forthcoming release.

Stephenitis commented 10 years ago

Is this still maintained? I will help since I'm referencing this implementation for students at Devbootcamp.

trans commented 10 years ago

Yes, but passively. Any help you want to give would be great. You can even take the reigns completely if you wish. @mdunsmuir talked about a rewrite to make it much faster, albeit slower for individual inserts, but I guess he has been too busy to do so.

ldonnet commented 10 years ago

@Stephenitis I try different libraries for priority queue and this library is great because written in pure ruby. But slower when you compare to priority_queue_cxx (or before I find this library https://github.com/ldonnet/priority_queue). You can find an implementation here : https://github.com/dryade/shortest_path/blob/master/shortest_path.gemspec