std.PriorityQueue.update assumes T supports the equality operator

paoda commented 3 years ago

Zig Version

0.9.0-dev.1275+ac52e0056

Steps to Reproduce

Given the main.zig described below:

const std = @import("std");

const Event = struct {
    kind: EventKind,
    timestamp: u64,
};

const EventKind = enum {
    u64Overflow,
};

fn lessThan(a: Event, b: Event) std.math.Order {
    return std.math.order(a.timestamp, b.timestamp);
}

pub fn main() !void {
    const alloc = std.heap.page_allocator;

    var pq = std.PriorityQueue(Event).init(alloc, lessThan);
    defer pq.deinit();

    // some contrived reason to use PriorityQueue.update
    const wrong = Event{
        .kind = EventKind.u64Overflow,
        .timestamp = std.math.maxInt(u32),
    };

    const right = Event{
        .kind = EventKind.u64Overflow,
        .timestamp = std.math.maxInt(u64),
    };

    try pq.add(wrong);
    try pq.update(wrong, right);
}

run zig build in order to reproduce the bug.

Expected Behavior

I expected the zig program to compile.

Actual Behavior

I received the following from stderr:

/piston/packages/zig/0.8.0/bin/lib/std/mem.zig:1009:22: error: operator not allowed for type 'Event'
        if (slice[i] == value) return i;
                     ^
/piston/packages/zig/0.8.0/bin/lib/std/mem.zig:993:28: note: called from here
    return indexOfScalarPos(T, slice, 0, value);
                           ^
/piston/packages/zig/0.8.0/bin/lib/std/priority_queue.zig:209:60: note: called from here
            var update_index: usize = std.mem.indexOfScalar(T, self.items[0..self.len], elem) orelse return error.ElementNotFound;
                                                           ^
/piston/packages/zig/0.8.0/bin/lib/std/priority_queue.zig:208:64: note: called from here
        pub fn update(self: *Self, elem: T, new_elem: T) !void {
                                                               ^
/piston/packages/zig/0.8.0/run: line 4: ./out: No such file or directory

Which suggests to me that either this is an oversight or it should be made clear that there is an equality operator constraint on std.PriorityQueue.update or std.PriorityQueue as a whole.

paoda commented 3 years ago

Changing the implementation of indexOfScalarPos from:

pub fn indexOfScalarPos(comptime T: type, slice: []const T, start_index: usize, value: T) ?usize {
    var i: usize = start_index;
    while (i < slice.len) : (i += 1) {
        if (slice[i] == value) return i;
    }
    return null;
}

to:

pub fn indexOfScalarPos(comptime T: type, slice: []const T, start_index: usize, value: T) ?usize {
    var i: usize = start_index;
    while (i < slice.len) : (i += 1) {
        if (meta.eql(slice[i], value)) return i;
    }
    return null;
}

resolves this issue and the program in my original comment will then proceed to compile. However, I can't comment on the performance implications of such a change, or whether this would even be the right way of solving this.

Chris3606 commented 2 years ago

I can't comment on the performance of that fix either; but whether the solution is that, or passing in a custom equality function (perhaps via a context like HashMap uses), I believe there are clear use cases for which == is insufficient.

Even basic algorithms like Dijkstra will require the association of a value to its priority; and as far as I'm aware, this requires the use of a struct as the priority queue type, something to the effect of:

pub const DijkstraNode {
    node_val: T,
    priority: u32
};

Namely because, there isn't a good way (that I'm aware of) to implement the required comparison function for a priority queue unless the value and the priority are contained within the same struct. There are obviously some current limitations to closures/lambda capturing in Zig, as discussed particularly near the bottom of #1048, so if you had a priority queue of T values (even assuming T supports == ), I don't see how you could implement that comparison function without a struct like this, even if you maintained the mapping of values to priority in a separate structure.

Chris3606 commented 2 years ago

In Dijkstra's specifically, it is of note that it wouldn't require this association; you could just insert duplicate elements; but I would think that there are still cases where this behavior could be problematic.

Deecellar commented 2 years ago

I think is a missuse of std.mem.indexOfScalar the name suggest it shoud be used only on scalar values that are numerical (or so I understand), so really priorityQueue should use std.mem.indexOfPos since that one uses internally std.mem.eql and would also fix it, think is, that std.mem.indexOfPos need more than one value, so maybe a better proposal would be adding a function that takes 1 element and makes it an slice and pass it to indexOfPos?

thomastay commented 2 years ago

Ran into this today while doing Djikstra too. I propose that the update function shouldn't do any searching in the queue for you. Instead, it should take in an index of the item to update, and then just update it with the new value. Users can write their own version of std::find customized to whatever they want.

Here's a code sample of what I'm thinking of. This is code from AoC day 15 I was working on, when I wanted to implement addOrUpdate to the priority queue. Bear in mind that this is fantasy code so it might not actually be correct 😅

fn addOrUpdate(queue: *DjikstraPQ, item: QueueDatum) !void {
    var foundIdx: usize = 0;
    // std::find
    var it = queue.iterator();
    while (it.next()) |elt|: (foundIdx += 1) {
        // Users should change this to implement their own find function
        if (elt.pos == item.pos) break;
    }
    if (found != queue.count()) {
        queue.update(foundIdx, item) catch unreachable; // in this case, update takes an index instead
    } else {
        try queue.add(item);
    }
}

As @Chris3606 mentions, adding duplicate elements into the queue is definitely a workaround, but is not a good long term solution.

I took a look at what a Rust crate does, looks like they take the element in but they also have a requirement that the elements inserted must be hashable, and there is an internal hash table that keeps track of which element is in which index. I don't think that is a good solution in general, but is interesting to note.

schmee commented 2 years ago

Closed in https://github.com/ziglang/zig/pull/10342?

Chris3606 commented 2 years ago

Closed in #10342?

I don't believe so. The update function in that implementation is still using indexOfScalar: see here, and that function uses == which is what was causing the original issue.

Chris3606 commented 2 years ago

Additionally, I wanted to add a bit to the discussion on the API from above: I agree with @thomastay insofar as an update function that does a linear search doesn't necessarily support all use cases very well. In fact, in an optimized Dijkstra or particularly an A* implementation, for example, although I haven't formally tested it, I believe that even if the update function did work when T is a struct, I'd still prefer creating duplicate elements over the current implementation of update for the sake of performance.

Particularly since in theory (if the API allowed a user to cache nodes themselves and the data types were cached in a way accessible in constant time) it should be possible to have a constant-time lookup of the node then simply the O(log(n)) update operation, I think there would be significant performance benefits in some cases to being able to use an O(log(n)) update function over adding duplicate nodes, since it would be the same operation but without the additional memory usage and items to process in the queue.

voroskoi commented 2 years ago

I have tried my Aoc 2021 solutions with #12154: day15a: https://nest.pijul.com/voroskoi/aoc2021-zig:main/37QWQTZRMBJPE.EMAAA day15b: https://nest.pijul.com/voroskoi/aoc2021-zig:main/QBPUYYX77BA2W.VIAAA

In ReleaseFast first part 1100 us -> 2200 us, the second part 33.000us -> 105.000 us.

PriorityQueue use a binary heap under the hood, so we can not use std.sort.binarySearch() for finding elements. I have no better idea, than the rust solution mentioned before: using a HashMap for storing the array positions to make lookup faster. On the other hand the HashMap just creates overhead when You do not use update()...

ziglang / zig