Works correctly after replacing the priority queue with a vector. std::make_heap and std::push_heap are used to maintain the vector as a heap.
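A minimal C++ sketch of the vector-as-heap pattern. It assumes a hypothetical `Edge` type whose lowest score is popped first; neither `Edge` nor `EdgeHeap` comes from this code base or from waterz. The heap lives in a plain `std::vector` and is maintained with `std::make_heap`, `std::push_heap` and `std::pop_heap`:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical edge type for illustration: a lower score is merged first.
struct Edge {
    float score;
    int u, v;
};

// The std heap functions build a max-heap with respect to the comparator,
// so comparing with '>' puts the lowest score at the front (a min-heap).
struct ByScoreDesc {
    bool operator()(const Edge& a, const Edge& b) const { return a.score > b.score; }
};

// Hypothetical wrapper: a heap stored in a plain std::vector.
class EdgeHeap {
public:
    explicit EdgeHeap(std::vector<Edge> edges) : heap_(std::move(edges)) {
        std::make_heap(heap_.begin(), heap_.end(), ByScoreDesc{});  // O(n) build
    }

    void push(const Edge& e) {
        heap_.push_back(e);
        std::push_heap(heap_.begin(), heap_.end(), ByScoreDesc{});  // O(log n)
    }

    Edge pop() {
        assert(!heap_.empty());
        // Moves the lowest-score edge to the back, then removes it.
        std::pop_heap(heap_.begin(), heap_.end(), ByScoreDesc{});
        Edge top = heap_.back();
        heap_.pop_back();
        return top;
    }

    bool empty() const { return heap_.empty(); }

    // Unlike std::priority_queue, the underlying vector is reachable, so
    // scores can be changed in place and the heap rebuilt in O(n).
    template <typename F>
    void update_all(F f) {
        for (Edge& e : heap_) f(e);
        std::make_heap(heap_.begin(), heap_.end(), ByScoreDesc{});
    }

    // Releases spare capacity, which std::priority_queue does not expose.
    void shrink() { heap_.shrink_to_fit(); }

private:
    std::vector<Edge> heap_;
};
```

Owning the vector directly allows in-place updates followed by a single `std::make_heap`, plus explicit control of capacity. Whether these are the properties that mattered for this change is an assumption; the note above only states that the vector-based heap works.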
Much faster than waterz: computation time dropped from about 2000 seconds to about 200 seconds (roughly 10x), and memory consumption dropped from about 40 GB to about 600 MB (more than 60-fold)!