Two Heaps - Githubissues

对于有一些题目，我们会给到一堆可以分成两个parts的元素。去解决这类问题，我们通常对第一个part的最大值和第二个部分的最小值感兴趣。Two Heaps这个pattern很适合解决这类问题。

我们可以建一个max heap去保存第一个part的最大值，一个min heap去保存第二个part的最小值。

这么说有点抽象，我们可以通过下面这道经典例题来理解这个pattern。

模版题 295. Find Median from Data Stream

题目让我们建一个class去有效率的求一个data stream的中位数。该class里需要有两个函数：

void addNum(int num)：把num加到我们的class中

double findMedian()：返回当前的data stream中的中位数
Example:

insertNum(3)

insertNum(1)

findMedian() -> output: 2

insertNum(5)

findMedian() -> output: 3

insertNum(4)

findMedian() -> output: 3.5

这道题要怎么去思考呢？

最开始我们可以考虑用brute force的方法去做。我们可以维护一个sorted array，这样每次我们要找中文数的时候，往array中间的index找就可以了。但是这个方法我们需要O(n)的时间来让一个元素进入array。

我们能做的更好吗？因为我们实际上不需要保证整个array都是fully sorted的，我们只care中间的元素。

假设x是我们array中的中位数，那整个array中有一半的数字是小于（或等于）x的，另外一半的数字是大于（或等于）x的。那我们就可以考虑把array分成两个部分：第一个部分去保存小于x的numbers，另外一部分去保存大于x的numbers。那我们要返回的中位数，要么是第一部分数字中的最大值，或者是第二部分数字中的最小值，或者是两者的平均值。

所以我们很容易联想到用heap这个数据结构来保存我们的数字：

我们把第一部分小于x的数字保存到一个max heap中（方便我们取该部分的最大值）。
我们把第二部分大于x的数字保存到一个min heap中（方便我们取该部分的最小值）。
往heap中加新的元素需要O(lgn)，这个比brute force的方法更快。
而要返回中位数的时候，我们只要把两个heap中的peek取出并计算即可。

当data stream进来的number个数是偶数时，我们保证两个heap的size相同即可。如果是奇数的话，保证其中一个heap的size比另一个heap大1就好。这里选哪个heap都可以，我们这里assume我们的max heap的size可以比min heap的size大1。

我们可以走上面这个例子来看一下：

insertNum(3)：因为我们assume max heap的size可以比min heap的size大1，3我们直接入第一个max heap （如果incoming number小于等于max heap的peek，我们也需要加到我们的heap中）。要注意的是，每次加入一个新的元素，我们都需要把两个heap的size重新balance一下。就像上面说的，如果总number数是偶数，两个heap的size要一样，不然的话，max heap的size允许比min heap的元素大1（这样也方便我们直接取中位数，偶数的时候我们把两个heap的peek取平均就好，奇数的时候直接返回max heap的peek）。

insertNum(1): 因为1比3小，我们让1进max heap。

但是这时候两个heap的size不平衡了，我们要把max heap的最大值放到min heap中去。

findMedian(): 因为现在总numbers的数量是偶数，我们把max heap和min heap的peek值求个平均就好: (1+3)/2=2
insertNum(5): 因为5比3大，我们把5加到min heap中，这时候min heap的size是2，max heap的size是1，不平衡了，我们需要重新balance一下，即把min heap的peek值放到max heap中。

findMedian(): 这个时候总的numbers的数量是奇数，我们直接返回max heap的peek值，即返回3.
insertNum(4)：4比max heap的最大值要大，我们放入min heap中。此时两个heap的size是balanced，无需移动。

findMedian(): 这个时候总的numbers的数量是偶数，我们把max heap和min heap的peek值求个平均就好: (3+4)/2=3.5

代码如下：

class MedianFinder {

    PriorityQueue<Integer> maxHeap; //containing first half of numbers
    PriorityQueue<Integer> minHeap; //containing second half of numbers

    public MedianFinder() {
        // maxHeap = new PriorityQueue<>((a,b)->(b-a));
        maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        minHeap = new PriorityQueue<>();
    }

    public void addNum(int num) {
        if(maxHeap.isEmpty() || num <= maxHeap.peek()) {
            maxHeap.offer(num);
        } else {
            minHeap.offer(num);
        }

        // either both the heaps will have equal number of elements or max-heap will have one 
        // more element than the min-heap
        if(maxHeap.size() > minHeap.size()+1) {
            minHeap.offer(maxHeap.poll());
        } else if(maxHeap.size() < minHeap.size()) {
            maxHeap.offer(minHeap.poll());
        }
    }

    public double findMedian() {
        if(maxHeap.size() == minHeap.size()) {
            return (maxHeap.peek() + minHeap.peek())/2.0;
        } else {
            return maxHeap.peek();
        }
    }
}

480. Sliding Window Median

You are given an integer array nums and an integer k. There is a sliding window of size k which is moving from the very left of the array to the very right. You can only see the k numbers in the window. Each time the sliding window moves right by one position.
Return the median array for each window in the original array. Answers within 10^-5 of the actual value will be accepted.

这题虽然在leetcode上的难度是hard，但是如果我们掌握了two heaps这个pattern，就不难了。

总体的思路和上题一样，但是这题我们需要对每个大小为k的window来提取median。在每次iteration时，我们需要把一个数字给remove出heap。Remove完了之后，还要rebalance一下，就像我们在insert完了之后的rebalance一样。

这道题还有个大坑，就是定义最大堆的时候，我之前通常的定义方法是：

PriorityQueue<Integer> maxHeap = new PriorityQueue<>((a,b)->(b-a));

但是这题中使用这个方法会有case死活过不去，因为造成了overflow：

Input:[-2147483648,-2147483648,2147483647,-2147483648,-2147483648,-2147483648,2147483647,2147483647,2147483647,2147483647,-2147483648,2147483647,-2147483648],3

这种方法以后不能再用来定义最大堆了，除非题目有对integer有严格的范围要求，保证不overflow。我们来想一个例子：

如果我们把-2147483648和1加到我们的max heap中，理论上1应该是我们max heap的root，但是因为overflow，导致了-2147483648成为了我们的root。

假设a=1, b = -2147483648, b-a = -2147483648-1 = -2147483649 (OVERFLOW) = 2147483647 -> 优先级最低，b到a的前面去了

所以我们在定义最大堆的时候，应该用如下的方法来定义：

PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
PriorityQueue<Integer> maxHeap = new PriorityQueue<>(b.compareTo(a));

https://stackoverflow.com/questions/11003155/change-priorityqueue-to-max-priorityqueue

时间复杂度：O(n * k), 原因是我们要对所有n个元素进行遍历，遍历的过程中，我们需要：

往size为k的heap中insert元素: O(lgk)
往size为k的heap中remove元素: O(k)

空间复杂度：O(k)

class Solution {
    // PriorityQueue<Integer> maxHeap = new PriorityQueue<>((a,b)->(b-a));
    PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
    PriorityQueue<Integer> minHeap = new PriorityQueue<>();
    public double[] medianSlidingWindow(int[] nums, int k) {
        double[] res = new double[nums.length-k+1];

        for(int i = 0; i < nums.length; i++) {
            if(maxHeap.isEmpty() || nums[i] <= maxHeap.peek()) {
                maxHeap.offer(nums[i]);
            } else {
                minHeap.offer(nums[i]);
            }

            rebalanceHeaps();

            if(i-k+1 >= 0) {
                if(maxHeap.size() == minHeap.size()) {
                    res[i-k+1] = maxHeap.peek()/2.0 + minHeap.peek()/2.0;
                } else {
                    res[i-k+1] = maxHeap.peek();
                }

                int elementToBeRemoved = nums[i-k+1];
                if(elementToBeRemoved <= maxHeap.peek()) {
                    maxHeap.remove(elementToBeRemoved);
                } else {
                    minHeap.remove(elementToBeRemoved);
                }

                rebalanceHeaps();
            }
        }

        return res;
    }

    public void rebalanceHeaps() {
        if(maxHeap.size() > minHeap.size()+1) {
            minHeap.offer(maxHeap.poll());
        } else if(maxHeap.size() < minHeap.size()) {
            maxHeap.offer(minHeap.poll());
        }
    }
}

502. IPO

假设力扣（LeetCode）即将开始其 IPO。为了以更高的价格将股票卖给风险投资公司，力扣希望在 IPO 之前开展一些项目以增加其资本。由于资源有限，它只能在 IPO 之前完成最多 k 个不同的项目。帮助力扣设计完成最多 k 个不同项目后得到最大总资本的方式。
给定若干个项目。对于每个项目 i，它都有一个纯利润 Pi，并且需要最小的资本 Ci 来启动相应的项目。最初，你有 W 资本。当你完成一个项目时，你将获得纯利润，且利润将被添加到你的总资本中。
总而言之，从给定项目中选择最多 k 个不同项目的列表，以最大化最终资本，并输出最终可获得的最多资本。

从题意我们可以知道，每当我们在选择一个要投资的项目的时候，我们有2个constrains:

我们只能选我们有足够capital的项目
我们总共能投资的项目有限

所以这道题，我们可以采用一个贪心的思想，每当我们在挑项目的时候，我们可以做两件事情：

把所有我们有足够capital去投资的项目选出来。
我们把当前选出来的项目中，选出能给我们带来最大利润的项目。

所以这题我们同样可以用two heaps的思想去做：

我们把所有project要用到的capital加到min heap里，这样我们可以很方便的找到当前我们能投资的项目。
把我们能投资的项目全部选出来，并且把这些项目加到max heap中，我们就很方便的知道我们投资的项目可以带来的最大利润。
最后，我们把能获得最大利润的项目从max heap中取出来。
重复2-3直到我们投资完所有能投资的项目。

时间复杂度：O(nlgn + klgn)，因为我们会把所有的project都push到heap中，并且我们会选择其中的k个项目。空间复杂度: O(n()

class Solution {
    public int findMaximizedCapital(int k, int w, int[] profits, int[] capital) {
        int n = profits.length;
        PriorityQueue<Integer> minHeap = new PriorityQueue<>((i1, i2) -> (capital[i1]-capital[i2]));
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>((i1, i2) -> (profits[i2]-profits[i1]));

        // insert all project capitals to a min-heap
        for(int i = 0; i < n; i++) {
            minHeap.offer(i);
        }

        // let's try to find a total of 'numberOfProjects' best projects
        int availableCapital = w;
        for(int i = 0; i < k; i++) {
            // find all projects that can be selected within the available capital and insert them in a max-heap
            while(!minHeap.isEmpty() && capital[minHeap.peek()] <= availableCapital) {
                maxHeap.offer(minHeap.poll());
            }

            // terminate if we are not able to find any project that can be completed within the available capital
            if(maxHeap.isEmpty()) {
                break;
            }

            // select the project with the maximum profit
            availableCapital += profits[maxHeap.poll()];
        }

        return availableCapital;
    }
}

436. Find Right Interval

给你一个区间数组 intervals ，其中 intervals[i] = [starti, endi] ，且每个 starti 都不同。
区间 i 的右侧区间可以记作区间 j ，并满足 startj >= endi ，且 startj 最小化。
返回一个由每个区间 i 的右侧区间的最小起始位置组成的数组。如果某个区间 i 不存在对应的右侧区间，则下标 i 处的值设为 -1 。

有些时候two heaps不一定就是一个max heap和一个min heap，也有可能两个都是同一种类型的heap。

时间复杂度：O(nlgn) 空间复杂度：O(n)

class Solution {
    public int[] findRightInterval(int[][] intervals) {
        // heap for finding the maximum start
        PriorityQueue<Integer> maxStartHeap = new PriorityQueue<>((a, b) -> (intervals[b][0]-intervals[a][0]));
        // heap for finding the minimum end
        PriorityQueue<Integer> maxEndHeap = new PriorityQueue<>((a, b) -> (intervals[b][1]-intervals[a][1]));

        int[] res = new int[intervals.length];

        for(int i = 0; i < intervals.length; i++) {
            maxEndHeap.offer(i);
            maxStartHeap.offer(i);
        }

        // go through all the intervals to find each interval's next interval
        for(int[] interval : intervals) {
            // let's find the next interval of the interval which has the highest 'end'
            int topEnd = maxEndHeap.poll();
            // defaults to -1
            res[topEnd] = -1;

            if(intervals[maxStartHeap.peek()][0] >= intervals[topEnd][1]) {
                int topStart = maxStartHeap.poll();
                // find the the interval that has the closest 'start'
                while(!maxStartHeap.isEmpty() && intervals[maxStartHeap.peek()][0] >= intervals[topEnd][1]) {
                    topStart = maxStartHeap.poll();
                }
                res[topEnd] = topStart;
                // put the interval back as it could be the next interval of other intervals
                maxStartHeap.offer(topStart);
            }
        }

        return res;
    }
}

xiedaxia1hao / leetcode_pattern

Two Heaps #5

模版题 295. Find Median from Data Stream

480. Sliding Window Median

502. IPO

436. Find Right Interval