teabuff / teabuff.github.io

Explain algorithms simple and clear
0 stars 0 forks source link

Median of Two Sorted Array in Same Size #1

Closed teabuff closed 3 years ago

teabuff commented 8 years ago

Median of Two Sorted Array in Same Size

Suppose you have two sorted array A and B in same size n, implement an algorithm to find the median of two sorted array.

For example, A = {10, 20, 30, 40} and B = {25, 35, 45, 55}.

Naive method

If we merge the two array, we will get a new Array C = {10, 20, 25, 30, 35, 40, 45, 55}, we quickly know that the median is (30 + 35) / 2 = 32.5; This algorithm require O(n) space complexity and O(n) time complexity.

Can we do better? One way to improve is to use two pointers to traverse both arrays without actually merging them. Let two pointers be the head of A and B respectively, and the pointer has smaller element increment one step each time. Since the median is the nth smallest/largest element in a 2n-size array, we can get the median by incrementing one step n times.

public double getMedian(int[] nums1, int[] nums2) {
    int n = nums1.length;
    if(n == 0) return 0;

    int i = 0, j = 0;
    int ml = -1, mr = -1;
    for(int k = 0; k <= n; k++) {
        int tmp1 = i >= n ? Integer.MAX_VALUE : nums1[i];
        int tmp2 = j >= n ? Integer.MAX_VALUE : nums2[j];

        ml = mr;
        if(tmp1 < tmp2) {
            mr = tmp1;
            i++;
        } else {
            mr = tmp2;
            j++;
        }
    }

    return (ml + mr) / 2.0;
}

Binary Search method

O(n) solution seems quite good. However, I learned a O(logn) method from the internet. let x and y be the mid of Array A and B. By comparing the x and y, we can conclude:

  1. If x = y = val, it means that for the val, there are (n / 2 + n / 2) elements smaller than val and (n / 2 + n / 2) elements larger than val. Therefore, val is the median we find.
  2. if x < y, for any element e in the left half of A, there are already n + 1 elements larger than e, median cannot be that part. So for the right half of B. We can exclude the two part.
  3. if x > y, for any element e in the right half of A, there are already n + 1 elements smaller than e, median cannot be that part. So for the left half of B. We can exclude the two part.
  4. Treat the rest part as two array and repeat the previous step until we have only two elements in each array
  5. directly find the median of the two 2-elements arrays and that median is the result.

[Algorithm comes from here]

private double helper(int[] nums1, int s1, int e1, int[] nums2, int s2, int e2) {
    int len = e1 - s1 + 1;
    if(len == 1) return (nums1[s1] + nums2[s2]) / 2.0;
    if(len == 2) return (Math.max(nums1[s1], nums2[s2]) + Math.min(nums1[e1], nums2[e2])) / 2.0;

    int m1 = s1 + (e1 - s1) / 2;
    int m2 = s2 + (e2 - s2) / 2;
    if(len % 2 == 0) { // even number
        if(nums1[m1] <= nums2[m2]) {
            return helper(nums1, m1, e1, nums2, s2, e2 - (m1 - s1));
        } else {
            return helper(nums1, s1, e1 - (m2 - s2), nums2, m2, e2);
        }
    } else { // odd number
        if (nums1[m1] == nums2[m2]) return nums1[m1];
        else if (nums1[m1] < nums2[m2])
            return helper(nums1, m1, e1, nums2, s2, m2);
        else
            return helper(nums1, s1, m1, nums2, m2, e2);
    }
}

public double findMedianSortedArray(int[] nums1, int[] nums2) {
    if(nums1.length == 0) return 0;
    return helper(nums1, 0, nums1.length - 1, nums2, 0, nums2.length - 1);
}

Notice: In above algorithm, when size of array is even, we treat it different than the algorithm description. DO THINK the reason and you will get better understanding. If you cannot get a reason, read the below sections. Or leave a comment if you really cannot get it.


However, the solution won't work for two sorted array with different size. Why?

Why it works for same size?

Let's first see why it work for both array at same size n. Light grey boxes means the elements excluded.

1 median-of-two-sorted-array-in-same-size

By applying step 1~3 of above algorithm, we can exclude smallest n/2 from A and largest n/2 from B. The left side of A must be less than median and the right side of B must larger than the median. Therefore, from the merged view of A and B, it both excluded n/2 elements from the left part of the median and the right part of median as the below figure:

1 median-of-two-sorted-array-in-same-size-2

After that, we found the target median is still the median of the newly generated array. Therefore, the problem turn from

find median of two sorted array at size n

into

find median of two sorted array at size n-n/2

Repeat the previous step as step 4 will definitely work.

Why it does'n work for different size?

Suppose Array A has length m, B has length n, A_mid < B_mid. In the fist iteration, we eliminate m/2 smallest elements from A and n/2 largest elements from B.

1 median-of-two-sorted-array-in-same-size-3

from the merged view of A and B, m/2 elements removed from the left part before median, n/2 elements removed from the right part after median.

1 median-of-two-sorted-array-in-same-size-4

Clearly, the target median is no longer the median of newly generated array. The left part before median - n / 2 differ from the right part after median - m / 2. The new problem is not the previous problem, thus repeat the algorithm will not working.

Conclusion

To dealing with two different size sorted array, since the median, which represents the (m+n)/2th smallest elements in merged array, is no longer median after one iteration. We can turn the problem into find kth smallest elements in two sorted array. And we adjust the value of k after each iteration. We will discuss it in the [NEXT POST].