Trying to fix some problems in Binary Search

sashaafm commented 8 years ago

Me and @NobbZ talked and he pointed some problems with the BS. The spec used Enum.t, but this is not true since the pattern matching only takes lists, so that was changed to a list. Also I used a var called array, but it must be list, since array has different properties from a list.

Another problem lies with getting the mid item's value. I used Enum.fetch, but this takes linear time to fetch the value and it is called in every recursive call of the function. I looked around for some Elixir implementations of the BS but everyone of them used Enum.fetch or Enum.at (I looked at the source and it calls Enum.fetch, so it's the same). I then tried using Erlang's array, but it seems to be failing in three tests.

This implementation is also not good, even if it passed all tests, because it must use :array.from_list to convert the list to an Erlang array in every recursive call.

I think the probable solution will be to implement the whole algorithm in Erlang (or mostly). I'll give a shot at this if you agree.

(I also had two files for the BinarySearchTree? Don't know what that was about, but I deleted one of them)

NobbZ commented 8 years ago

I think I have an idea how to improve the private helper a lot. It does involve chopping down the list gradually and keeping track of the current heads value. Since we had an opulent meal right now and my son really belongs to bed, I can't really concretise my thoughts. Give me 2 or 3 hours, perhaps I might come up with a draft until then.

sashaafm commented 8 years ago

Sure, take care of the kid first :) I'll also have some errands to run now, but maybe I'll still have some free time later to write a draft of BS using Erlang's array.

NobbZ commented 8 years ago

OK, I had really a hard time thinking about this, and I came to the following:

Doing it as drafted above, will result in a similar behaviour as it is now, since we might chop behind the searched element if it is in the first half.
Extension: Return on instant when it is in the first half and we find it on the way to the middle. If not continue with the second half recursively. This would result in a very fancy and complicated linear search
Converting to an erlang-array first would mean to process and copy every single element in the list while transfering it over to the erlang array, which again would need many copies of itself during it grows in size. Even if we consider the arrays growth as having no impact on the runtime, we would iterate the complete list at least once before we could do the binary search on the array.
Converting the list into a BST would have very similar downsides as the array-approch

So I came to the following conclusion: The most efficient way to search a linked list is a linear search, which of course can return false as soon as we find the first element that is greater than the searched element.

sashaafm commented 8 years ago

I think I understand what you mean, but why would the list need to be converted many times? I just did a quick draft:

  def binary_search(list, key) do 
    bs :array.from_list(list), key, 0, length(list)
  end

  defp bs(_, _, min, max) when max < min, do: nil
  defp bs(array, key, min, max) do
    if :array.size(array) == 0 do 
      nil
    end

    if :array.size(array) == 1 do 
      0
    end

    mid = (min + max) >>> 1
    val = :array.get(mid, array)

    cond do
      val > key -> bs array, key, min, mid - 1
      val < key -> bs array, key, min + 1, max
      true      -> mid
    end
  end

it passes all tests, but I'm confused. Erlang arrays are nested tuples and per these posts:

https://stackoverflow.com/questions/28676383/erlang-array-vs-list https://stackoverflow.com/questions/16447921/arrays-implementation-in-erlang

they aren't O(1) when using :array.get/2. So we end up with the same problem, I think.

NobbZ commented 8 years ago

While constructing the erlang array from a list, it involves reading and copying every single element of the list into the array. Also since we cant build the array at once but have to append every element from the list, this means at least copying the subtuple that is updated and the enclosing one (the untouched tuples might be reused depending on the underlying ERTS-Implementation, but also they might get copied again and again!). So for converting from list to array has a runtime of O(n) or worse.

After that we have to search the erlang array for the element, and we assume O(1) for indexed access in the array, we have O(log n) here during the search. We can safely assume O(1) for accessing the elements, since we don't change shape of the array during the search, so the access will be the same during its lifecycle.

But since we already have iterated over every element of the list to create the array, we could have done a linear search right from the first time.

Trying to implement a binary search in a linked list doesn't make sense. All scenarios that I can think of will result in a runtime that is worse than a linear search.

BUT: I think it might make sense to reimplement erlang arrays directly in Elixir and then implementing the binary search for them.

edit: http://stackoverflow.com/questions/5281053/how-to-apply-binary-search-olog-n-on-a-sorted-linked-list <- This SO thread does mention multiple times, that a binary search in O(log n) is impossible.

sashaafm commented 8 years ago

Do you mean like this?

https://github.com/takscape/elixir-array

NobbZ commented 8 years ago

The library you linked is a wrapper, I thought of a complete reimplementation.

Reimplementing would remove one layer of abstraction (vs. that package), also we would have a better control over it.

sashaafm / exads

Trying to fix some problems in Binary Search #14