Chapter 6 Notes, Errata, and Discussion

lihaoyi commented 4 years ago

Thread to discuss anything and everything about Chapter 6. Important topics will be folded into the issue description.

gfn9cho commented 4 years ago

The merge sort implementation is not tail recursive, correct? if yes, Is it okay to implement this way?

lihaoyi commented 4 years ago

@gfn9cho it's not, since it has to recurse twice and perform some computations after the recursion, while tail-recursion requires recursing only once and returning the result of recursion immediately with no further computation.

In general, most recursive things are not tail recursive, and that's OK. If you start hitting stack size limits due to deep recursion, you can pass in -Xss to the JVM to give it a bigger stack, or refactor a recursive algorithm to an imperative algorithm with a loop and ArrayDeque.

serpiente commented 4 years ago

@lihaoyi I'm using the code provided in solution 6.8 - DepthSearchPaths. Using the same directed graph, just different representations (changed order in Seq), I'm getting two different results for the shortest path from node a to node e.

Given these two representations of the same graph, I have different results.

start = "a"
dest = "e"
graph = Map(
        "a" -> Seq("d", "c"),
        "b" -> Seq("d"),
        "d" -> Seq("f"),
        "f" -> Seq("e"),
        "c"->Seq("e"), "e"->Seq())
    )

start = "a"
dest = "e"
graph = Map(
        "a" -> Seq("c", "d"), #changed order
        "b" -> Seq("d"),
        "d" -> Seq("f"),
        "f" -> Seq("e"),
        "c"->Seq("e"), "e"->Seq())
    )

implisci commented 4 years ago

Convert to no vars, recursion only. Good for learning to use options. Can be used to make the immutable Trie solution completely immutable (no vars anywhere), even outside the Immutable Trie Node constructor.

def contains(s: String): Boolean = {
    def loop(xc: List[Char], optnd: Option[Node]): Boolean =
      if (optnd.nonEmpty) {
        xc match {
          case Nil          => if (optnd.exists(_.hasValue)) true else false
          case head :: tail => loop(tail, optnd.get.children.get(head))
        }
      } else false
    loop(s.toList, Option(root))
  }

def prefixesMatchingString(s: String): Set[String] = {
    val prefixSet = Set.newBuilder[String]
    def loop(xs: List[Char], r: String, optnd: Option[Node]): Unit =
      if (optnd.nonEmpty) {
        if (optnd.get.hasValue) prefixSet += r
        if (xs.nonEmpty) loop(xs.tail, r + xs.head, optnd.get.children.get(xs.head))
      }
    loop(s.toList, "", Option(root))
    prefixSet.result
  }

  def stringsMatchingPrefix(s: String): Set[String] = {
    val stringSet = Set.newBuilder[String]
    def loop1(xs: List[Char], optnd: Option[Node]): Option[Node] = optnd match {
      case None     => None
      case Some(nd) => if (xs.isEmpty) Some(nd) else loop1(xs.tail, nd.children.get(xs.head))
    }
    def loop2(nd: Node, r: String): Unit = {
      if (nd.hasValue) stringSet += r
      for ((c, n) <- nd.children) loop2(n, r + c)
    }
    loop1(s.toList, Option(root)) match {
      case None => Set()
      case Some(nd) =>
        loop2(nd, s)
        stringSet.result
    }
  }

implisci commented 4 years ago

Note: The BinarySearch tests involving Vector("i", "am", "cow", "hear", "me", "moo") are invalid as the Seq is not sorted Below is a minor variation that returns the index of the first matching element

// Implement a simple binary search algorithm in Scala
// To find an element within a sorted Array in O(log n) time
// Assumption: IndexedSeq[T] is sorted in Ascending order
// $ amm BindexOf.sc

def bindexOf[T: Ordering](sorted: IndexedSeq[T], target: T): Int = {
  def loop(left: Int, center: Int, right: Int): Int = {
    if (left > right) -1 // target not found or IndexedSeq not sorted
    else
      Ordering[T].compare(target, sorted(center)) match {
        case 0          => center
        case r if r > 0 => loop(center + 1, (center + 1 + right) / 2, right)
        case _          => loop(left, (left + center - 1) / 2, center - 1)
      }
  }
  loop(0, (sorted.length - 1) / 2, sorted.length - 1)
}

assert(pprint.log(bindexOf(Array[Int](), 3)) == -1)
assert(pprint.log(bindexOf(Array(1, 3, 7, 9, 13), 1)) == 0)
assert(pprint.log(bindexOf(Array(1, 3, 7, 9, 13), 9)) == 3)
assert(pprint.log(bindexOf(Array(1, 3, 7, 9, 13), 7)) == 2)
assert(pprint.log(bindexOf(Array(1, 3, 7, 9, 13), 13)) == 4)
assert(pprint.log(bindexOf(Array(1, 3, 7, 9, 13), 8)) == -1)
assert(pprint.log(bindexOf(Array(1, 3, 7, 9, 13), 2)) == -1)
assert(pprint.log(bindexOf(Array(1, 3, 7, 9, 13), 100)) == -1)
assert(pprint.log(bindexOf(Vector("am", "cow", "hear", "me", "moo"), "am")) == 0)
assert(pprint.log(bindexOf(Vector("am", "cow", "hear", "me", "moo"), "cow")) == 1)
assert(pprint.log(bindexOf(Vector("am", "cow", "hear", "me", "moo"), "hear")) == 2)
assert(pprint.log(bindexOf(Vector("am", "cow", "hear", "me", "moo"), "moo")) == 4)
assert(pprint.log(bindexOf(Vector("am", "cow", "hear", "me", "moo"), "horse")) == -1)

implisci commented 4 years ago

// Is an IndexedSeq really Sorted or not?
// This is O(n) but in some cases a binary approach may exit quicker
// The tryBreakable control structure is used to quit computations once a comparison is out of order
// Question: How can the 2 loops in the last case be parallelized safely?
// See Stackoverflow link below for imperative, non-generic approaches
// https://stackoverflow.com/questions/11989071/fastest-way-to-check-if-an-array-is-sorted

def isSeqSorted[T: Ordering](xs: IndexedSeq[T]): Boolean = {
  import scala.util.control.Breaks._
  tryBreakable {
    def loop(a: Int, b: Int): Boolean = {
      (Ordering[T].compare(xs(a), xs(b)) <= 0) match {
        case true if (b - a) <= 1 => true
        case false                => break
        case true                 => loop(a, (a + b) / 2) && loop((a + b) / 2, b)
      }
    }
    if (xs.length <= 1) true else loop(0, xs.length - 1)
  } catchBreak false
}

assert(pprint.log(isSeqSorted(Vector[Int]())) == true)
assert(pprint.log(isSeqSorted(Vector(1))) == true)
assert(pprint.log(isSeqSorted(Vector(1, 2))) == true)
assert(pprint.log(isSeqSorted(Vector(1, 3, 2))) == false)
assert(pprint.log(isSeqSorted(Vector(2, 1, 2, 3, 4))) == false)

val testDVec = {
  (1 to 50).map(_.toDouble).toVector ++
    Vector(49.9999) ++ // out of order element in the middle
  (51 to 100).map(_.toDouble).toVector
}

assert(pprint.log(isSeqSorted(testDVec)) == false)
assert(pprint.log(isSeqSorted(testDVec.filter(_ != 49.9999))) == true)

ilitosh commented 4 years ago

In 6.12.scala, for (c <- s if current.nonEmpty) current = current.get.children.get(c) Does this iterate through the whole string s, or does the compiler optimize the "extra" cycles away by seeing that current cannot change once it becomes empty? I guess my question is if this is what we are relying on by implementing it this way?

SalviCF commented 4 years ago

In stringsMatchingPrefix it wouldn't be better in terms of performance to use collection.mutable.ListBuffer[Char] as the type for the parameter path in the recurse function?

In that way, we can append an element in constant time avoiding reverse path before adding it to output (reverse complexity for lists is O(n)).

It is implemented in that way for avoiding the use of mutable collections?

lihaoyi commented 4 years ago

@ilitosh it always iterates through the whole string, though iteration through the subsequent portion would be faster since it doesn't need to do anything each iteration. This could be made more efficient by using while-loop or tail-recursive function with an early exit

@SalviCF I believe ListBuffer cannot be effectively shared between multiple lists, as we are doing here. We could also use a mutable.ArrayDeque and make sure we append the character before each recursive step and remove the last character after, which would avoid reversing the list but still necessitate copying. In terms of performance, I haven't measured it but I'm guessing all three solutions (current, listbuffer, arraydeque) wouldn't be too far from each other.

milenkara commented 4 years ago

I am looking at the Merge sort example in the chapter 6, specifically generic version of it (6.9.scala). Since here we are using context bound, how can we use Ordering[T] directly without using implicitly keyword?

yuchaoran2011 commented 4 years ago

@milenkara Interestingly, both implicitly(Ordering[T]).lt(...) and Ordering[T].lt(...) work.

SalviCF commented 4 years ago

@lihaoyi in exercise 6.8 - DepthSearchPaths, the shortestPath method does not find the shortest path.

Reproduction steps:

$ amm --predef Search.sc
Loading...
Welcome to the Ammonite Repl 2.2.0 (Scala 2.13.3 Java 11.0.4)
@ shortestPath(
    start = "a",
    dest = "e",
    graph = Map(
      "a" -> Seq("d", "b"),
      "b" -> Seq("c"),
      "c" -> Seq("e"),
      "d" -> Seq("e"),
      "e" -> Seq()
    )
  )
res0: Seq[String] = List("a", "b", "c", "e")

But the expected result would be:

List("a", "d", "e")

I think this unexpected behavior is due to: https://github.com/handsonscala/handsonscala/blob/ebc0367144513fc181281a024f8071a6153be424/examples/6.8%20-%20DepthSearchPaths/Search.sc#L9

Next change should fix it:

- if !seen.contains(next) && !pathLengths.get(next).exists(_ <= pathLength + 1)
+ if !seen.contains(next) || !pathLengths.get(next).exists(_ <= pathLength + 1)

milenkara commented 4 years ago

I am looking at the Merge sort example in the chapter 6, specifically generic version of it (6.9.scala). Since here we are using context bound, how can we use Ordering[T] directly without using implicitly keyword?

Here is a lecture that helped me understand type classes better https://www.youtube.com/watch?v=1e9tcymPl7w. It also explains in which situations implicitly is not needed.

yuchaoran2011 commented 4 years ago

@milenkara Thanks for the link to the presentation. Now I understand it's because of this apply method in the Ordering companion object that allows implicitly to be omitted.

marin123 commented 4 years ago

I have implemented the FloodFill on my laptop, but I needed a higher limit than 25, for the color similarity. What worked for me was 100. Any guesses why? I did it on a mac, scala 2.13.3, java 1.8.0_251.

pradeepravi87 commented 3 years ago

I am trying to test TrieMap.add(), I understand the implementation is to preserve the values in children(collection.mutable.Map[Char, Node]) . But how do I test the values are actually preserved when we call TrieMap.add() for the second iteration?

Tried something like below but it doesn't help with my test, any help here, please

val t = new Trie() t.add("mango") println(t.root.children.get('m')) println(t.root.children.get('g')) println(t.root.hasValue) t.add("mandarin") t.root.children.foreach(x => println (x._1 + "-->" + x._2))

lihaoyi commented 3 years ago

@marin123 haha not sure. This kind of heuristic-based image processing algorithm is always pretty finnicky, but if you can tweak the number to make it look more-or-less right then you're probably fine.

cdcd88 commented 3 years ago

I am working on exercise 6.6 and my implementation is like below

def search[T: Ordering](value: T, list: Seq[T]): Boolean = {
  if (list.length == 1) value == list(0)
  else {
    val midPoint = list.length / 2
    val mVal = list(midPoint)
    if (mVal == value) true
    else if (Ordering[T].lt(mVal, value)) search(value, list.drop(midPoint))
    else search(value, list.take(midPoint))
  }
}

However, the solution uses a nested function (https://github.com/handsonscala/handsonscala/blob/v1/examples/6.6%20-%20BinarySearch/BinarySearch.sc)

def binarySearch[T: Ordering](sorted: IndexedSeq[T], target: T): Boolean = {
  def binarySearch0(start: Int, end: Int): Boolean = {
    // If the sequence you are trying to search has no items, the item cannot be found
    if (start == end) false
    else {
      val middle = (start + end) / 2
      // Otherwise, take the item at the middle of the sequence
      val middleItem = sorted(middle)
      // and compare it to the item you are looking for
      val comparison = Ordering[T].compare(target, middleItem)
      // If the item is what you are looking for, you have found it
      if (comparison == 0) true
      // Otherwise, if the item is greater than the one you are looking for,
      // binary search the left half of the sequence
      else if (comparison < 0) binarySearch0(start, middle)
      // If it is less than the item you are looking for, binary search the right half
      else binarySearch0(middle + 1, end)
    }
  }
  binarySearch0(0, sorted.length)
}

I have seen the nested function in a few examples. I wondered whether it is a scala idiomatic way to use a nested function like the solution

cdcd88 commented 3 years ago

I finished exercise 6.9 and my implementation is like below

def solver(grid: Array[Array[Int]]): Unit = {
  def nextEmpty(grid: Array[Array[Int]]): (Int, Int) = {
    for (row <- Range.inclusive(0, 8)) {
      for (col <- Range.inclusive(0, 8)) { 
        if (grid(row)(col) == 0) return (row, col)
      }
    }
    (-1, -1)
  }

  def solver0(grid: Array[Array[Int]]): Boolean = {          
    val (row, col) = nextEmpty(grid)
    if (row == -1) return true

    for (n <- Range.inclusive(1, 9)) {
      grid(row)(col) = n
      if (isValidSudoku(grid) && solver0(grid)) return true
      else grid(row)(col) = 0 // inside the for loop
    }
    return false
  }
  if (solver0(grid)) pprint(grid)
  else pprint(Array(Array(0)))
}

The implement is similar to the solution on https://github.com/handsonscala/handsonscala/blob/v1/examples/6.9%20-%20Sudoku/Sudoku.sc except I put backtrack inside the for loop

for (n <- Range.inclusive(1, 9)) {
  grid(row)(col) = n
  if (isValidSudoku(grid) && solver0(grid)) return true
  else grid(row)(col) = 0
}

While the solution put it outside the for loop

for (value <- Range.inclusive(1, 9)) {
  cells(i)(j) = value
  if (isValidSudoku(cells) && solve(i + 1, j, cells)) return true
}
cells(i)(j) = 0 // reset on backtrack

Why does the solution put the backtrack outside the for loop?

lihaoyi commented 3 years ago

@cdcd88 both ways work. In general, for any recursive function, you have a choice of putting your "base case"/"backtracking"/"termination" either just before calling the function again ("inside the loop") or just after ("outside the loop").

Which one you choose is generally just a matter of preference, though you should try to be consistent so you don't confuse yourself.

lihaoyi commented 3 years ago

@cdcd88 nesting functions is fine, they're basically functions which are private to other functions, rather than private to classes or packages. Like any other nesting, just try to avoid nesting too deeply. If you end up needing to call the nested function elsewhere, you can just move it out so others can call it

tempaccount172 commented 2 years ago

@SalviCF

Using the patch in https://github.com/handsonscala/handsonscala/issues/6#issuecomment-674555756, I get

List("a", "c", "e", "d")

for this graph

assert(
  pprint.log(
    shortestPath("a", "d",
      graph = Map(
        "a" -> Seq("b", "c"),
        "b" -> Seq("c", "d"),
        "c" -> Seq("e"),
        "e" -> Seq("d"),
        "d" -> Seq()))
  ) == List("a", "b", "d"))

This works on the above graph, yours and the two in the original test:

-      if !seen.contains(next) && !pathLengths.get(next).exists(_ <= pathLength + 1)
+      if !seen.contains(next) || pathLengths(next) > (pathLength + 1)

MikeAWilliams commented 2 years ago

Looking at the DFS solution provided I think it is actually BFS. By using the append it puts the new nodes at the end and they get popped after everything in the queue already. I think changing it to prepend will correct this.

handsonscala / handsonscala

Chapter 6 Notes, Errata, and Discussion #6