rikace / fConcBook

Source code for "Concurrency in .NET" book Manning publisher
https://www.manning.com/books/concurrency-in-dotnet
MIT License

2.18 WebCrawlerMemoized #19

Closed · tim-huang61 closed this issue 3 years ago

tim-huang61 commented 3 years ago

Hi, I tried WebCrawlerMemoized and it does not seem to work: the example runs WebCrawlerMemoized, yet it still calls WebCrawler even when the cache already contains the key. I think the cause may be that WebCrawler produces its data with yield return, while WebCrawlerMemoized expects a plain return value.

rikace commented 3 years ago

Hi @tim-huang61 Thank you for the feedback.

I have run example 2.18, which uses WebCrawlerMemoized, and I have updated the Memoize code to add some logging.

// Listing 2.12 A simple example that clarifies how memoization works
  public static Func<T, R> Memoize<T, R>(Func<T, R> func) where T : IComparable //#A
  {
      Dictionary<T, R> cache = new Dictionary<T, R>();    //#B
      return arg =>                                       //#C
      {
          if (cache.ContainsKey(arg))
          {
              Console.WriteLine($"Calling item cached {arg} - Time {DateTime.Now}");
              return cache[arg];
          }

          Console.WriteLine($"Calling item no cached {arg} - Time {DateTime.Now}");
          return (cache[arg] = func(arg));                //#F
      };
  }
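
For reference, here is a minimal, self-contained sketch of how the memoized wrapper behaves; the `square` function and the `calls` counter are hypothetical stand-ins, added only to make the caching observable:

```csharp
using System;
using System.Collections.Generic;

class MemoizeDemo
{
    // Same shape as Listing 2.12 above (TryGetValue avoids the double lookup)
    public static Func<T, R> Memoize<T, R>(Func<T, R> func) where T : IComparable
    {
        var cache = new Dictionary<T, R>();
        return arg => cache.TryGetValue(arg, out var r) ? r : cache[arg] = func(arg);
    }

    static void Main()
    {
        int calls = 0;                     // counts real invocations of the wrapped function
        Func<int, int> square = x => { calls++; return x * x; };
        var memoSquare = Memoize(square);

        Console.WriteLine(memoSquare(4));  // computed: prints 16
        Console.WriteLine(memoSquare(4));  // cache hit: prints 16 again
        Console.WriteLine(calls);          // prints 1: the body ran only once
    }
}
```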

The output suggests that the code works as expected:

(screenshot: console output showing the "item cached" / "item no cached" log lines)

Do you have an example I can use to reproduce the issue?

tim-huang61 commented 3 years ago

Hi @rikace, thanks for your answer. I updated Memoize with your code, and I also updated GetWebContent to print logs:

private static string GetWebContent(string url)
{
    using (var wc = new WebClient())
    {
        Console.WriteLine($"Get Web Content {url} - Time {DateTime.Now}");
        return wc.DownloadString(new Uri(url));
    }
}

As the output shows, GetWebContent still runs on a cache hit; the expectation is that it should not run at all. (screenshot: console output showing repeated "Get Web Content" log lines)

rikace commented 3 years ago

Hi @tim-huang61, sorry, I missed this reply.

The memoization targets the WebCrawler function. However, since that function returns a lazy collection (IEnumerable), the cache only stores the deferred sequence; the web content is re-processed every time the sequence is enumerated.
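
The effect can be reproduced without any networking. In this sketch (hypothetical names; a counter stands in for the actual download), the memoized function returns a lazy iterator, so a cache hit hands back the same sequence object, but enumerating it re-runs the iterator body:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class LazyMemoDemo
{
    static int sideEffects = 0;   // stands in for GetWebContent's downloads

    // Lazy iterator: the body runs every time the sequence is enumerated
    static IEnumerable<int> LazyNumbers(int n)
    {
        for (int i = 0; i < n; i++)
        {
            sideEffects++;
            yield return i;
        }
    }

    public static Func<T, R> Memoize<T, R>(Func<T, R> func) where T : IComparable
    {
        var cache = new Dictionary<T, R>();
        return arg => cache.TryGetValue(arg, out var r) ? r : cache[arg] = func(arg);
    }

    static void Main()
    {
        var memo = Memoize<int, IEnumerable<int>>(LazyNumbers);

        memo(3).ToList();   // first enumeration: 3 "downloads"
        memo(3).ToList();   // cache hit returns the same IEnumerable,
                            // but enumerating it re-runs the iterator body

        Console.WriteLine(sideEffects);   // prints 6, not 3
    }
}
```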

An alternative approach is to update the WebCrawler function to materialize the result before it is cached:

        public static List<string> WebCrawler(string url) //#A
        {
            // Local iterator function (it shadows the enclosing method's name,
            // which is legal in C#): still lazy, still uses yield return
            IEnumerable<string> WebCrawler(string url)
            {
                string content = GetWebContent(url);
                if (content != null)
                    yield return content;

                foreach (string item in AnalyzeHtmlContent(content))
                    yield return GetWebContent(item);
            }

            // ToList() forces the enumeration here, once, so the memoized
            // wrapper caches concrete data instead of a deferred sequence
            return WebCrawler(url).ToList();
        }
rikace commented 3 years ago

If you want, you can create a PR, or I can do it.

tim-huang61 commented 3 years ago

Hi @rikace, thanks for your answer. I have sent a PR to you. 👍🏻