jiffyclub / snakeviz

An in-browser Python profile viewer
https://jiffyclub.github.io/snakeviz/
Other
2.33k stars 137 forks source link

Threading breaks SnakeViz Visualisation (Python 3.12.3) #197

Open NFB418 opened 5 months ago

NFB418 commented 5 months ago

First of all, let me thank you for writing this program. It's been absolutely invaluable for me in speeding up my code by quickly pointing out the bottlenecks in an easy to understand manner.

However, this week, chasing more performance improvements, I added some very minor threading for io-bound tasks (saving .png files to disk using Pillow), and this has broken Snakeviz for me.

I have included two profiles, they represent the exact same code which produced the exact same result, except one used threading and the other did not. The one with threading breaks for me, the one without threading works.

You should be able to see the part that was threaded in the non-threaded visualisation. It is the export_worker method which executes the Image.py:2376(save) method.

My threading code is very simple, below is my attempt at a simplified version of it:

from concurrent.futures import ThreadPoolExecutor
from PIL import Image

class Painter:

    def __init__(self):
        self.thread_pool_executor: ThreadPoolExecutor = ThreadPoolExecutor(1)

    def paint_picture(self):
        # this produces a Pillow Image called img
        self.thread_pool_executor.submit(self.export_worker, img)

    def export_worker(self, img: Image.Image):
        img.save()

I can offer more information on my code if necessary, but I hope this helps. Snakeviz has truly been invaluable to me so I really hope there's a solution to this!!!

profiles.zip

NFB418 commented 4 months ago

It seems this project is not being maintained anymore, which is a real shame.

If there's anyone watching: I have downloaded a fork and managed to ascertain the cause and location of the issue. But I could use the help of anyone who know html/javascript to determine how to fix it.

For some reason, my use of Threading causes an infinite recursion loop to occur in the sv_build_hierarchy function of viz.html (https://github.com/jiffyclub/snakeviz/blob/master/snakeviz/templates/viz.html). I will add the offending code below.

I will keep prying, probably, but regrettably I have NO understanding of html or javascript. This made it quite hard to even locate the bugged function. I don't know if I have the ability to ascertain the specific line, let alone write a fix. The function as written already includes an attempt to prevent this exact recursion issue from recurring, I'm not sure why this works for a non-threaded program but not for a threaded version?

    <!-- Web worker code for generating D3 JSON in a separate thread -->
    <script id="hierarchy-worker" type="javascript/worker">
      // This will all go into a web worker that will be used to generate
      // the visualization JSON while leaving the rest of the app responsive.
      //
      // We put this here instead of in a separate JS file so that the worker
      // can be stopped and restarted without loading the code from the server,
      // and so that the stats data can be embedded in the worker.

      var stats = {% raw callees %};
      function sv_build_hierarchy(
          node_name, depth, max_depth, cutoff, node_time, parent_name, call_stack) {

        // track visited functions so we can avoid infinitely displaying
        // instances of recursion
        if (_.isUndefined(call_stack)) {
          var call_stack = Immutable.Set([node_name])
        } else {
          var call_stack = call_stack.add(node_name);
        };

        var data = {
          name: node_name,
          display_name: stats[node_name]['display_name'],
          time: node_time,
          cumulative: stats[node_name]['stats'][3],
          parent_name: parent_name
        };

        if (depth < max_depth && !_.isEmpty(stats[node_name]['children'])) {
          child_names = {};
          for (var child_name in stats[node_name]['children']) {
            // Cut off children that have already been visited (recursion)
            if (!call_stack.contains(child_name)) {
              child_names[child_name] = stats[node_name]['children'][child_name];
            }
          }

          // Normalize the child times.
          // Unfortunately, stats[child_name]['callers'][node_name][3] doesn't
          // give us the time that child spend under node in this instance, but
          // in all instances across the call tree. Yikes!
          // This may lead to unexpected behavior, e.g., the child times add up
          // to more than the node time. A normalization is necessary.
          var child_times = {};
          var total_children_time = 0.0;
          for (var child_name in child_names) {
            child_times[child_name] = stats[child_name]['callers'][node_name][3];
            total_children_time += child_times[child_name];
          }
          if (total_children_time > node_time) {
            for (var child_name in child_names) {
              child_times[child_name] *= (node_time / total_children_time);
            }
          }

          data['children'] = [];
          // recurse
          for (var child_name in child_names) {
            if (child_times[child_name]/node_time > cutoff) {
              data['children'].push(
                sv_build_hierarchy(
                  child_name, depth+1, max_depth, cutoff,
                  child_times[child_name], node_name, call_stack
                  ));
            }
          }

          // D3, the plotting framework, only accounts for times in the leaf
          // nodes. Hence, if the node_time exceeds the total time in its
          // children, we need to add another child node to account for the the
          // time spent in node itself.
          if (total_children_time < node_time) {
            data['children'].push({
              name: node_name,
              display_name: data['display_name'],
              parent_name: data['parent_name'],
              cumulative: stats[node_name]['stats'][3],
              time: node_time - total_children_time
            });
          }
        }

        return data;
      }

(Apologies for closing and then re-opening the issue. I goofed and pressed the wrong button -.-;)

jiffyclub commented 3 months ago

Hi! Thanks for your patience, I maintain snakeviz in my free time for no compensation so it's not something I can always devote much time/energy to. I am looking into this when I can. I tried to reproduce it with some simple multithreaded coded but no luck, so it's not threading in itself that's breaking the code, it's something in particular with your profile and threading. Your profile is quite large/complex so it's difficult to look into directly. I'll keep giving thought to what could be going wrong and ways to debug it.

NFB418 commented 3 months ago

Hi Matt, thanks so much for your reply! Apologies if I offended by suggesting this project wasn't being maintained. I'm glad that was wrong!

At the moment the shoe is on the other foot and I don't have the time and energy to do any further debugging myself. (I never ended up doing much more since my last post in May, as I concluded that even if I made a minimally reproduceable example, I would not be able to fix the code that was actually causing the problem.) But if there's anything I can do to help you please let me know and I'll try to find the time asap. Snakeviz really has been very valuable for me, so I'd be happy to invest some effort into getting this fixed. :)

The only thing I can say now is to reiterate that these two profiles were the EXACT same code except for threading. They ran on the same seed for random number generation and produced the exact same end state of the simulation. Of course there could always be something weird going on elsewhere, but I really did my best to ensure that my no_threading version was a proper control group with my threading implementation for image export being the only difference from the threaded version.

That means if you open the no_threading profile, it is the methods that start with export_worker and everything below that is different between the two versions. Everything else ought to be the same down to the number of calls (naturally, as the two examples ran independently, actual runtime would have varied a bit).

Snakeviz

I hope that helps a bit!

olejorgenb commented 6 days ago

I'm also seeing weird behavior for threaded code.

import cProfile
import threading
import time
import random

def worker(thread_id):
    print(f"Thread {thread_id} starting")

    def athread_worker_function():
        print(f"Thread {thread_id} working")
        time.sleep(thread_id / 10)

    for i in range(10):
        athread_worker_function()

    print(f"Thread {thread_id} finished")

def main():
    thread1 = threading.Thread(target=worker, args=(3,))
    thread2 = threading.Thread(target=worker, args=(4,))

    thread1.start()
    thread2.start()

    def baz():
        for i in range(10):
            print(f"Main thread working... ({i + 1}/5)")
            time.sleep(0.5)

    def main_work():
        # Do some work in the main thread
        time.sleep(1)
        baz()

    with cProfile.Profile() as p:
        main_work()

    p.dump_stats("/tmp/threadedcprofile-results")

    thread1.join()
    thread2.join()

    print("All threads have finished. Main thread exiting.")

if __name__ == "__main__":
    main()

image