Line profiler does not work with async functions

sirex commented 4 years ago

When I run line profile on an async function, it does not record anything.

Here is what I get:

Timer unit: 1e-06 s

Total time: 0 s
Function: prepare_data at line 406

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   406                                           @profile
   407                                           async def prepare_data(
   408                                               context: Context,
   409                                               dstream: AsyncIterator[DataItem],
   410                                               stop_on_error: bool = True,
   411                                           ) -> AsyncIterator[DataItem]:
   412                                               async for data in dstream:
   413                                                   _temp(data, context)
   414                                                   yield data

Total time: 0.046699 s
Function: _temp at line 417

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   417                                           @profile
   418                                           def _temp(data, context):
   419        10        188.0     18.8      0.4      data.payload = commands.rename_metadata(context, data.payload)
   420        10      18029.0   1802.9     38.6      data.given = commands.load(context, data.model, data.payload)
   421        10      12129.0   1212.9     26.0      data.given = commands.prepare(context, data.model, data.given, action=data.action)
   422        10      16353.0   1635.3     35.0      commands.simple_data_check(context, data, data.model, data.model.backend)

Here I moved all the code from prepare_data function to _temp, just to make profiler work.

I'm using line-profiler 3.0.2. On Python 3.8.2.

Theelx commented 3 years ago

@sirex Sorry for responding so late, but this issue is now potentially fixable. Can you reproduce this issue with line profiler 3.2.1, and what OS are you on? I suspect this may be an issue with the C-extension part of line profiler on Windows, if that's what you're using, because I'm able to profile any non-Cython-compiled async function on Python 3.8.8 and Python 3.9.4 (and possibly lower versions, I have yet to try) on Ubuntu 20.04.

sirex commented 3 years ago

I have tested it and now it seems to be working:

import asyncio

@profile
async def test():
    for i in range(2):
        await asyncio.sleep(1)

asyncio.run(test())

$ lsb_release -drc
Description:    Manjaro Linux
Release:        21.0.2
Codename:       Ornara

$ kernprof --version
3.2.1

$ kernprof -lv test.py
Wrote profile results to test.py.lprof
Timer unit: 1e-06 s

Total time: 8.2e-05 s
File: test.py
Function: test at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           async def test():
     5         3          9.0      3.0     11.0      for i in range(2):
     6         2         73.0     36.5     89.0          await asyncio.sleep(1)

Programmierus commented 2 years ago

It's again not working with 3.4.0, but works with 3.3.0.

I used the same code example from @sirex

% kernprof -V
3.4.0

% kernprof -l -v test.py
Wrote profile results to test.py.lprof
Timer unit: 1e-06 s

Total time: 0 s
File: test.py
Function: test at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           async def test():
     5                                               for i in range(2):
     6                                                   await asyncio.sleep(1)

VS

% kernprof -V                                        
3.3.0

% kernprof -v -l test.py                             
Wrote profile results to test.py.lprof
Timer unit: 1e-06 s

Total time: 7.7e-05 s
File: test.py
Function: test at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           async def test():
     5         3          9.0      3.0     11.7      for i in range(2):
     6         2         68.0     34.0     88.3          await asyncio.sleep(1)

Doesn't seem to be python version related (tested on 3.8.0, 3.9.0, 3.10.0). MacOS 12.2.1

Theelx commented 2 years ago

I suspect the issue came with https://github.com/pyutils/line_profiler/commit/4e2744e64afbaf4d882dabd7e9741790b0bc0f68#diff-2c5d18d29e15ee10fd416d3654749cb19a5e2516a5f391551d229cf80d7bac13R29. Specifically, the lines should probably be

import inspect
def is_coroutine(f):
    return inspect.iscoroutinefunction(f)

rather than

def is_coroutine(f):
    return False

I haven't tested it yet though, so that may not be the problem.

Edit: It seems that all coroutine handling was removed in this commit??? Not sure why @Erotemic removed it, I assume because the variable names referenced 3.5, even though the code was actually needed for coroutines in Python >= 3.5

CaptainDP commented 2 years ago

In pyhton3.7 of windows10, the async timing also does not take effect。 environment： windows10，Python 3.7.7 ：

code：

import asyncio

@profile
async def test():
    for i in range(2):
        await asyncio.sleep(1)

asyncio.run(test())

result：

Timer unit: 1e-06 s

Total time: 0 s
File: service.py
Function: test at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           @profile
     5                                           async def test():
     6                                               for i in range(2):
     7                                                   await asyncio.sleep(1)

CaptainDP commented 2 years ago

This method can solve this problem： modify：line_profiler.py

from

def is_coroutine(f):
return False

change to

import inspect
def is_coroutine(f):
return inspect.iscoroutinefunction(f)

add this：

    def wrap_coroutine(self, func):
        """
        Wrap a Python 3.5 coroutine to profile it.
        """

        @functools.wraps(func)
        async def wrapper(*args, **kwds):
            self.enable_by_count()
            try:
                result = await func(*args, **kwds)
            finally:
                self.disable_by_count()
            return result

        return wrapper

Erotemic commented 2 years ago

Not sure how I missed this thread. @CaptainDP thank's for the PR. I'll get that merged.

Erotemic commented 2 years ago

I've released 3.5.1, which contains the patch from @CaptainDP. Can someone verify that this issue is resolved?

falkoschindler commented 2 years ago

As far as I can tell, the patch in 3.5.1 seems to work well. The following code yields some plausible output:

import asyncio
from line_profiler import LineProfiler

profile = LineProfiler()

@profile
async def test():
    for _ in range(100):
        await asyncio.sleep(0.01)

asyncio.run(test())

profile.print_stats()

Timer unit: 1e-06 s

Total time: 0.006042 s
File: /Users/falko/Projects/rosys/./main.py
Function: test at line 44

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    44                                           @profile
    45                                           async def test():
    46       101        343.0      3.4      5.7      for _ in range(100):
    47       100       5699.0     57.0     94.3          await asyncio.sleep(0.01)

fonfonx commented 2 years ago

@falkoschindler Shouldn't Total time be equal to something around 1 second ? I think something is still going wrong with the awaited functions

Erotemic commented 2 years ago

@fonfonx, I've confirmed this is a problem. The following code runs an async and sync variants of @falkoschindler's test:


class Timer:
    def __init__(self):
        import time
        self.counter = time.perf_counter

    def __enter__(self):
        self.start = self.counter()
        return self

    def __exit__(self, a, b, c):
        self.elapsed = self.counter() - self.start

def test_async_profile():
    import asyncio
    import time
    from line_profiler import LineProfiler

    n = 100
    m = 0.01

    async def async_function():
        for idx in range(n):
            await asyncio.sleep(m)

    with Timer() as t:
        asyncio.run(async_function())
    time1 = t.elapsed

    profile = LineProfiler()
    profiled_async_function = profile(async_function)
    with Timer() as t:
        asyncio.run(profiled_async_function())
    time2 = t.elapsed
    profile.print_stats()

    ideal_time = n * m
    max_time = max(time2, time1)
    min_time = min(time2, time1)
    ratio = max_time / min_time
    error = abs(max_time - ideal_time)
    assert ratio < 1.5, 'profiled function should run about as fast'
    assert error < (ideal_time * 0.5), 'should be somewhat close to the ideal time'

    lstats = profile.get_stats()
    unit = lstats.unit
    stats = lstats.timings

    profiled_items = sorted(stats.items())
    assert len(profiled_items) == 1
    for (fn, lineno, name), timings in profiled_items:
        total_time = 0.0
        for lineno, nhits, time_ in timings:
            total_time += time_
    print(f'async ideal_time={ideal_time}')
    print(f'async time1={time1}')
    print(f'async time2={time2}')
    print(f'async unit={unit}')
    print(f'async error={error}')
    print(f'async ratio={ratio}')
    print(f'async total_time={total_time}')

    # --- similar test with sync

    def sync_function():
        for idx in range(n):
            time.sleep(m)

    with Timer() as t:
        sync_function()
    time1 = t.elapsed

    profile = LineProfiler()
    profiled_sync_function = profile(sync_function)
    with Timer() as t:
        profiled_sync_function()
    time2 = t.elapsed
    profile.print_stats()

    ideal_time = n * m

    max_time = max(time2, time1)
    min_time = min(time2, time1)
    ratio = max_time / min_time
    error = abs(max_time - ideal_time)

    assert ratio < 1.5, 'profiled function should run about as fast'
    assert error < (ideal_time * 0.5), 'should be somewhat close to the ideal time'

    lstats = profile.get_stats()
    unit = lstats.unit
    stats = lstats.timings

    profiled_items = sorted(stats.items())
    assert len(profiled_items) == 1
    for (fn, lineno, name), timings in profiled_items:
        total_time = 0.0
        for lineno, nhits, time_ in timings:
            total_time += time_
    print(f'sync ideal_time={ideal_time}')
    print(f'sync time1={time1}')
    print(f'sync time2={time2}')
    print(f'sync unit={unit}')
    print(f'sync error={error}')
    print(f'sync ratio={ratio}')
    print(f'sync total_time={total_time}')

The output is correct of the sync variant, but incorrect for the async variant in terms of total time:

Total time: 0.002221 s
File: /home/joncrall/code/line_profiler/tests/test_async.py
Function: async_function at line 23

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    23                                               async def async_function():
    24       101        116.0      1.1      5.2          for idx in range(n):
    25       100       2105.0     21.1     94.8              await asyncio.sleep(m)

async ideal_time=1.0
async time1=1.0108539490029216
async time2=1.0161754119908437
async unit=1e-06
async error=0.016175411990843713
async ratio=1.0052643242806452
async total_time=2221.0
Timer unit: 1e-06 s

Total time: 1.00602 s
File: /home/joncrall/code/line_profiler/tests/test_async.py
Function: sync_function at line 66

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    66                                               def sync_function():
    67       101        175.0      1.7      0.0          for idx in range(n):
    68       100    1005843.0  10058.4    100.0              time.sleep(m)

sync ideal_time=1.0
sync time1=1.005714118015021
sync time2=1.006154617993161
sync unit=1e-06
sync error=0.006154617993161082
sync ratio=1.0004379972103896
sync total_time=1006018.0

I don't think line_profiler is measuring the await time correctly.

falkoschindler commented 2 years ago

I think the profiler shouldn't include the time spent in an asyncio.sleep call, since the control is handed over to asyncio which might run other tasks in the meantime. The test() function is basically on hold and doesn't consume resources.

Erotemic commented 2 years ago

@falkoschindler, that was my initial thought, but when I modified the async function to time how long the await as taking explicitly:

    async def async_function():
        for idx in range(n):
            with Timer() as t2:
                await asyncio.sleep(m)
            print(f'{t2.elapsed=}')

I got a bunch of this:

t2.elapsed=0.010150016983971
t2.elapsed=0.010166646970901638
t2.elapsed=0.010175412986427546
t2.elapsed=0.010167814965825528

Which indicates to me that the await is actually consuming 0.1s of time, and I think line_profiler should probably measure how long it needs to wait when an await is called.

... but then again maybe not. According to the docs await suspends execution of the coroutine. I don't often use the async functions in Python. For my use cases concurrent.futures is almost always better, so I'm not as familiar with them as I should be.

If there are any asyncio experts out there, discussion on what the correct handling of this case should be would be appreciated.

falkoschindler commented 2 years ago

@Erotemic Sure, the clock keeps running and the elapsed time will be (at least) the requested sleep duration. But as you said: "await suspends execution of the coroutine". Therefore it seems reasonable to me that the execution time doesn't account for time spent in an async call.

Erotemic commented 2 years ago

I can buy that. For cases where other coroutines are taking resources it might not be fair to label that as execution time of the function that invoked the await. @fonfonx if you have a different opinion, I'd be interested to hear it.

fonfonx commented 2 years ago

I am not an expert of asyncio but I would have expected to measure the total time spent in the asyncio.sleep instruction even if await suspends execution of the coroutine because when I use line_profiler to find out which instructions slow down my code I would expect to see that the sleep instruction indeed took time while the current result does not show that. For example when the code makes calls to async services that take time I think the line_profiler should display that and the fact that those services are called in an async or sync manner should not affect the output of the profiler.

For instance using the line_profiler on this little function will mislead me by telling me that the for loop takes most of the time of my code while it is the sleep instruction...

import asyncio
import time
from line_profiler import LineProfiler

profiler = LineProfiler()

@profiler
async def test():
    start_time = time.perf_counter()
    total = 0
    for i in range(1000):
        total += i
    await asyncio.sleep(5)
    final_time = time.perf_counter()
    print(f"Elasped time: {final_time-start_time}")
    return total

if __name__ == "__main__":
    asyncio.run(test())
    profiler.print_stats()

Elasped time: 5.002110341000001
Timer unit: 1e-06 s

Total time: 0.000902 s
File: test.py
Function: test at line 8

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profiler
     9                                           async def test():
    10         1          4.0      4.0      0.4      start_time = time.perf_counter()
    11         1          1.0      1.0      0.1      total = 0
    12      1001        363.0      0.4     40.2      for i in range(1000):
    13      1000        401.0      0.4     44.5          total += i
    14         1         31.0     31.0      3.4      await asyncio.sleep(5)
    15         1          2.0      2.0      0.2      final_time = time.perf_counter()
    16         1         99.0     99.0     11.0      print(f"Elasped time: {final_time-start_time}")
    17         1          1.0      1.0      0.1      return total

falkoschindler commented 2 years ago

@fonfonx But while your thread/process/CPU is busy running the for loop and can't do anything else, it is completely free to execute other tasks when sleeping with asyncio.sleep.

We might think about a slightly different setup. When profiling some function which is awaiting your test(), I'd expect the line profiler to show the amount of time await test() is actually busy running computations like the for loop, but excluding the time spent in asyncio.sleep.

Another point of view: What do you expect when running 10 sleeps of 1 second in parallel? Does that count as 10 seconds execution time? I don't think so.

fonfonx commented 2 years ago

@falkoschindler I agree with you on your point about the parallel execution. However in my tiny example when sleeping with asyncio.sleep the cpu cannot execute lines that are after this instruction, and the await blocks the execution until sleeping is complete. I discovered the line_profiler tool because I was looking for a tool to profile my code line by line, in a sequential manner. I understand that we should not add up the execution times of parallel tasks but for sequential tasks I think it is the more natural behavior. Besides the wording Total time is misleading with the current behavior.

When profiling some function which is awaiting my test() I would actually expect the profiler to tell me that it took 5 seconds to run the function in order to let me investigate why it took so long.

import asyncio
from test import test

from line_profiler import LineProfiler

profiler = LineProfiler()

@profiler
async def main():
    result = await test()
    print(result)
    await asyncio.sleep(5)
    print("Hello World")

if __name__ == "__main__":
    asyncio.run(main())
    profiler.print_stats()

I would expect both the test call and the asyncio.sleep to be profiled in an similar manner since they both took around 5 seconds in my code, but this is not the case.

Total time: 0.000417 s
File: test2.py
Function: main at line 9

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     9                                           @profiler
    10                                           async def main():
    11         1        334.0    334.0     80.1      result = await test()
    12         1          7.0      7.0      1.7      print(result)
    13         1         31.0     31.0      7.4      await asyncio.sleep(5)
    14         1         45.0     45.0     10.8      print("Hello World")

Programmierus commented 2 years ago

I will bring my five cents... Leaving the asyncio.sleep() behind. As per my understanding for any asyncio function only CPU-effective time is counted. So e.g. await aiohttp.session.get(...) also only counts time when it really "works/blocks" (receives, parses data). IO-bound waiting is ignored.

The background of this is clear: asyncio.loop is free and can handle other tasks within the code, but on the other hand it makes it impossible to use line_profiler for finding bottlenecks in such cases. Time calculation for such steps will be misleading for most users...

falkoschindler commented 2 years ago

IO-bound methods are a good point to think about: When a get request takes 3 seconds, but most of the time the task is idle - what should line profiler output? Maybe it depends. Personally I use a profiler to find CPU-intensive code rather than IO bottlenecks. In some projects I have multiple tasks running, each with a loop that does some work and then sleeps for 0.1 s. When profiling I need to know who is consuming too much processing resources so that other tasks are not called in time. Knowing that each of them regularly sleeps for 100 ms does not help. But again: Maybe it depends and could be an optional profiler setting, if both behaviors are equally easy to implement.

I would suggest to close this card, since the original problem of empty profiling reports for async functions is solved. The details about how things should be counted seem to be non-trivial, but go beyond @sirex's original issue.

espdev commented 11 months ago

The results of asynchronous code profiling are absolutely incorrect. For example, I want to measure the runtime of an asynchronous function, but I get results that are useless.

For example:

import asyncio
import time

@profile
async def foo():
    ...

async def main():
    ts = time.monotonic()
    await foo()
    te = time.monotonic() - ts
    print('foo() elapsed time:', te) 

asyncio.run(main())

And I get:

profiler:
Total time: 0.0046907 s

time:
foo() elapsed time: 0.7343...

The profiler's results are meaningless garbage in this case, unfortunately. The line profiler doesn't work with asynchronous code. It's better to show nothing than to show nonsense.

ldorigo commented 1 month ago

Old issue but hit it today. Can't this just be a configuration flag? Don't think it's very useful to debate which is the "right" way to do it. In some cases, you care about overall time it took to execute a function (e.g. when you want a function to execute very fast), so time spent in awaits should be counted towards the total. In others you want to know how much resources a function is using, in which case await's can be ignored (or counted in a smart way).

pyutils / line_profiler

Line profiler does not work with async functions #19