leepro / yappi

Automatically exported from code.google.com/p/yappi
MIT License
0 stars 0 forks source link

Some function names change to exception class names in Python 2.6 #45

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create profile with yappi in a multithreaded python app and save in "pstat" 
format
2. Output the results using python pstats module

What is the expected output? What do you see instead?

Instead of some of the function names in both pstats.Stats.print_stats()
and pstats.Stats.print_callers() we see unrelated exception classes, that
according to the code, cannot be related to this code path.

What version of the product are you using? On what operating system?

yappi: 0.8.2
python collect profile: 2.6.6 RHEV-H (RHEL 6.5 based distribution)
python create report: 2.7.5, Fedora 19

Please provide any additional information below.

Tue Apr 29 12:34:25 2014    vdsmd.prof

         4862063 function calls (4887685 primitive calls) in 610.663 seconds

   Ordered by: cumulative time
   List reduced from 2370 to 20 due to restriction <20>

   ncalls        tottime  percall  cumtime    percall filename:lineno(function)
    65197/65412    4.266    0.000    468.594    0.007 mount.py:121(_iterMountRecords)
        437/451    0.040    0.000    460.177    1.020 fileUtils.py:223(pathRequiresFlagForDirectIO)
        407/423    0.084    0.000    459.887    1.087 storage_exception.py:178(VolumeIsBusy)
        407/423    0.000    0.000    459.795    1.087 storage_exception.py:168(VolumeDoesNotExist)
        437/451    0.000    0.000    453.494    1.006 mount.py:157(findMountOfPath)
        389/404    0.045    0.000    446.820    1.106 blockSD.py:594(BlockStorageDomain.getReadDelay)

VolumeIsBusy and VolumeDoesNotExist are exceptions class names. These classes
do not call any related function in this profile from their __init__
or any other method.

Other then this issue, the profile makes sense looking at the code involved.
It looks like the correct function names involved were replaced by the
exceptions by mistake because these exceptions were raised.

Profile data: https://bugzilla.redhat.com/attachment.cgi?id=890843
Application source: git fetch git://gerrit.ovirt.org/vdsm 
refs/changes/58/26158/1 && git checkout FETCH_HEAD
Relevant exceptions classes: 
http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=vdsm/storage/storage_exceptio
n.py;h=f799d007f155d0f8d9354132526aab667491e321;hb=1a18cf6986953b33edf810a25ab80
1770eea428c

Original issue reported on code.google.com by nir...@gmail.com on 1 May 2014 at 9:56

GoogleCodeExporter commented 9 years ago
Cannot access the attachment in BugZilla. Getting authorization error. Other 
than that, you seem to be right, the function names seems to be replaces by the 
exception names, will try to reproduce the issue with a more simple test case.

Original comment by sum...@gmail.com on 5 May 2014 at 9:48

GoogleCodeExporter commented 9 years ago
More info: profiling newer version of the application on Python 2.7 and 
generating reports on Python 2.7, I cannot reproduce this issue.

Original comment by nir...@gmail.com on 5 May 2014 at 10:03

GoogleCodeExporter commented 9 years ago
The profile data not reachable from bugzilla.

Original comment by nir...@gmail.com on 5 May 2014 at 10:08

Attachments:

GoogleCodeExporter commented 9 years ago
According to newer profiles on Python 2.7, the correct functions that should 
show instead of VolumeIsBusy and VolumeDoesNotExist are 
vdsm/storage/misc.py:209(readfile) and vdsm/storage/misc.py:224(readspeed). 
Both exceptions are not related to these functions.

One hint about this version of vdsm - it creates ton of short living threads. 
So it is possible that threads ids are reused. Looking at yappi code, it seems 
that thread is are assumed to be unique.

Another hint, vdsm uses lot of long living threads - in the profile above, 
there were at least 40 long living threads, 30 of those are running the code in 
the top of the profile. So this may expose incorrect code in yappi that may be 
hidden in more sane python code.

So one test I would try is to profile code that creates lot of short living 
threads, and the other code that let 100-200 threads run the same code path. It 
does not make sense to use such code but it may reveal issue in yappi.

Original comment by nir...@gmail.com on 5 May 2014 at 10:26

GoogleCodeExporter commented 9 years ago
Thanks! Those are great hints. I have tried creating 1000 short living + 40 
living threads in Python2.6, could not reproduce the problem. 

Yappi relies that the OS thread id's to be unique, however in our case that 
seems irrelevant. Our problem simply seems to be a corruption in the internal 
functions hash table where we hold the correct call count/timing information 
but bad function name. I am not sure how can this be achieved, need to inspect 
more.

Parallely, can you provide following:
- Do you get this corruption also when you save the results as YSTAT, not 
PSTAT? Or more simply can you try running yappi.get_func_stats().print_all() 
where you save the stats to see if it causes any difference?
- Is it possible I run vsdm on my local computer? I need RHEL, I believe?
- Can you also try to see if the error changes if you insert some other dummy 
threads to the application? If inserting few dummy threads change the nature of 
the problem, then this means corruption is random. Otherwise, I will be 
suspecting there is a difference on the handling of Exceptions on Python2.6?

Original comment by sum...@gmail.com on 6 May 2014 at 7:56

GoogleCodeExporter commented 9 years ago
I am also attaching the test app I used. You can also freely change/play with 
some values to mimic vsdm. In the end, you know better:)

Original comment by sum...@gmail.com on 6 May 2014 at 7:57

Attachments:

GoogleCodeExporter commented 9 years ago
- Do you get this corruption also when you save the results as YSTAT, not PSTAT?

I cannot try that, since the profile with this issue was taken on a customer 
machine which we don't have access too. I'll try this when I can reproduce this 
issue on my setup.

- Is it possible I run vsdm on my local computer?

Sure, it runs on RHEL, CentOS and Fedora (19, 20). Centos would be the best 
option
since it is similar to the RHEL system we took the corrupted profile, and use
the same python version.

However setting up ovirt and vdsm for simulation of this issue is little 
complicated,
so I would not waste your time on this.

- Can you also try to see if the error changes if you insert some other dummy 
threads?

I will try it when I can reproduce it on my setup.

Original comment by nir...@gmail.com on 6 May 2014 at 9:51

GoogleCodeExporter commented 9 years ago

Original comment by sum...@gmail.com on 13 May 2014 at 7:26

GoogleCodeExporter commented 9 years ago

Original comment by sum...@gmail.com on 13 May 2014 at 7:26